How Easily do Irrelevant Inputs Skew the Responses of Large Language Models? Paper • 2404.03302 • Published Apr 4, 2024 • 2
From Persona to Personalization: A Survey on Role-Playing Language Agents Paper • 2404.18231 • Published Apr 28, 2024 • 1
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published 30 days ago • 189
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Article • Feb 11, 2025 • 110
Illustrating Reinforcement Learning from Human Feedback (RLHF) Article • Dec 9, 2022 • 403
Is Extending Modality The Right Path Towards Omni-Modality? Paper • 2506.01872 • Published Jun 2, 2025 • 24
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31, 2025 • 30