Learning from Language Feedback via Variational Policy Distillation Paper • 2605.15113 • Published 29 days ago
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization Paper • 2605.09996 • Published May 11
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning Paper • 2605.00347 • Published May 1
RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing Paper • 2604.23644 • Published Apr 26
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models Paper • 2511.10262 • Published Apr 17
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation Paper • 2604.03922 • Published Apr 5