AutoMedBench: Towards Medical AutoResearch with Agentic AI Models Paper • 2606.01961 • Published 23 days ago • 27
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists Paper • 2605.26029 • Published 29 days ago • 18
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents Paper • 2605.17873 • Published May 18 • 12
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published May 19 • 85
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published May 14 • 147
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help? Paper • 2604.09408 • Published Apr 29 • 5
Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback Paper • 2605.03848 • Published May 5 • 6