OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model Paper • 2604.20806 • Published Apr 22 • 1
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 12 days ago • 39
OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published 14 days ago • 30
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published Apr 26 • 33
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis Paper • 2604.15093 • Published Apr 16 • 30
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models Paper • 2604.10866 • Published Apr 13 • 66
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction Paper • 2603.00610 • Published Feb 28 • 35
Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments Paper • 2602.01244 • Published Feb 1 • 16
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published Feb 5 • 61
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published Feb 9 • 159
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published Jan 29 • 10
SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning Paper • 2602.02472 • Published Feb 2 • 47
CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation Paper • 2602.01660 • Published Feb 2 • 8
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published Jan 29 • 42
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models Paper • 2503.09567 • Published Mar 12, 2025 • 1