WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 10 days ago • 45
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents Paper • 2605.12481 • Published 9 days ago • 27
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published 24 days ago • 22
Synthetic Computers at Scale for Long-Horizon Productivity Simulation Paper • 2604.28181 • Published 21 days ago • 19
TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents Paper • 2604.24005 • Published 24 days ago • 8
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published about 1 month ago • 84
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 324
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published Apr 13 • 143
DARE: Diffusion Large Language Models Alignment and Reinforcement Executor Paper • 2604.04215 • Published Apr 5 • 21