OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs Paper • 2606.03890 • Published 3 days ago • 23
Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments Paper • 2605.27209 • Published 10 days ago • 16
Data-Gouv-FR/temps-dattente-par-train-en-correspondance-a-tours-depuis-vers-le-mans • Updated 4 days ago • 680 • 32 • 1
MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing Paper • 2605.23986 • Published 20 days ago • 17
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 15 days ago • 177
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published Apr 13 • 102
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers Paper • 2603.24414 • Published Mar 25 • 183
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 248