OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published 4 days ago • 30
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 3 days ago • 33
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models Paper • 2604.08545 • Published 3 days ago • 33
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published 3 days ago • 36
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published 3 days ago • 37
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 3 days ago • 39
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 4 days ago • 61
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 23 days ago • 330
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 18 days ago • 96
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization Paper • 2603.12743 • Published 30 days ago • 3
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding Paper • 2307.00862 • Published Jul 3, 2023 • 1