Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 2 days ago • 26
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published Dec 28, 2025 • 10
TimeScope: How Long Can Your Video Large Multimodal Model Go? Article • Published Jul 23, 2025 • 48
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Paper • 2502.12558 • Published Feb 18, 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval Paper • 2502.11431 • Published Feb 17, 2025 • 1
Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification Paper • 2506.19225 • Published Jun 24, 2025