Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 15 days ago • 233
FileGram: Grounding Agent Personalization in File-System Behavioral Traces Paper • 2604.04901 • Published 15 days ago • 40
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 96
view article Article Mitigating False Negatives in Multiple Negatives Ranking Loss for Retriever Training May 25, 2025 • 28
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 136
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839 • Published Jun 17, 2024 • 40
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published Sep 23, 2024 • 30
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published Mar 6, 2025 • 28
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 35