See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding • Paper • 2605.18018 • Published 5 days ago • 19
SLA2: Sparse-Linear Attention with Learnable Routing and QAT • Paper • 2602.12675 • Published Feb 13 • 59
Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory • Paper • 2602.02393 • Published Feb 2 • 19
Active Intelligence in Video Avatars via Closed-loop World Modeling • Paper • 2512.20615 • Published Dec 23, 2025 • 9
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs • Paper • 2509.18056 • Published Sep 22, 2025 • 27