OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published Apr 20
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29, 2025
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 12 days ago
World Model for Robot Learning: A Comprehensive Survey Paper • 2605.00080 • Published 24 days ago
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published 11 days ago
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published 17 days ago
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 16 days ago
HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness Paper • 2605.02396 • Published 20 days ago
GigaWorld-Policy: An Efficient Action-Centered World--Action Model Paper • 2603.17240 • Published Mar 18
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 27 days ago
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 27 days ago
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 27 days ago
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 30 days ago