TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published 8 days ago • 51
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 18 days ago • 36
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks Paper • 2509.23738 • Published Sep 28, 2025 • 2
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation Paper • 2509.23736 • Published Sep 28, 2025 • 2
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert Paper • 2510.03896 • Published Oct 4, 2025
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering Paper • 2604.08209 • Published Apr 9 • 26
Exploring Spatial Intelligence from a Generative Perspective Paper • 2604.20570 • Published Apr 22 • 23
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published Apr 3 • 233
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention Paper • 2602.05847 • Published Feb 5 • 12