StableVLA: Towards Robust Vision-Language-Action Models without Extra Data Paper • 2605.18287 • Published 7 days ago • 15
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published 18 days ago • 51
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models Paper • 2312.16693 • Published Dec 27, 2023 • 14
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published Jun 6, 2024 • 25
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published Feb 27 • 60
iFSQ: Improving FSQ for Image Generation with 1 Line of Code Paper • 2601.17124 • Published Jan 23 • 33
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published Jan 21 • 45