SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 3 days ago • 58
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories Paper • 2604.15311 • Published Apr 16 • 13
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published Apr 13 • 72
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published Feb 25, 2025 • 36
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published Feb 24, 2025 • 79
Sleeping Featured 123 CountGD_Multi-Modal_Open-World_Counting 🚀 123 Count objects in images using text and example boxes