FlowInOne:Unifying Multimodal Generation as Image-in, Image-out Flow Matching Paper • 2604.06757 • Published Apr 8 • 10
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published Feb 10 • 6
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published Feb 10 • 6
Olaf-World: Orienting Latent Actions for Video World Modeling Paper • 2602.10104 • Published Feb 10 • 27
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published Nov 14, 2025 • 47
🔱 Sailor2 Language Models Collection Sailing in South-East Asia with Inclusive Multilingual LLMs • 32 items • Updated Mar 2 • 30
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 103
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published Nov 3, 2025 • 38
From Charts to Code: A Hierarchical Benchmark for Multimodal Models Paper • 2510.17932 • Published Oct 20, 2025 • 8
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6, 2025 • 120