When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published 3 days ago • 6
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8, 2025 • 13
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published Mar 26, 2025 • 4
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023 • 11