Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis Paper • 2605.18451 • Published 4 days ago
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions Paper • 2308.09936 • Published Aug 19, 2023
Matryoshka Query Transformer for Large Vision-Language Models Paper • 2405.19315 • Published May 29, 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models Paper • 2410.08182 • Published Oct 10, 2024
Verbalized Representation Learning for Interpretable Few-Shot Generalization Paper • 2411.18651 • Published Nov 27, 2024
Interleaving Reasoning for Better Text-to-Image Generation Paper • 2509.06945 • Published Sep 8, 2025