From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 3 days ago • 65
NEO1_5 Collection From Pixels to Words -- Towards Native One-Vision Models at Scale • 3 items • Updated 1 day ago • 6
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 5 days ago • 24
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published 4 days ago • 64
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects Paper • 2605.21572 • Published 10 days ago • 51
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 75
VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text? Paper • 2602.04802 • Published Feb 4 • 2
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 18 days ago • 191
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 9 items • Updated 1 day ago • 67