Bernini: Latent Semantic Planning for Video Diffusion • Paper • 2605.22344 • Published 12 days ago • 14
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources • Paper • 2605.29250 • Published 5 days ago • 70
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini • Paper • 2605.27295 • Published 7 days ago • 20
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV • Paper • 2605.26244 • Published 8 days ago • 37
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published 7 days ago • 130
Lance: Unified Multimodal Modeling by Multi-Task Synergy • Paper • 2605.18678 • Published 15 days ago • 77
APRES: An Agentic Paper Revision and Evaluation System • Paper • 2603.03142 • Published Mar 3 • 3
jinaai/jina-embeddings-v5-omni-small • Feature Extraction • 2B • Updated 5 days ago • 134k • 65
EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding • Paper • 2605.09874 • Published 22 days ago • 2
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition • Paper • 2605.08384 • Published 25 days ago • 11
jina-embeddings-v5-omni • Collection • Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each. • 27 items • Updated 20 days ago • 36
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models • Paper • 2605.08735 • Published 24 days ago • 70
SkillOS: Learning Skill Curation for Self-Evolving Agents • Paper • 2605.06614 • Published 26 days ago • 46