LMMs-Lab

community

https://www.lmms-lab.com/

lmmslab

EvolvingLMMs-Lab

AI & ML interests

Feeling and building multimodal intelligence.

Recent Activity

xiangan authored a paper 2 days ago

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

xiangan authored a paper 2 days ago

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

THUdyh authored a paper 2 days ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Papers

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

lmms-lab's collections (19)

LLaVA-OneVision-2

lmms-lab-encoder/LLaVA-OneVision-2-8B-Instruct

Image-Text-to-Text • 9B • Updated 9 days ago • 4.87k • 7
mvp-lab/LLaVA-OneVision-2-Data

Viewer • Updated 19 days ago • 24 • 205k • 24

LongVT

LongVT Demo

Analyze long videos and answer questions about them • 3
longvideotool/LongVT-RL

Video-Text-to-Text • Updated Dec 4, 2025 • 8 • 3
longvideotool/LongVT-SFT

Video-Text-to-Text • Updated Dec 4, 2025 • 10 • 1
longvideotool/LongVT-RFT

Video-Text-to-Text • Updated Dec 4, 2025 • 162 • 1

LLaVA-OneVision-1.5

https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5

mvp-lab/LLaVA-OneVision-1.5-Instruct-Data

Viewer • Updated Nov 21, 2025 • 21.9M • 92.9k • 75
mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M

Viewer • Updated Nov 24, 2025 • 91.5M • 147k • 71
lmms-lab/LLaVA-OneVision-1.5-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 21, 2025 • 74.9k • 64
lmms-lab/LLaVA-OneVision-1.5-4B-Instruct

Image-Text-to-Text • 5B • Updated Feb 6 • 3.07k • 18

MMSearch-R1

MMSearch-R1 is a solution designed to train LMMs to perform on-demand multimodal search in real-world environments.

lmms-lab/MMSearch-R1-7B-0807

8B • Updated Aug 7, 2025 • 4
lmms-lab/MMSearch-R1-7B

8B • Updated Jul 30, 2025 • 27 • 9
lmms-lab/FVQA

Viewer • Updated Aug 9, 2025 • 6.66k • 502 • 8
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25, 2025 • 64

EgoLife

CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5, 2025 • 46
EgoGPT

Analyze video to describe actions and transcribe audio • 14
lmms-lab/EgoIT-99K

Viewer • Updated Mar 7, 2025 • 199k • 3.75k • 9
lmms-lab/EgoLife

Viewer • Updated Mar 13, 2025 • 32k • 37.6k • 18

Multimodal-SAE

A collection of sparse autoencoders (SAEs) hooked onto LLaVA models.

Multimodal SAE

Demo for Multimodal-SAE • 9
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22, 2024 • 19
lmms-lab/llava-sae-explanations-5k

Viewer • Updated Nov 22, 2024 • 9.8k • 263 • 6
lmms-lab/llama3-llava-next-8b-hf-sae-131k

Updated Nov 26, 2024 • 173 • 8

LLaVA-Video

Models focused on video understanding (previously known as LLaVA-NeXT-Video).

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 41
lmms-lab/LLaVA-Video-178K

Viewer • Updated Oct 11, 2024 • 1.63M • 23.9k • 195
lmms-lab/LLaVA-Video-7B-Qwen2

Video-Text-to-Text • 8B • Updated Oct 25, 2024 • 20.5k • 128
lmms-lab/LLaVA-Video-72B-Qwen2

Text Generation • 73B • Updated Oct 25, 2024 • 674 • 21

LMMs-Eval

Dataset Collection of LMMs-Eval

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35
lmms-lab/VQAv2

Viewer • Updated Jan 26, 2024 • 770k • 22.9k • 36
lmms-lab/MME

Viewer • Updated Dec 23, 2023 • 2.37k • 40.6k • 35
lmms-lab/DocVQA

Viewer • Updated Apr 18, 2024 • 16.6k • 30.9k • 80

LLaVA-Next-Interleave

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 42
lmms-lab/llava-next-interleave-qwen-7b

Text Generation • 8B • Updated Jul 24, 2024 • 159 • 27
lmms-lab/llava-next-interleave-qwen-7b-dpo

Text Generation • 8B • Updated Jul 12, 2024 • 39 • 12
lmms-lab/M4-Instruct-Data

Updated Jul 21, 2024 • 1.28k • 79

LMMs-Eval-Lite

Lite versions of the evaluation datasets, made to accelerate holistic evaluation during model development.

lmms-lab/LMMs-Eval-Lite

Viewer • Updated Jul 4, 2024 • 8.5k • 9.06k • 7
lmms-lab/llava-bench-in-the-wild

Viewer • Updated Mar 8, 2024 • 60 • 4.85k • 10
lmms-lab/CMMMU

Viewer • Updated Mar 8, 2024 • 12k • 1.25k • 4
lmms-lab/MMMU

Viewer • Updated Mar 8, 2024 • 11.6k • 30.2k • 7

OneVision-Encoder

HEVC-Style Vision Transformer

lmms-lab-encoder/onevision-encoder-large

0.3B • Updated Feb 5 • 824 • 14
lmms-lab-encoder/onevision-encoder-large-lang

Image Feature Extraction • 0.3B • Updated 1 day ago • 371 • 8

OpenMMReasoner

OpenMMReasoner/OpenMMReasoner-ColdStart

Image-Text-to-Text • 8B • Updated Dec 30, 2025 • 15 • 3
OpenMMReasoner/OpenMMReasoner-RL

Image-Text-to-Text • 8B • Updated Dec 30, 2025 • 34 • 17
OpenMMReasoner/OpenMMReasoner-SFT-874K

Viewer • Updated Dec 30, 2025 • 874k • 462 • 8
OpenMMReasoner/OpenMMReasoner-RL-74K

Viewer • Updated Nov 25, 2025 • 74.7k • 321 • 10

LLaVA-Critic-R1

lmms-lab/LLaVA-Critic-R1-7B

8B • Updated Jul 19, 2025 • 48
lmms-lab/LLaVA-Critic-R1-7B-Plus-Qwen

8B • Updated Jul 26, 2025 • 106 • 5
lmms-lab/LLaVA-Critic-R1-7B-Plus-Mimo

8B • Updated Aug 28, 2025 • 10 • 1
lmms-lab/LLaVA-Critic-R1-7B-LLaMA32v

11B • Updated Aug 28, 2025 • 4

Aero-1-Audio

Aero 1 Audio Demo

Demo for Aero-1-Audio • 43
lmms-lab/Aero-1-Audio

Text Generation • 2B • Updated Jun 7, 2025 • 2.57k • 90

VideoMMMU

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published Jan 23, 2025 • 24
lmms-lab/VideoMMMU

Viewer • Updated May 5, 2025 • 900 • 2.56k • 14

LLaVA-Critic

A general evaluator for assessing model performance.

LLaVA-Critic: Learning to Evaluate Multimodal Models

Paper • 2410.02712 • Published Oct 3, 2024 • 37
lmms-lab/llava-critic-7b

8B • Updated Oct 4, 2024 • 295 • 15
lmms-lab/llava-critic-72b

73B • Updated Oct 4, 2024 • 9 • 15
lmms-lab/llava-critic-113k

Viewer • Updated Oct 5, 2024 • 113k • 973 • 28

LLaVA-OneVision

A model that handles arbitrary types of visual input.

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61
lmms-lab/LLaVA-OneVision-Mid-Data

Viewer • Updated Aug 26, 2024 • 563k • 241 • 21
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24, 2025 • 3.94M • 13.2k • 236
lmms-lab/LLaVA-NeXT-Data

Viewer • Updated Aug 30, 2024 • 779k • 9.62k • 46

LongVA

Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24, 2024 • 33
lmms-lab/LongVA-7B

Text Generation • 8B • Updated Jun 26, 2024 • 946 • 15
lmms-lab/LongVA-7B-DPO

Text Generation • 8B • Updated Jun 26, 2024 • 247 • 10
lmms-lab/v_niah_needles

Viewer • Updated Jun 15, 2024 • 5 • 17 • 4

LLaVA-NeXT

Some powerful image models.

lmms-lab/llava-next-110b

Text Generation • 112B • Updated May 14, 2024 • 10 • 21
lmms-lab/llava-next-72b

Text Generation • 73B • Updated Aug 22, 2024 • 37 • 14
lmms-lab/llava-next-qwen-32b

Text Generation • 33B • Updated Jul 16, 2024 • 24 • 7
lmms-lab/llama3-llava-next-8b

Text Generation • 8B • Updated Aug 17, 2024 • 1.93k • 106