Revisual-R1
ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum: cold-start pre-training, multimodal reinforcement.
One cold-start, two RL stages, endless reasoning power.
SOTA on 9 tough benchmarks covering visual-math + text reasoning.
Three-Stage SRO Training
PAD (Prioritized Advantage Distillation) keeps gradients alive.
Efficient-Length Reward = concise, self-reflective CoT.
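As a rough illustration of the last two highlights, here is a minimal Python sketch. It is not the released training code. It assumes PAD means (1) discarding rollouts whose advantage is near zero, since they contribute no gradient, and (2) sub-sampling the rest with probability weighted by advantage magnitude, and it assumes the length reward is a clipped linear function of the gap between a token budget and the response length. The function names and the parameters `eps`, `temperature`, `budget`, `alpha`, and `delta` are hypothetical; see the paper for the exact formulation.

```python
import math
import random


def pad_select(advantages, k, eps=1e-6, temperature=1.0, rng=None):
    """Pick up to k sample indices, favouring large |advantage| (illustrative only)."""
    rng = rng or random.Random(0)
    # Step 1: drop near-zero-advantage samples; they would yield no gradient signal.
    pool = [i for i, a in enumerate(advantages) if abs(a) > eps]
    chosen = []
    # Step 2: draw without replacement, weighting by softmax(|advantage| / temperature).
    while pool and len(chosen) < k:
        peak = max(abs(advantages[i]) for i in pool)  # subtract max for numerical stability
        weights = [math.exp((abs(advantages[i]) - peak) / temperature) for i in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen


def length_reward(num_tokens, budget, alpha=0.005, delta=0.5):
    """Assumed form: linear in (budget - length), clipped to [0, 1]."""
    return min(1.0, max(0.0, alpha * (budget - num_tokens) + delta))


# Indices 0 and 2 have zero advantage, so they can never be selected.
picked = pad_select([0.0, 2.0, 0.0, -1.5, 0.3], k=2)
on_budget = length_reward(num_tokens=1000, budget=1000)  # equals delta
```

Under these assumptions, a batch in which every rollout has zero advantage yields an empty selection, and a response far over budget receives a length reward of 0.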
@article{chen2025advancing,
title={Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
author={Chen, Shuang and Guo, Yue and Su, Zhaochen and Li, Yafu and Wu, Yulun and Chen, Jiacheng and Chen, Jiayu and Wang, Weijie and Qu, Xiaoye and Cheng, Yu},
journal={arXiv preprint arXiv:2506.04207},
year={2025}
}
Take ReVisual-R1 for a spin and let us know what you build!