Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding
Paper: arXiv:2606.27922
Model checkpoints for Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding.
- Reflect-R1-SFT-6000/: cold-start supervised fine-tuning (SFT) checkpoint.
- Reflect-R1-GRPO-Final/: final SD-GRPO checkpoint.
Both checkpoints are based on Qwen2.5-VL-7B and include sharded safetensors weights together with the corresponding tokenizer and processor configuration files.
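Because both folders use sharded safetensors weights, it can help to check that a downloaded folder is complete before loading it. The following is a minimal, stdlib-only sketch that assumes the standard Hugging Face layout: a `model.safetensors.index.json` file whose `weight_map` names every shard, plus `config.json`, `tokenizer_config.json`, and `preprocessor_config.json`. These file names and the local folder paths are assumptions, not a verified listing of this repository.

```python
import json
from pathlib import Path

# Assumed file names (standard Hugging Face layout for a sharded
# safetensors checkpoint with a Qwen2.5-VL tokenizer/processor).
# Adjust these if the actual folders differ.
INDEX_FILE = "model.safetensors.index.json"
ASSUMED_CONFIG_FILES = (
    "config.json",
    "tokenizer_config.json",
    "preprocessor_config.json",
)


def missing_checkpoint_files(ckpt_dir):
    """Return a sorted list of expected files that are absent from ckpt_dir.

    Expected files are the assumed config files, the shard index, and
    every shard named in the index's "weight_map".
    """
    root = Path(ckpt_dir)
    missing = [name for name in ASSUMED_CONFIG_FILES if not (root / name).is_file()]

    index_path = root / INDEX_FILE
    if not index_path.is_file():
        # Without the index we cannot tell which shards should exist.
        missing.append(INDEX_FILE)
        return sorted(missing)

    # weight_map maps each tensor name to the shard file that stores it;
    # many tensors share one shard, so deduplicate with a set.
    weight_map = json.loads(index_path.read_text())["weight_map"]
    shards = set(weight_map.values())
    missing.extend(s for s in shards if not (root / s).is_file())
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical local paths: wherever the two folders were downloaded.
    for name in ("Reflect-R1-SFT-6000", "Reflect-R1-GRPO-Final"):
        print(name, missing_checkpoint_files(name) or "complete")
```

Once a folder passes this check, a recent transformers release with Qwen2.5-VL support should typically be able to load it via `Qwen2_5_VLForConditionalGeneration.from_pretrained(path)` and `AutoProcessor.from_pretrained(path)`; this has not been tested against these specific checkpoints.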
@article{chen2026reflectr1,
title = {Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding},
author = {Shuimu Chen and Yuteng Chen and Yuanshen Guan and Zebang Cheng and Zeyu Zhang and Shengqian Qin and Bin Xia and Jiaran Li and Wenming Yang and Fei Ma},
journal = {arXiv preprint arXiv:2606.27922},
year = {2026}
}