Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published 11 days ago • 48
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 218
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published Dec 19, 2025 • 67
Post
NEW: @mistralai released a fantastic family of multimodal models, Ministral 3. You can fine-tune them for free on Colab using TRL ⚡️, supporting both SFT and GRPO.
Links to the notebooks:
- SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_ministral3_vl.ipynb
- GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb
- TRL and more examples: https://huggingface.co/docs/trl/index