Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 6 days ago • 76 • 5
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 10 days ago • 80 • 3
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts Paper • 2602.13367 • Published 24 days ago • 31 • 3
When Models Manipulate Manifolds: The Geometry of a Counting Task Paper • 2601.04480 • Published Jan 8 • 4 • 1
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 313 • 8
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 343 • 3
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 15, 2025 • 106 • 5
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 138 • 7
Learning a Generative Meta-Model of LLM Activations Paper • 2602.06964 • Published about 1 month ago • 3 • 3
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229 • 9
Guiding a Diffusion Transformer with the Internal Dynamics of Itself Paper • 2512.24176 • Published Dec 30, 2025 • 8 • 4