nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 Text Generation • 124B • Updated about 4 hours ago • 2.85k • 142
Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published 11 days ago • 33
view article Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day Dec 8, 2025 • 53
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 Dec 1, 2025 • 305
Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published Nov 13, 2025 • 52
view article Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Nov 3, 2025 • 63