- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 125
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 59
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 50
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 11
Hleb Stenin
halaction
AI & ML interests
None yet
Recent Activity
updated a dataset 15 days ago
halaction/humor-generation
liked a Space 15 days ago
mteb/leaderboard
updated a collection 18 days ago
Reading List
Organizations
None yet