Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild Paper • 2605.24213 • Published 10 days ago • 10
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 20 days ago • 195
IntentGrasp: A Comprehensive Benchmark for Intent Understanding Paper • 2605.06832 • Published 25 days ago • 8
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 25 days ago • 231
openai/whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 8.26M • 3.05k
DCAgent/e1_embedding_d1_original_sandboxes_glm_4.7_traces_jupiter Viewer • Updated Apr 12 • 12.1k • 45
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published Apr 9 • 115