Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth Paper • 2605.25052 • Published 9 days ago • 14
DCAgent3/dev_set_v2_rl__24GPU_base_excl_timeouts__exp_rpt_pymethods2test_large__GLM_4_7_c2148a8d Viewer • Updated 5 days ago • 296 • 55 • 1
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering Paper • 2605.17526 • Published 16 days ago • 7
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 26 days ago • 231
Forge-UGC: FX optimization and register-graph engine for universal graph compiler Paper • 2604.16498 • Published Apr 14 • 5
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630