Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning Paper • 2605.06326 • Published 25 days ago • 26
Monthly-SWEBench Collection A continuously updated benchmark evaluating AI coding agents on real-world software engineering tasks from GitHub issues. • 2 items • Updated 20 days ago • 1
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning Paper • 2603.03379 • Published Mar 3 • 32