ScaleAI/SWE-bench_Pro
Agentic Rubrics as Contextual Verifiers for SWE Agents
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents