BigCodeArena: Judging code generations end to end with code executions
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Compare two AI models by sending them code and seeing their responses
Explore code-generation model leaderboards and task details