EvoEval

https://evo-eval.github.io/

EvoEval: Evolving Coding Benchmarks via LLM

EvoEval¹ is a holistic benchmark suite created by evolving HumanEval problems:

🔥 Contains 828 new problems across 5 🌠 semantic-altering and 2 ⭐ semantic-preserving benchmarks
🔮 Allows evaluation and comparison across different dimensions and problem types (e.g., Difficult, Creative, or Tool Use problems). See our visualization tool for ready-to-use comparisons
🏆 Comes complete with a leaderboard, ground-truth solutions, robust test cases, and evaluation scripts that fit easily into your evaluation pipeline
🤖 Includes LLM-generated code samples from more than 50 different models to save you time when running experiments

¹ coincidentally pronounced similarly to 😈 EvilEval

GitHub: evo-eval/evoeval
Webpage: evo-eval.github.io
Leaderboard: evo-eval.github.io/leaderboard.html
Visualization: evo-eval.github.io/visualization.html
Paper: arXiv
PyPI: evoeval

datasets 5

evoeval/EvoEval_tool_use

Viewer • Updated Mar 27, 2024 • 100 • 15 • 3

evoeval/EvoEval_combine

Viewer • Updated Mar 27, 2024 • 100 • 7

evoeval/EvoEval_subtle

Viewer • Updated Mar 27, 2024 • 100 • 31

evoeval/EvoEval_creative

Viewer • Updated Mar 27, 2024 • 100 • 11

evoeval/EvoEval_difficult

Viewer • Updated Mar 27, 2024 • 100 • 17 • 2
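
The datasets listed above can be pulled into an evaluation pipeline with the Hugging Face `datasets` library. The following is a minimal sketch, not the project's official loader: the helper names `repo_id` and `load_benchmark` are hypothetical, the repository IDs are copied from the list on this page, and the split and column names of each dataset are not assumed here, so inspect the returned object before relying on any field.

```python
"""Sketch: load the EvoEval datasets listed on this page from the Hugging Face Hub."""

# Repository IDs copied verbatim from the dataset list on this page.
EVOEVAL_DATASETS = [
    "evoeval/EvoEval_tool_use",
    "evoeval/EvoEval_combine",
    "evoeval/EvoEval_subtle",
    "evoeval/EvoEval_creative",
    "evoeval/EvoEval_difficult",
]


def repo_id(benchmark: str) -> str:
    """Map a short benchmark name (e.g. "difficult") to its Hub repository ID.

    Hypothetical helper: it only knows the five datasets listed on this page.
    """
    candidate = f"evoeval/EvoEval_{benchmark}"
    if candidate not in EVOEVAL_DATASETS:
        raise ValueError(f"unknown benchmark {benchmark!r}; known IDs: {EVOEVAL_DATASETS}")
    return candidate


def load_benchmark(benchmark: str):
    """Download one benchmark from the Hub (hypothetical helper, needs network access).

    Requires `pip install datasets`. The import is deferred so that the rest
    of this module works without the package installed.
    """
    from datasets import load_dataset

    # With no split argument, load_dataset returns every available split.
    return load_dataset(repo_id(benchmark))


# Example (needs network access):
#   data = load_benchmark("difficult")
#   print(data)  # shows the available splits and column names
```

Printing the loaded object first is a deliberate choice: it reveals the actual splits and columns, so downstream code does not depend on guessed field names.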