Papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Datasets (14)
benchflow/env0-experiment-trajectories
Updated • 4.9k
benchflow/env0-qwen35-9b-full1703-prime-sft
Viewer • Updated • 1.7k
benchflow/env0-prime-sft-smoke10-arrow
Viewer • Updated • 10 • 5
benchflow/env0-prime-sft-smoke10
Viewer • Updated • 10 • 4
benchflow/skillsbench
Updated • 4.15k • 6
benchflow/skillsbench-leaderboard
Updated • 12.3k • 1
benchflow/benchmarks
Updated • 49
benchflow/skillsbench-research-artifacts
Updated • 35
benchflow/skillsbench-trajectories-apr2026
Updated • 281 • 1
benchflow/skillsbench-data
Viewer • Updated • 94.3k • 231