agent-benchmark
Framework for measuring and tracking agent response quality over time. Detects regressions before they reach production. Use when evaluating agent changes, auditing quality, or establishing performanc
Also installable via the skills CLI:
npx skills add vibeeval/vibecosystem/skills/agent-benchmark