auto-arena

Automatically evaluate and compare multiple AI models or agents without pre-existing test data. Generates test queries from a task description, collects responses from all target endpoints, auto-gener

by agentscope-ai · Repository · other
Also installable via the skills CLI:
npx skills add agentscope-ai/OpenJudge/skills/auto-arena

Source

Path: skills/auto-arena/SKILL.md (main)
