evaluation-harness

Builds repeatable evaluation systems with golden datasets, scoring rubrics, pass/fail thresholds, and regression reports. Use for "LLM evaluation", "testing AI systems", "quality assurance", or "model…
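
As a rough illustration of the four pieces the description names (golden dataset, scoring rubric, pass/fail threshold, regression report), a minimal Python sketch might look like the following. It does not reproduce the skill's own templates: `GoldenCase`, `exact_match_rubric`, `run_eval`, `regression_report`, and the 0.8 threshold are all hypothetical names and values chosen for the example, and the "system under test" is a plain function standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GoldenCase:
    """One entry of a (hypothetical) golden dataset."""
    case_id: str
    prompt: str
    expected: str


def exact_match_rubric(output: str, expected: str) -> float:
    """Toy rubric: 1.0 on a case-insensitive exact match, otherwise 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    system: Callable[[str], str],
    cases: List[GoldenCase],
    rubric: Callable[[str, str], float] = exact_match_rubric,
    threshold: float = 0.8,  # assumed pass/fail cutoff on the mean score
) -> Dict:
    """Score every golden case and apply the threshold to the mean score."""
    scores = {c.case_id: rubric(system(c.prompt), c.expected) for c in cases}
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}


def regression_report(baseline: Dict, current: Dict) -> List[str]:
    """Return the ids of cases whose score dropped relative to the baseline."""
    return sorted(
        cid
        for cid, score in current["scores"].items()
        if score < baseline["scores"].get(cid, 0.0)
    )


# Usage with two stand-in "systems" backed by lookup tables.
golden = [
    GoldenCase("c1", "2+2", "4"),
    GoldenCase("c2", "capital of France", "Paris"),
]
old_system = lambda p: {"2+2": "4", "capital of France": "paris"}[p]
new_system = lambda p: {"2+2": "4", "capital of France": "Lyon"}[p]

baseline = run_eval(old_system, golden)
current = run_eval(new_system, golden)
regressions = regression_report(baseline, current)
```

In a real harness the rubric would usually be richer than exact match (for example, graded criteria or a judge model), and the baseline results would be persisted between runs so that regressions can be compared across versions.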

by patricio0312rev · Repository · data

Installable via the skills CLI:

npx skills add patricio0312rev/skillset/templates/ai-engineering/evaluation-harness

Source

Repo: SkillsMP + GitHub Raw

Path: templates/ai-engineering/evaluation-harness (main)
