evaluation-harness

Builds repeatable evaluation systems with golden datasets, scoring rubrics, pass/fail thresholds, and regression reports. Use for "LLM evaluation", "testing AI systems", "quality assurance", or "model
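The description names four parts: a golden dataset, a scoring rubric, a pass/fail threshold, and a regression report. Below is a minimal Python sketch of how those parts might fit together. It is not this skill's own code. Every name in it (`GoldenExample`, `rubric_score`, `run_eval`, `regression_report`) is hypothetical. The rubric is a toy normalized-match scorer standing in for whatever rubric a real harness would use, such as human grading, an LLM-as-judge, or a task-specific metric.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: none of these names come from the evaluation-harness skill itself.


@dataclass(frozen=True)
class GoldenExample:
    """One golden-dataset entry: a stable case id, an input, and its expected output."""
    case_id: str
    prompt: str
    expected: str


def rubric_score(output: str, expected: str) -> float:
    """Toy rubric (an assumption): 1.0 for a case-insensitive exact match,
    0.5 if the expected answer appears anywhere in the output, else 0.0.
    The substring rule is crude ("4" would match "14"); real rubrics are stricter."""
    out, exp = output.strip().lower(), expected.strip().lower()
    if out == exp:
        return 1.0
    if exp in out:
        return 0.5
    return 0.0


def run_eval(system: Callable[[str], str], dataset: list[GoldenExample],
             pass_threshold: float = 0.8) -> dict:
    """Score every golden case, then compare the mean score to a pass/fail threshold."""
    scores = {ex.case_id: rubric_score(system(ex.prompt), ex.expected) for ex in dataset}
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return {"scores": scores, "mean": mean, "passed": mean >= pass_threshold}


def regression_report(baseline: dict, candidate: dict, tolerance: float = 0.0) -> list[str]:
    """List cases whose score fell by more than `tolerance` relative to the baseline run,
    plus any baseline cases missing from the candidate run."""
    lines = []
    for case_id, old in baseline["scores"].items():
        new = candidate["scores"].get(case_id)
        if new is None:
            lines.append(f"{case_id}: missing from candidate run")
        elif new < old - tolerance:
            lines.append(f"{case_id}: {old:.2f} -> {new:.2f}")
    return lines


if __name__ == "__main__":
    golden = [
        GoldenExample("capital-fr", "Capital of France?", "Paris"),
        GoldenExample("two-plus-two", "2 + 2?", "4"),
    ]
    # Two stand-in "systems": a baseline that answers correctly and a regressed candidate.
    baseline = run_eval(lambda p: {"Capital of France?": "Paris", "2 + 2?": "4"}[p], golden)
    candidate = run_eval(lambda p: {"Capital of France?": "It is Paris.", "2 + 2?": "5"}[p], golden)
    print(baseline["passed"], candidate["passed"])  # True False
    print(regression_report(baseline, candidate))
    # ['capital-fr: 1.00 -> 0.50', 'two-plus-two: 1.00 -> 0.00']
```

The baseline run's per-case scores are what make the regression report possible. An overall mean can stay above the threshold while individual cases get worse. Comparing each case against the baseline surfaces drops that a single aggregate pass/fail would hide.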

by patricio0312rev · Repository · data
Also installable via skills CLI
npx skills add patricio0312rev/skillset/templates/ai-engineering/evaluation-harness

Source

Path: templates/ai-engineering/evaluation-harness (main)


evaluation-harness | AgentArea Skills