grey-haven-evaluation
Evaluate LLM outputs with multi-dimensional rubrics, handle non-determinism, and implement LLM-as-judge patterns. Essential for production LLM systems. Use when testing prompts, validating outputs, co
Also installable via the skills CLI:
npx skills add greyhaven-ai/claude-code-config/product/grey-haven-evaluation
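The techniques this skill covers can be illustrated with a minimal sketch: a multi-dimensional rubric scored by an LLM-as-judge, with repeated judge calls averaged to smooth out non-determinism. The `judge` function below is a hypothetical stand-in (no real model call), and the rubric dimensions are illustrative assumptions, not part of the skill itself.

```python
from statistics import mean

# Hypothetical rubric: each dimension is scored 1-5 by a judge model.
RUBRIC = {
    "accuracy": "Is the answer factually correct?",
    "completeness": "Does it address every part of the question?",
    "clarity": "Is it easy to follow?",
}

def judge(output: str, dimension: str, question: str) -> int:
    """Stand-in for an LLM-as-judge call. A real implementation would
    prompt a model with the rubric question and parse a 1-5 score."""
    # Toy heuristic so the sketch runs without an API key.
    return 5 if len(output) > 20 else 2

def evaluate(output: str, trials: int = 3) -> dict[str, float]:
    """Score each rubric dimension, averaging over repeated judge
    calls to handle non-deterministic judgments."""
    return {
        dim: mean(judge(output, dim, q) for _ in range(trials))
        for dim, q in RUBRIC.items()
    }

scores = evaluate("The Eiffel Tower is 330 m tall and located in Paris.")
```

Averaging over several trials is the simplest non-determinism mitigation; a production setup might instead take a majority vote over discrete labels or report the score spread alongside the mean.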