Implement agent evaluation and safety gates using MLflow 3.x. Use for creating LLM-as-Judge scorers, evaluation datasets, quality gates, tracing, and continuous evaluation. Triggers on "evaluate agent
skills/testing/agent-evaluation-mlflow/SKILL.md(main)