llamaguard
Meta's 7-8B specialized moderation model for filtering LLM inputs and outputs. Covers 6 safety categories: violence/hate, sexual content, weapons, substances, self-harm, and criminal planning. Reported accuracy: 94-95%. Deploy
Also installable via the skills CLI:
npx skills add ovachiever/droid-tings/skills/llamaguard
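
To illustrate how the model's verdict might be consumed in an input/output filter, here is a minimal sketch that parses the text the model generates. It assumes the model replies with `safe`, or with `unsafe` followed by a line of comma-separated category codes (`O1` to `O6`). The code-to-category mapping, the `Verdict` type, and the `parse_verdict` helper are illustrative assumptions, not part of this skill; check them against the model card for the version you deploy.

```python
from dataclasses import dataclass

# Assumed mapping of category codes to the six categories listed above.
# The exact codes and their order depend on the model version.
CATEGORIES = {
    "O1": "violence/hate",
    "O2": "sexual content",
    "O3": "criminal planning",
    "O4": "weapons",
    "O5": "substances",
    "O6": "self-harm",
}


@dataclass
class Verdict:
    safe: bool
    categories: list[str]


def parse_verdict(raw: str) -> Verdict:
    """Parse the moderation model's generated text into a Verdict.

    Anything other than an explicit "safe" is treated as unsafe,
    so malformed or empty output fails closed.
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        return Verdict(False, [])
    if lines[0].lower() == "safe":
        return Verdict(True, [])
    # Second line, if present, holds comma-separated category codes.
    codes = lines[1].split(",") if len(lines) > 1 else []
    names = [CATEGORIES.get(c.strip(), c.strip()) for c in codes if c.strip()]
    return Verdict(False, names)


if __name__ == "__main__":
    print(parse_verdict("safe"))
    print(parse_verdict("unsafe\nO3"))
```

Failing closed on unexpected output is a design choice for this sketch; a deployment that prefers availability over strictness could instead log the malformed reply and let the message through.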