unsloth-orpo
Unsloth-orpo facilitates one-step preference alignment using Odds Ratio Preference Optimization (ORPO). Unlike DPO, which requires a separate reference model, ORPO incorporates a penalty for disfavore
Also installable via the skills CLI:
npx skills add cuba6112/skillfactory/skills/unsloth-orpo
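
To make the odds-ratio idea in the description concrete, here is a minimal plain-Python sketch of the ORPO objective as it is usually written: a supervised (negative log-likelihood) term on the preferred response plus a weighted odds-ratio penalty. This is not Unsloth's or this skill's API. The function names `log_odds` and `orpo_loss`, the length-averaged log-probability inputs, and the weight `lam=0.1` are assumptions chosen for illustration.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Log-odds log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Illustrative ORPO loss for one preference pair (hypothetical helper).

    Inputs are length-averaged token log-probabilities of the chosen and
    rejected responses under the model being trained. No reference model
    is involved.
    """
    # Log of the odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)

    # Penalty: -log(sigmoid(x)) == softplus(-x), computed in a stable form.
    z = -log_odds_ratio
    penalty = max(z, 0.0) + math.log1p(math.exp(-abs(z)))

    # Supervised term: negative log-likelihood of the chosen response.
    sft = -chosen_avg_logp

    return sft + lam * penalty


if __name__ == "__main__":
    # The penalty shrinks as the rejected response becomes less likely.
    print(orpo_loss(-0.5, -0.6))
    print(orpo_loss(-0.5, -3.0))
```

Because both terms are computed from the same model's log-probabilities, a single training pass covers instruction tuning and preference alignment, which is where the "one-step" framing comes from.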