unsloth-orpo

Unsloth-orpo facilitates one-step preference alignment using Odds Ratio Preference Optimization (ORPO). Unlike DPO, which requires a separate reference model, ORPO incorporates a penalty for disfavore
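As a rough illustration of why ORPO needs no reference model, the sketch below computes the objective from the ORPO paper for a single preference pair: the usual negative log-likelihood on the chosen response plus a weighted odds-ratio penalty. It is not the Unsloth or skill implementation. The function names, the `lam` weight, and the use of length-averaged log-probabilities as inputs are assumptions made for the example.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for 0 < p < 1."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """Hypothetical single-pair ORPO loss.

    chosen_logp / rejected_logp: length-averaged log-probabilities that the
    policy assigns to the chosen and rejected responses (assumed inputs).
    lam: weight of the odds-ratio term (assumed default, not a library value).
    """
    sft_term = -chosen_logp  # negative log-likelihood of the chosen response
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    penalty = neg_log_sigmoid(log_odds_ratio)  # small when chosen is favored
    return sft_term + lam * penalty


if __name__ == "__main__":
    favored = orpo_loss(math.log(0.8), math.log(0.2))
    tied = orpo_loss(math.log(0.8), math.log(0.8))
    print(f"chosen favored: {favored:.4f}, tied: {tied:.4f}")
```

Both terms depend only on the policy's own probabilities, which is why a single training pass suffices and no frozen reference model has to be kept in memory.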

by cuba6112 · Repository · other
Also installable via skills CLI
npx skills add cuba6112/skillfactory/skills/unsloth-orpo

Source

Path: skills/unsloth-orpo/SKILL.md (main)
