unsloth-dpo

Direct Preference Optimization (DPO) in Unsloth aligns models with human preferences using paired data (a chosen and a rejected response per prompt). Unsloth optimizes this process by allowing ref_model=None, sig
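To make the chosen/rejected pairing concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is an illustration only, not Unsloth's implementation: the function name `dpo_loss`, the example record, and the log-probability values are all hypothetical, and `beta=0.1` is just an assumed default. The reference log-probabilities stand in for whatever frozen model the trainer compares against.

```python
import math

# Hypothetical preference record: one prompt with a preferred and a
# dispreferred completion (field names chosen for illustration).
example_pair = {
    "prompt": "Explain what a hash map is.",
    "chosen": "A hash map stores key-value pairs and looks keys up by hashing them.",
    "rejected": "It is a map.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair, given summed sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably
    # so that large |margin| does not overflow exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Made-up log-probabilities, purely to exercise the function.
    print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the chosen response relative to the reference.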

by cuba6112 · Repository · other
Also installable via skills CLI
npx skills add cuba6112/skillfactory/skills/unsloth-dpo

Source

Path: skills/unsloth-dpo (main)
