unsloth-grpo

Unsloth-grpo enables training of reasoning models using Group Relative Policy Optimization (GRPO). This technique replaces traditional PPO Reward and Value models with group statistics, achieving 8x m
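
As a rough illustration of the "group statistics" idea, the sketch below scores each sampled completion relative to the other completions drawn for the same prompt, so no separate value model is needed. This is not Unsloth's code: the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one; if all completions score the same, every advantage is zero and the group contributes no learning signal.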

by cuba6112 · Repository · other
Also installable via skills CLI
npx skills add cuba6112/skillfactory/skills/unsloth-grpo

Source

Path: skills/unsloth-grpo (main)
