trl-fine-tuning

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, align…

by Orchestra-Research · Repository · other

Install via the skills CLI:

npx skills add Orchestra-Research/AI-Research-SKILLs/06-post-training/trl-fine-tuning

Source

Repo: github.com/Orchestra-Research/AI-Research-SKILLs

Path: 06-post-training/trl-fine-tuning/SKILL.md (main)
