policy-gradient-methods
Master REINFORCE, PPO, and TRPO: direct policy optimization, including trust-region methods
Also installable via the skills CLI:
npx skills add tachyon-beep/hamlet/.claude/skills/yzmir-deep-rl/skills/policy-gradient-methods
Source
Path: .claude/skills/yzmir-deep-rl/skills/policy-gradient-methods (main)