policy-gradient-methods
Master REINFORCE, PPO, and TRPO: direct policy optimization, including trust-region methods
Also installable via the skills CLI:
npx skills add tachyon-beep/hamlet/development/policy-gradient-methods