policy-gradient-methods

Master REINFORCE, PPO, and TRPO: direct policy optimization, including trust-region methods

Also installable via the skills CLI:

npx skills add tachyon-beep/hamlet/.claude/skills/yzmir-deep-rl/skills/policy-gradient-methods

Source

Path: .claude/skills/yzmir-deep-rl/skills/policy-gradient-methods (main)
