Add REINFORCE implementation tutorial #155
Thanks for the tutorial, it looks very helpful.
Could you fix the pre-commit issues and address the comments?
There are a number of issues when I build the tutorial. Could you build the tutorial (see the readme.md) and look at the previous tutorials to fix the issues?
reinforce_reacher_gym_v26.rst:2: WARNING: Field list ends without a blank line; unexpected unindent.
reinforce_reacher_gym_v26.rst:14: ERROR: Unexpected indentation.
reinforce_reacher_gym_v26.rst:25: WARNING: Block quote ends without a blank line; unexpected unindent.
reinforce_reacher_gym_v26.rst:220: ERROR: Unexpected indentation.
reinforce_reacher_gym_v26.rst:224: WARNING: Definition list ends without a blank line; unexpected unindent.
I have updated the files as per the readme.md.
No worries. Currently, when I build the tutorial, all of the titles appear on the left-hand side.
I have modified the structure of the titles and code following the Blackjack tutorial. Running through the tests mentioned in readme.md, I do not get any warnings.
@siddarth-c I have made a number of upgrades to the tutorials. There are only a couple more things before we can merge.
Done with the mentioned changes.
That policy network figure is very nice. To confirm, this is not copied from someone else and is your own creation?
Also, for the top gif, is this from the final agent? The agent doesn't seem to do very well in the environment.
The policy learned via REINFORCE is not optimal in Reacher (despite extensive hyperparameter searches). And yes, the policy network figure was designed by me.
Amazing, thank you for the tutorial. We would be interested in any more tutorials that you create, probably more on the gym environment side than training, though they are always helpful.
Description
Created a new tutorial demonstrating the new .step() function of gymnasium v26 using PyTorch. REINFORCE is employed to solve MuJoCo's Reacher.
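For readers skimming this thread, here is a rough sketch of the two ideas the description mentions. It is not the tutorial's code: `ToyEnv`, `run_episode` and `discounted_returns` are hypothetical names invented for illustration, and the toy environment only mimics the shape of the v26-style API, where `reset()` returns `(observation, info)` and `step()` returns five values, with `terminated` and `truncated` reported separately. The discounted returns computed at the end are the quantities REINFORCE uses to weight each action's log-probability.

```python
class ToyEnv:
    """Hypothetical stand-in environment (not Reacher, not part of any library).

    It only mimics the v26-style signatures of reset() and step().
    """

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, seed=None):
        # v26-style reset returns (observation, info).
        self.t = 0
        return 0.0, {}

    def step(self, action):
        # v26-style step returns five values instead of the older four.
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)                   # made-up reward: penalise large actions
        terminated = False                      # this toy task has no terminal state
        truncated = self.t >= self.max_steps    # time limit reached
        return obs, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and collect the per-step rewards."""
    obs, info = env.reset(seed=seed)
    rewards = []
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        rewards.append(reward)
        # An episode ends when either flag is set.
        done = terminated or truncated
    return rewards


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


if __name__ == "__main__":
    episode_rewards = run_episode(ToyEnv(max_steps=3), policy=lambda obs: 1.0)
    print(discounted_returns(episode_rewards, gamma=0.99))
```

In an actual REINFORCE update the loss would be the negative sum of each action's log-probability multiplied by its return; that part needs a policy network and is left out of this sketch.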
Checklist:
pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)