Add RPO to CleanRL #331
Conversation
Hey, btw, I noticed you ran into some issues with `cleanrl/.github/workflows/tests.yaml` (lines 154 to 156 in b558b2b).
See https://github.com/deepmind/dm_control#rendering. I added a bit of docs to explain this issue.
I have tried these commands on my local machine, but the issue persists. I just wanted to give an update.
It might be that you are using hardware-accelerated rendering. Maybe look at https://github.com/deepmind/dm_control#rendering and google-deepmind/dm_control#136.
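For anyone hitting the same failure: dm_control picks its OpenGL backend from the `MUJOCO_GL` environment variable, which must be set before MuJoCo is imported (see the rendering docs linked above). Below is a minimal sketch, assuming the crash comes from the default hardware-accelerated backend; which backend actually works depends on your drivers.

```python
# Minimal sketch: force a software (or EGL) renderer on a headless machine.
# MUJOCO_GL must be set before dm_control / mujoco is imported.
import os

os.environ["MUJOCO_GL"] = "osmesa"  # alternatives: "egl" (headless GPU), "glfw" (windowed)

from dm_control import suite

env = suite.load(domain_name="cartpole", task_name="balance")
pixels = env.physics.render(height=84, width=84)  # fails under a broken GL backend
print(pixels.shape)  # (84, 84, 3)
```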
Hi @vwxyzjn, did you get around this issue? Farama-Foundation/Shimmy#15 (comment)
Actually yes, I can reproduce this error on my end. For now, try running the code without …
Here is the result comparison between PPO and RPO. These dm_control environments are used in the RPO paper (https://arxiv.org/pdf/2212.07536.pdf). PPO and RPO were run for 8M timesteps, and results are computed over 10 random seeds.
Learning curves: [figures]. The rest of the dm_control experiments are running and will be added later.
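For readers of this thread, the change RPO makes on top of PPO is small. Here is a sketch of the perturbed actor head, assuming a CleanRL-style agent with `actor_mean`, `actor_logstd`, and `critic` modules and a perturbation radius `rpo_alpha`; names are illustrative, and the final `rpo_continuous_action.py` is the authoritative version.

```python
import torch
from torch.distributions.normal import Normal

def get_action_and_value(agent, x, action=None, rpo_alpha=0.5):
    action_mean = agent.actor_mean(x)
    action_std = torch.exp(agent.actor_logstd.expand_as(action_mean))
    if action is None:
        # Rollout: sample from the usual Gaussian policy, exactly as in PPO.
        probs = Normal(action_mean, action_std)
        action = probs.sample()
    else:
        # Update: perturb the mean with Uniform(-alpha, alpha) noise before
        # computing log-probs, which keeps the policy from collapsing to a
        # near-deterministic Gaussian.
        z = torch.empty_like(action_mean).uniform_(-rpo_alpha, rpo_alpha)
        probs = Normal(action_mean + z, action_std)
    return action, probs.log_prob(action).sum(1), probs.entropy().sum(1), agent.critic(x)
```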
This is perfect. Looking forward to seeing the rest of the results. A quick tip: instead of creating a separate … Feel free to prepare the documentation in …
Thanks for the tips. I was worried about messing up the existing experiments with the name … As for the final merge, I will only keep the …
This used to be a concern, but not anymore since we are now using a filter-based workflow.
`cleanrl/cleanrl/ppo_continuous_action.py` (lines 140 to 153 in 2dd73af)
There is no need to delete those experiments because the last benchmarked experiments are tagged with …
I am thinking about this PR. I have created some additional files (…).
That is correct. To keep track of the …
I would, however, say that the name …
Re-running would not be a problem.
That’s right. The …
Sounds good. Thanks.
Results on all dm_control environments. PPO and RPO were run for 8M timesteps, and results are computed over 10 random seeds. Table: [results table]
The results look exceptionally clean 👍. After all the results are reported, I would feel comfortable marking the existing … Please update the docs with these results. Please also remove the … Thank you, and have a great holiday.
Seeing how much this improves the scores, I'm now really curious what happens if we change from the Gaussian to the Beta distribution... Or simply clip the log std, so that the std cannot go down to too small a value.
@Howuhh Trying the Beta distribution would be an interesting idea. I observed some performance effects when trying Laplace and Gumbel (Figures 7 & 8 of https://arxiv.org/pdf/2212.07536.pdf). An aspect of the RPO method is maintaining randomness through the Uniform distribution. It would be interesting to see if clipping the log std can achieve a similar effect.
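For comparison, the log-std clipping idea floated above might look roughly like the sketch below; it has not been benchmarked in this PR, and the bound `LOG_STD_MIN` is an assumed value that would need tuning.

```python
import torch
from torch.distributions.normal import Normal

LOG_STD_MIN = -5.0  # assumed lower bound, not from the paper

def clipped_std_policy(action_mean, actor_logstd):
    # Instead of perturbing the mean as RPO does, clamp the log std from
    # below so the Gaussian can never become near-deterministic.
    log_std = torch.clamp(actor_logstd.expand_as(action_mean), min=LOG_STD_MIN)
    return Normal(action_mean, log_std.exp())
```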
Help needed: RPO doc: https://github.com/masud99r/cleanrl/blob/master/docs/rl-algorithms/rpo.md
Hey, I fixed some formatting issues with the iframe. Please let me know if you run into additional issues.
Thanks for the fix. Were those tab/space issues or something different?
I have added a warning section with the deprecation notice. With that, I think I have addressed all the comments, but I might have missed some in the thread! Could you please take a look and let me know what else needs to be done for this PR?
LGTM. Thanks for your patience and this awesome contribution. Please give it a final read at https://cleanrl-git-fork-masud99r-master-vwxyzjn.vercel.app/rl-algorithms/rpo/. I have added you as a collaborator on this repo; feel free to go ahead and merge.
It was my pleasure working on this. Thanks for your excellent guidance throughout the process. I have learned a lot. I took a pass on the docs and fixed some typos. I am going to merge.
Description
Types of changes
Checklist:

- [ ] `pre-commit run --all-files` passes (required).
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

- [ ] I have tracked applicable experiments with the `--capture-video` flag toggled on (required).
- [ ] I have added documentation and previewed the changes via `mkdocs serve`.