Gymnasium support for DDPG continuous (+Jax) #371
Small comment, otherwise LGTM. Feel free to start the RLops process.
README.md
Outdated
> ℹ️ **Support for Gymnasium**: [Farama-Foundation/Gymnasium](https://github.com/Farama-Foundation/Gymnasium) is the next generation of [`openai/gym`](https://github.com/openai/gym) that will continue to be maintained and introduce new features. Please see their [announcement](https://farama.org/Announcing-The-Farama-Foundation) for further detail. We are migrating to `gymnasium` and the progress can be tracked in [vwxyzjn/cleanrl#277](https://github.com/vwxyzjn/cleanrl/pull/277).

Currently, `ppo_continuous_action_isaacgym.py`, `ddpg_continuous_action_jax.py`, `ddpg_continuous_action.py` have been ported to gymnasium.
`ppo_continuous_action_isaacgym.py` should not be included, right? It should be `ppo_continuous_action.py`.
cleanrl/ddpg_continuous_action.py
Outdated
import numpy as np
import pybullet_envs # noqa

# import pybullet_envs # noqa
Instead of commenting, just remove it :)
This is for DDPG continuous. There seem to be somewhat significant differences, but I'm not sure how to interpret them. I used gymnasium 0.28.1, numpy 1.24 (I later noticed poetry downgrading it to 1.21, which might be significant, but that version gave some errors, so I had tried 1.24), and SB3 alpha1. Let me know what you think. I can re-run if needed.
@arjun-kg I think the report looks great. DDPG is definitely more unstable, so the results are expected. Feel free to update the docs and we can merge.
README.md
Outdated
Please note that, `stable-baselines3` version `1.2` does not support `gymnasium`. To use these scripts, please install the `alpha1` version like,

```
poetry run pip install sb3==2.0.0a1
Could we move this to the usage docs?
@vwxyzjn That's great! I just started the runs for ddpg-jax and will update with those results soon. Do I need to update the results of the ddpg_continuous run / RLOps process anywhere?
@vwxyzjn The results of RLOps for DDPG-Jax: https://wandb.ai/openrlbenchmark/cleanrl/reports/Regression-Report-ddpg_continuous_action_jax--Vmlldzo0MDE2NzA2
Looks great!
Please note that, `stable-baselines3` version `1.2` does not support `gymnasium`. To use these scripts, please install the `alpha1` version like,

```
poetry run pip install sb3==2.0.0a1
This should be `poetry run pip install stable_baselines3==2.0.0a1`.
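To make the version requirement discussed in this thread concrete, here is a minimal Python sketch of a guard a script could run before using the ported files. The helper names `release_major` and `supports_gymnasium` are hypothetical, and the rule "major version 2 or later supports gymnasium" is only an assumption drawn from the comments above (version `1.2` does not, `2.0.0a1` does); it is not taken from the library's documentation.

```python
import re


def release_major(version: str) -> int:
    """Return the leading major number of a version string, e.g. '2.0.0a1' -> 2."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group())


def supports_gymnasium(sb3_version: str) -> bool:
    # Assumption from this thread: the 2.0.0 alpha series is the first
    # stable-baselines3 line that works with gymnasium.
    return release_major(sb3_version) >= 2


print(supports_gymnasium("2.0.0a1"))
print(supports_gymnasium("1.2"))
```

The installed version string could be read with the standard-library `importlib.metadata.version` function and passed to `supports_gymnasium`; the exact distribution name to query is whatever `pip` reports for the package in your environment.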
No sign of regression as shown in the PR description. Merging now.
Description
Port `ddpg_continuous_action.py` and `ddpg_continuous_action_jax.py` to gymnasium.

Types of changes

Checklist:
- `pre-commit run --all-files` passes (required).
- `mkdocs serve`.

If you need to run benchmark experiments for performance-impacting changes:
- `--capture-video`.
- `python -m openrlbenchmark.rlops`.
- `python -m openrlbenchmark.rlops` utility to the documentation.
- `python -m openrlbenchmark.rlops ....your_args... --report`, to the documentation.

Rlops report
https://wandb.ai/costa-huang/cleanrl/reports/Regression-Report-ddpg_continuous_action_jax--Vmlldzo0MjUwNDAx
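
As background for the port described above, the sketch below illustrates the two Gymnasium call signatures that typically drive such a change: `reset()` returns `(observation, info)`, and `step()` returns `(observation, reward, terminated, truncated, info)`. `StubEnv` and `rollout` are hypothetical stand-ins written for this illustration, not code from this PR, and storing only `terminated` as the bootstrap mask is one common choice rather than a statement of what the ported scripts do.

```python
from __future__ import annotations

import random
from typing import Any


class StubEnv:
    """Hypothetical stand-in that mimics the Gymnasium call signatures."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.t = 0
        self.rng = random.Random()

    def reset(self, seed: int | None = None) -> tuple[float, dict[str, Any]]:
        # Gymnasium style: reset() accepts a seed and returns (obs, info).
        if seed is not None:
            self.rng = random.Random(seed)
        self.t = 0
        return self.rng.random(), {}

    def step(self, action: float) -> tuple[float, float, bool, bool, dict[str, Any]]:
        # Gymnasium style: step() returns a 5-tuple with separate
        # `terminated` and `truncated` flags instead of a single `done`.
        self.t += 1
        obs = self.rng.random()
        reward = -abs(action - obs)
        terminated = False                    # this stub has no terminal state
        truncated = self.t >= self.max_steps  # time limit reached
        return obs, reward, terminated, truncated, {}


def rollout(env: StubEnv, seed: int) -> list[tuple[float, float, bool]]:
    """Collect one episode of (obs, reward, terminated) transitions."""
    transitions = []
    obs, _info = env.reset(seed=seed)
    while True:
        action = 0.5  # fixed placeholder action
        next_obs, reward, terminated, truncated, _info = env.step(action)
        # Keep only `terminated` as the mask, so that a time-limit
        # truncation is not treated as a true terminal state.
        transitions.append((obs, reward, terminated))
        obs = next_obs
        if terminated or truncated:
            return transitions
```

Calling `rollout(StubEnv(max_steps=3), seed=0)` collects three transitions, and repeating the call with the same seed reproduces them, since seeding happens through `reset(seed=...)`.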