Gymnasium support for DDPG continuous (+Jax) #371

Merged (16 commits) on May 3, 2023

Conversation

@arjun-kg (Contributor) commented Apr 3, 2023

Description

Port ddpg_continuous_action.py and ddpg_continuous_action_jax.py to gymnasium.
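
For readers unfamiliar with what such a port involves: Gymnasium's `step` returns `(obs, reward, terminated, truncated, info)` where Gym returned `(obs, reward, done, info)`, and `reset` returns `(obs, info)`. The sketch below is not code from this PR. It is a dependency-free illustration of how an old-style step result maps onto the new one, assuming the old `TimeLimit` wrapper convention of flagging time-limit endings with `info["TimeLimit.truncated"]`. The helper names `to_gymnasium_step` and `bootstrap_mask` are hypothetical.

```python
from typing import Any, Dict, Tuple


def to_gymnasium_step(
    obs: Any, reward: float, done: bool, info: Dict[str, Any]
) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Map an old Gym 4-tuple step result onto Gymnasium's 5-tuple.

    Assumption: time-limit endings are flagged via info["TimeLimit.truncated"],
    as the old Gym TimeLimit wrapper did.
    """
    truncated = bool(done and info.get("TimeLimit.truncated", False))
    terminated = bool(done and not truncated)
    return obs, reward, terminated, truncated, info


def bootstrap_mask(terminated: bool) -> float:
    """Return 1.0 when a TD target should bootstrap from the next state.

    Only true termination zeroes the bootstrap term; an episode cut off by a
    time limit still has a meaningful next-state value.
    """
    return 0.0 if terminated else 1.0
```

Keeping `terminated` and `truncated` separate is the main behavioural point of the new API for off-policy methods such as DDPG: the two flags both end an episode, but only the first should stop value bootstrapping.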

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture-video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

RLops report

python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
        'ddpg_continuous_action?tag=pr-371' \
        'ddpg_continuous_action_jax?tag=pr-371-jax' \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 2 \
    --output-filename figures/0compare \
    --scan-history \
    --report
────────────────────────────────────────────────────────────────────────────────────── Runtime (m) (mean ± std) ──────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment    ┃ openrlbenchmark/cleanrl/ddpg_continuous_action ({'tag': ['pr-371']}) ┃ openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-371-jax']}) ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hopper-v2      │ 82.48884665340242                                                    │ 97.04908408278409                                                            │
│ Walker2d-v2    │ 83.70214285646155                                                    │ 99.79698188415784                                                            │
│ HalfCheetah-v2 │ 84.70859018747274                                                    │ 99.89238566430278                                                            │
└────────────────┴──────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────── Episodic Return (mean ± std) ────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment    ┃ openrlbenchmark/cleanrl/ddpg_continuous_action ({'tag': ['pr-371']}) ┃ openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-371-jax']}) ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hopper-v2      │ 1182.86 ± 58.52                                                      │ 1523.78 ± 201.77                                                             │
│ Walker2d-v2    │ 1174.04 ± 2.72                                                       │ 1254.34 ± 135.92                                                             │
│ HalfCheetah-v2 │ 10073.02 ± 615.81                                                    │ 10249.45 ± 373.49                                                            │
└────────────────┴──────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────────── Runtime (m) Average ─────────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Environment                                                                  ┃ Average Runtime   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ openrlbenchmark/cleanrl/ddpg_continuous_action ({'tag': ['pr-371']})         │ 83.63319323244558 │
│ openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-371-jax']}) │ 98.9128172104149  │
└──────────────────────────────────────────────────────────────────────────────┴───────────────────┘
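
As a sanity check on the report, the "Average Runtime" column appears to be the plain mean of the three per-environment runtimes. The sketch below recomputes it from the values printed in the tables; the variable names are illustrative and not part of `openrlbenchmark`.

```python
from statistics import fmean

# Per-environment runtimes in minutes, copied from the "Runtime (m)" table
# (order: Hopper-v2, Walker2d-v2, HalfCheetah-v2).
runtime_minutes = {
    "ddpg_continuous_action": [82.48884665340242, 83.70214285646155, 84.70859018747274],
    "ddpg_continuous_action_jax": [97.04908408278409, 99.79698188415784, 99.89238566430278],
}

# Mean runtime per script across the three environments.
average_runtime = {name: fmean(values) for name, values in runtime_minutes.items()}

# How much longer the JAX runs took relative to the non-JAX runs.
jax_ratio = average_runtime["ddpg_continuous_action_jax"] / average_runtime["ddpg_continuous_action"]

for name, minutes in average_runtime.items():
    print(f"{name}: {minutes:.2f} min")
print(f"JAX / non-JAX runtime ratio: {jax_ratio:.2f}")
```

In these particular runs the JAX variant took roughly 18% longer on average. The report does not state the hardware used, so this ratio should not be read as a general speed comparison.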

https://wandb.ai/costa-huang/cleanrl/reports/Regression-Report-ddpg_continuous_action_jax--Vmlldzo0MjUwNDAx

@vwxyzjn (Owner) left a comment

Small comment, otherwise LGTM. Feel free to start the RLops process.

README.md Outdated
> ℹ️ **Support for Gymnasium**: [Farama-Foundation/Gymnasium](https://github.com/Farama-Foundation/Gymnasium) is the next generation of [`openai/gym`](https://github.com/openai/gym) that will continue to be maintained and introduce new features. Please see their [announcement](https://farama.org/Announcing-The-Farama-Foundation) for further detail. We are migrating to `gymnasium` and the progress can be tracked in [vwxyzjn/cleanrl#277](https://github.com/vwxyzjn/cleanrl/pull/277).

Currently, `ppo_continuous_action_isaacgym.py`, `ddpg_continuous_action_jax.py`, `ddpg_continuous_action.py` have been ported to gymnasium.

@vwxyzjn (Owner) commented Apr 6, 2023

ppo_continuous_action_isaacgym.py should not be included, right? It should be ppo_continuous_action.py

import numpy as np
import pybullet_envs # noqa

# import pybullet_envs # noqa

Owner review comment:

Instead of commenting, just remove it :)

@arjun-kg (Contributor, Author) commented Apr 7, 2023

> Feel free to start the RLops process.

https://wandb.ai/openrlbenchmark/cleanrl/reports/Regression-Report-ddpg_continuous_action--VmlldzozOTk4NzY1

This is for DDPG continuous. There seem to be somewhat significant differences, but I'm not sure how to interpret them. I used gymnasium 0.28.1, numpy 1.24, and SB3 alpha1. (I later noticed poetry downgrading numpy to 1.21, which might be significant, but I hit some errors with that version, so I tried 1.24.) Let me know what you think. I can re-run if needed.
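
Because the comparison above hinges on which package versions were actually installed, it can help to record them next to each run. The following is a minimal sketch using only the standard library; the function name `installed_versions` is hypothetical, and the distribution names in the usage section are assumed to be the PyPI names of the packages mentioned in the comment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed PyPI distribution names for the packages discussed above.
    for name, ver in installed_versions(["gymnasium", "numpy", "stable-baselines3"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the poetry environment (for example via `poetry run python`) shows what the resolver actually installed, which is how a silent downgrade like the numpy one described above would surface.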

@vwxyzjn (Owner) commented Apr 7, 2023

@arjun-kg I think the report looks great. DDPG is definitely more unstable, so the results are expected. Feel free to update the docs and we can merge.

README.md Outdated
Comment on lines 37 to 40
Please note that, `stable-baselines3` version `1.2` does not support `gymnasium`. To use these scripts, please install the `alpha1` version like,

```
poetry run pip install sb3==2.0.0a1
```

Owner review comment:

Could we move this to the usage docs?

@arjun-kg (Contributor, Author) commented Apr 7, 2023

@vwxyzjn That's great! Just started the runs for ddpg-jax, will update results of that as well soon. Do I need to update the results of the ddpg_continuous run / RLOps process anywhere?

@arjun-kg (Contributor, Author) commented

@vwxyzjn The results of RLOps for DDPG-Jax - https://wandb.ai/openrlbenchmark/cleanrl/reports/Regression-Report-ddpg_continuous_action_jax--Vmlldzo0MDE2NzA2

@vwxyzjn (Owner) commented Apr 10, 2023

Looks great!

@arjun-kg marked this pull request as ready for review April 11, 2023 02:01

Please note that, `stable-baselines3` version `1.2` does not support `gymnasium`. To use these scripts, please install the `alpha1` version like,

```
poetry run pip install sb3==2.0.0a1
```

Owner review comment:

This should be `poetry run pip install stable_baselines3==2.0.0a1`

@pseudo-rnd-thoughts mentioned this pull request May 1, 2023

@vwxyzjn (Owner) commented May 3, 2023

No sign of regression as shown in the PR description. Merging now.

@vwxyzjn merged commit 9f8b64b into vwxyzjn:master May 3, 2023