Gymnasium support for DDPG continuous (+Jax) #371

arjun-kg · 2023-04-03T13:19:34Z

Description

Port ddpg_continuous_action.py and ddpg_continuous_action_jax.py to gymnasium.
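
For context, the main API change a port like this has to absorb is that gymnasium's `reset` returns `(obs, info)` and `step` returns a five-tuple with separate `terminated` and `truncated` flags, where older gym returned a single `done`. The sketch below illustrates that pattern only. `StubEnv` and `rollout` are hypothetical names, not code from this PR or from gymnasium, and the stub exists so the example runs without any dependencies. Storing `terminated` rather than `terminated or truncated` is one common choice, assumed here for illustration.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the gymnasium reset/step signatures."""

    def __init__(self, time_limit=3):
        self.time_limit = time_limit
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}  # gymnasium-style: (observation, info)

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        terminated = False                     # a true terminal state of the MDP
        truncated = self.t >= self.time_limit  # episode cut off by a time limit
        return obs, reward, terminated, truncated, {}


def rollout(env):
    """Collect (obs, next_obs, reward, terminated) transitions for one episode."""
    transitions = []
    obs, _info = env.reset(seed=0)
    while True:
        next_obs, reward, terminated, truncated, _info = env.step(0)
        # Store `terminated` only, so a time-limit cutoff is not treated
        # as a terminal state when the transition is used later.
        transitions.append((obs, next_obs, reward, terminated))
        if terminated or truncated:
            return transitions
        obs = next_obs


episode = rollout(StubEnv())
```

With the default `time_limit=3`, the episode ends by truncation after three steps and none of the stored transitions is marked as terminated.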

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture-video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

Rlops report

python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
        'ddpg_continuous_action?tag=pr-371' \
        'ddpg_continuous_action_jax?tag=pr-371-jax' \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 2 \
    --output-filename figures/0compare \
    --scan-history \
    --report

────────────────────────────────────────────────────────────────────────────────────── Runtime (m) (mean ± std) ──────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment    ┃ openrlbenchmark/cleanrl/ddpg_continuous_action ({'tag': ['pr-371']}) ┃ openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-371-jax']}) ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hopper-v2      │ 82.48884665340242                                                    │ 97.04908408278409                                                            │
│ Walker2d-v2    │ 83.70214285646155                                                    │ 99.79698188415784                                                            │
│ HalfCheetah-v2 │ 84.70859018747274                                                    │ 99.89238566430278                                                            │
└────────────────┴──────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────── Episodic Return (mean ± std) ────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment    ┃ openrlbenchmark/cleanrl/ddpg_continuous_action ({'tag': ['pr-371']}) ┃ openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-371-jax']}) ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hopper-v2      │ 1182.86 ± 58.52                                                      │ 1523.78 ± 201.77                                                             │
│ Walker2d-v2    │ 1174.04 ± 2.72                                                       │ 1254.34 ± 135.92                                                             │
│ HalfCheetah-v2 │ 10073.02 ± 615.81                                                    │ 10249.45 ± 373.49                                                            │
└────────────────┴──────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────────── Runtime (m) Average ─────────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Environment                                                                  ┃ Average Runtime   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ openrlbenchmark/cleanrl/ddpg_continuous_action ({'tag': ['pr-371']})         │ 83.63319323244558 │
│ openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-371-jax']}) │ 98.9128172104149  │
└──────────────────────────────────────────────────────────────────────────────┴───────────────────┘

https://wandb.ai/costa-huang/cleanrl/reports/Regression-Report-ddpg_continuous_action_jax--Vmlldzo0MjUwNDAx
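
The "Average Runtime" rows appear to be the unweighted mean of the three per-environment runtimes in the first table. A quick check in Python, with the values copied from the tables above (the dictionary keys are shortened labels chosen for this example):

```python
from statistics import fmean

# Per-environment runtimes in minutes: Hopper-v2, Walker2d-v2, HalfCheetah-v2.
runtime_minutes = {
    "ddpg_continuous_action": [82.48884665340242, 83.70214285646155, 84.70859018747274],
    "ddpg_continuous_action_jax": [97.04908408278409, 99.79698188415784, 99.89238566430278],
}

# Unweighted mean across the three environments for each script.
average_runtime = {name: fmean(values) for name, values in runtime_minutes.items()}

for name, avg in average_runtime.items():
    print(f"{name}: {avg:.2f} min")
```

The results agree with the reported averages (about 83.63 and 98.91 minutes), so in these runs the JAX variant took roughly 15 minutes longer on average.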

…g_continuous_action_gymnasium

vwxyzjn

Small comment, otherwise LGTM. Feel free to start the RLops process.

vwxyzjn · 2023-04-06T13:58:33Z

README.md

-> ℹ️ **Support for Gymnasium**: [Farama-Foundation/Gymnasium](https://github.com/Farama-Foundation/Gymnasium) is the next generation of [`openai/gym`](https://github.com/openai/gym) that will continue to be maintained and introduce new features. Please see their [announcement](https://farama.org/Announcing-The-Farama-Foundation) for further detail. We are migrating to `gymnasium` and the progress can be tracked in [vwxyzjn/cleanrl#277](https://github.com/vwxyzjn/cleanrl/pull/277).
+> ℹ️ **Support for Gymnasium**: [Farama-Foundation/Gymnasium](https://github.com/Farama-Foundation/Gymnasium) is the next generation of [`openai/gym`](https://github.com/openai/gym) that will continue to be maintained and introduce new features. Please see their [announcement](https://farama.org/Announcing-The-Farama-Foundation) for further detail. We are migrating to `gymnasium` and the progress can be tracked in [vwxyzjn/cleanrl#277](https://github.com/vwxyzjn/cleanrl/pull/277). 
+
+Currently, `ppo_continuous_action_isaacgym.py`, `ddpg_continuous_action_jax.py`, `ddpg_continuous_action.py` have been ported to gymnasium. 


ppo_continuous_action_isaacgym.py should not be included, right? It should be ppo_continuous_action.py

vwxyzjn · 2023-04-06T13:59:02Z

cleanrl/ddpg_continuous_action.py

 import numpy as np
-import pybullet_envs  # noqa
+
+# import pybullet_envs  # noqa


Instead of commenting, just remove it :)

arjun-kg · 2023-04-07T07:26:46Z

> Feel free to start the RLops process.

https://wandb.ai/openrlbenchmark/cleanrl/reports/Regression-Report-ddpg_continuous_action--VmlldzozOTk4NzY1

This is for DDPG continuous. There seem to be somewhat significant differences, but I'm not sure how to interpret them. I used gymnasium 0.28.1, numpy 1.24, and SB3 alpha1. (I later noticed poetry downgrading numpy to 1.21, so this might be significant, but I had run into some errors with that, so I tried 1.24.) Let me know what you think. I can re-run if needed.
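
Since the package versions may matter for comparing runs, one way to record them alongside a benchmark is to query the installed distributions. This is a minimal sketch using only the standard library. `installed_versions` is a hypothetical helper, not part of CleanRL or openrlbenchmark.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as published on PyPI.
versions = installed_versions(["gymnasium", "numpy", "stable-baselines3"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same environment as the experiments (for example via `poetry run python`) shows whether poetry resolved different versions than expected.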

vwxyzjn · 2023-04-07T12:57:42Z

@arjun-kg I think the report looks great. DDPG is definitely more unstable, so the results are expected. Feel free to update the docs and we can merge.

vwxyzjn · 2023-04-07T12:58:29Z

README.md

+Please note that, `stable-baselines3` version `1.2` does not support `gymnasium`. To use these scripts, please install the `alpha1` version like, 
+
+```
+poetry run pip install sb3==2.0.0a1


Could we move this to the usage docs?

arjun-kg · 2023-04-07T15:31:17Z

@vwxyzjn That's great! Just started the runs for ddpg-jax, will update results of that as well soon. Do I need to update the results of the ddpg_continuous run / RLOps process anywhere?

arjun-kg · 2023-04-10T01:34:23Z

@vwxyzjn The results of RLOps for DDPG-Jax - https://wandb.ai/openrlbenchmark/cleanrl/reports/Regression-Report-ddpg_continuous_action_jax--Vmlldzo0MDE2NzA2

vwxyzjn · 2023-04-10T16:34:15Z

Looks great!

vwxyzjn · 2023-04-25T14:44:37Z

docs/get-started/basic-usage.md

+Please note that, `stable-baselines3` version `1.2` does not support `gymnasium`. To use these scripts, please install the `alpha1` version like, 
+
+```
+poetry run pip install sb3==2.0.0a1


This should be `poetry run pip install stable_baselines3==2.0.0a1`

vwxyzjn · 2023-05-03T21:35:21Z

No sign of regression as shown in the PR description. Merging now.

arjun-kg added 3 commits April 3, 2023 21:13

ddpg continuous + jax

54439c8

fix video recording

6f4f072

Merge branch 'master' of https://github.com/arjun-kg/cleanrl into ddp…

91aae1d

…g_continuous_action_gymnasium

vwxyzjn reviewed Apr 6, 2023

vwxyzjn reviewed Apr 7, 2023

arjun-kg added 3 commits April 8, 2023 00:17

remove pybullet

4171609

move to usage docs

f2608a3

isort

6e6a5b5

arjun-kg marked this pull request as ready for review April 11, 2023 02:01

update lock files

d8dd801

Merge branch 'master' into ddpg_continuous_action_gymnasium

06f41ce

try trigger CI

d9825b4

Merge branch 'master' into ddpg_continuous_action_gymnasium

91f770f

vwxyzjn reviewed Apr 25, 2023

pseudo-rnd-thoughts mentioned this pull request May 1, 2023

Update to support Gymnasium #277

Closed

21 tasks

vwxyzjn added 2 commits May 3, 2023 11:11

Merge branch 'master' into ddpg_continuous_action_gymnasium

a05e618

update ddpg default v4 environments

03b3c7e

trigger CI

b6d8598

install jax dependency

8f0029d

fix CI

d630603

remove windows CI

65000a8

vwxyzjn merged commit 9f8b64b into vwxyzjn:master May 3, 2023
