Add gymnasium support for DQN #370

vcharraut · 2023-04-01T18:34:59Z

Description

This PR updates the DQN files to the latest version of gymnasium, replacing gym.

dqn.py
dqn_jax.py
dqn_atari.py
dqn_atari_jax.py
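
For context, a minimal sketch of the API difference a gym-to-gymnasium port has to absorb: in Gymnasium, `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)` rather than a single `done` flag. `ToyEnv` and `run_episode` below are hypothetical stand-ins written in plain Python so the snippet runs without Gymnasium installed; they are not code from this PR.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mimics the shape of the Gymnasium Env API."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0
        self.rng = random.Random()

    def reset(self, seed=None):
        # Gymnasium-style reset: seeding happens here and (obs, info) is returned.
        if seed is not None:
            self.rng = random.Random(seed)
        self.t = 0
        return 0.0, {}

    def step(self, action):
        # Gymnasium-style step: a 5-tuple with separate terminated/truncated flags.
        self.t += 1
        obs = self.rng.random()
        reward = 1.0
        terminated = action == 1              # the task itself ended
        truncated = self.t >= self.max_steps  # a time limit cut the episode short
        return obs, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Roll out one episode and return (episodic_return, terminated, truncated)."""
    obs, info = env.reset(seed=seed)
    episodic_return = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        episodic_return += reward
        # The old gym `done` flag corresponds to `terminated or truncated`.
        if terminated or truncated:
            return episodic_return, terminated, truncated


result = run_episode(ToyEnv(max_steps=5), policy=lambda obs: 0)
```

Keeping the two flags separate typically matters for value-based methods, since a time-limit truncation does not imply that the next state has zero value.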

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture-video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

Regression report

python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
        'dqn_atari_jax?tag=rlops-pilot' \
        'dqn_atari_jax?tag=pr-370-atari-jax' \
    --env-ids Breakout-v5 BeamRider-v5 Pong-v5 \
    --check-empty-runs False \
    --ncols 5 \
    --ncols-legend 2 \
    --output-filename figures/0compare \
    --scan-history \
    --report

────────────────────────────────────────────────────────────────────────────────────── Runtime (m) (mean ± std) ──────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment  ┃ openrlbenchmark/cleanrl/dqn_atari_jax ({'tag': ['rlops-pilot']}) ┃ openrlbenchmark/cleanrl/dqn_atari_jax ({'tag': ['pr-370-atari-jax']}) ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Breakout-v5  │ 270.1473263136972                                                │ 538.7802477303775                                                     │
│ BeamRider-v5 │ 271.7741639644951                                                │ 538.6782197420808                                                     │
│ Pong-v5      │ 261.6593977599932                                                │ 522.4641281567034                                                     │
└──────────────┴──────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────── Episodic Return (mean ± std) ────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment  ┃ openrlbenchmark/cleanrl/dqn_atari_jax ({'tag': ['rlops-pilot']}) ┃ openrlbenchmark/cleanrl/dqn_atari_jax ({'tag': ['pr-370-atari-jax']}) ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Breakout-v5  │ 365.77 ± 15.64                                                   │ 356.66 ± 5.64                                                         │
│ BeamRider-v5 │ 5888.53 ± 185.09                                                 │ 6058.41 ± 116.74                                                      │
│ Pong-v5      │ 20.39 ± 0.17                                                     │ 20.39 ± 0.02                                                          │
└──────────────┴──────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────────── Runtime (m) Average ─────────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Environment                                                           ┃ Average Runtime   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ openrlbenchmark/cleanrl/dqn_atari_jax ({'tag': ['rlops-pilot']})      │ 267.8602960127285 │
│ openrlbenchmark/cleanrl/dqn_atari_jax ({'tag': ['pr-370-atari-jax']}) │ 533.3075318763872 │
└───────────────────────────────────────────────────────────────────────┴───────────────────┘
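
As a quick sanity check on how to read the tables above, the "Runtime (m) Average" value for each tag appears to be the plain mean of the three per-environment runtimes. The sketch below recomputes it from the numbers printed in the report; the variable names are illustrative only.

```python
from statistics import mean

# Per-environment runtimes in minutes, copied from the table above
# (order: Breakout-v5, BeamRider-v5, Pong-v5).
runtime_minutes = {
    "rlops-pilot": [270.1473263136972, 271.7741639644951, 261.6593977599932],
    "pr-370-atari-jax": [538.7802477303775, 538.6782197420808, 522.4641281567034],
}

# Plain arithmetic mean across the three environments for each tag.
average_runtime = {tag: mean(values) for tag, values in runtime_minutes.items()}

for tag, avg in average_runtime.items():
    print(f"{tag}: {avg:.2f} min")
```

Rounded to two decimals, these agree with the averages the report prints (267.86 and 533.31).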

https://wandb.ai/costa-huang/cleanrl/reports/Regression-Report-dqn_atari_jax--Vmlldzo0MjQ5OTA2

python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
        'dqn?tag=pr-370' \
        'dqn_jax?tag=pr-370-jax' \
        'dqn?tag=rlops-pilot' \
        'dqn_jax?tag=rlops-pilot' \
    --env-ids CartPole-v1 Acrobot-v1 MountainCar-v0 \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 2 \
    --output-filename figures/0compare \
    --scan-history \
    --report

────────────────────────────────────────────────────────────────────────────────────── Runtime (m) (mean ± std) ──────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                ┃ openrlbenchmark/cleanrl/dqn ({'tag':       ┃ openrlbenchmark/cleanrl/dqn_jax ({'tag':   ┃ openrlbenchmark/cleanrl/dqn ({'tag':       ┃ openrlbenchmark/cleanrl/dqn_jax ({'tag':   ┃
┃ Environment    ┃ ['pr-370']})                               ┃ ['pr-370-jax']})                           ┃ ['rlops-pilot']})                          ┃ ['rlops-pilot']})                          ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ CartPole-v1    │ 3.099431800075442                          │ 1.8901799905559769                         │ 2.229200170565302                          │ 2.0570977331846896                         │
│ Acrobot-v1     │ 4.185325574186605                          │ 3.3383588646594835                         │ 3.2403913728341207                         │ 3.005497937894226                          │
│ MountainCar-v0 │ 3.5431891388538053                         │ 2.2788801149391746                         │ 2.5699978012313105                         │ 2.3790336879432625                         │
└────────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────── Episodic Return (mean ± std) ────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                ┃ openrlbenchmark/cleanrl/dqn ({'tag':       ┃ openrlbenchmark/cleanrl/dqn_jax ({'tag':   ┃ openrlbenchmark/cleanrl/dqn ({'tag':       ┃ openrlbenchmark/cleanrl/dqn_jax ({'tag':   ┃
┃ Environment    ┃ ['pr-370']})                               ┃ ['pr-370-jax']})                           ┃ ['rlops-pilot']})                          ┃ ['rlops-pilot']})                          ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ CartPole-v1    │ 486.82 ± 8.32                              │ 324.99 ± 212.99                            │ 486.82 ± 8.32                              │ 499.26 ± 1.05                              │
│ Acrobot-v1     │ -90.20 ± 1.84                              │ -90.81 ± 1.94                              │ -90.20 ± 1.84                              │ -90.44 ± 0.99                              │
│ MountainCar-v0 │ -194.73 ± 7.30                             │ -191.72 ± 9.33                             │ -194.73 ± 7.30                             │ -169.26 ± 23.75                            │
└────────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────────── Runtime (m) Average ─────────────────────────────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Environment                                                ┃ Average Runtime    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ openrlbenchmark/cleanrl/dqn ({'tag': ['pr-370']})          │ 3.6093155043719505 │
│ openrlbenchmark/cleanrl/dqn_jax ({'tag': ['pr-370-jax']})  │ 2.502472990051545  │
│ openrlbenchmark/cleanrl/dqn ({'tag': ['rlops-pilot']})     │ 2.679863114876911  │
│ openrlbenchmark/cleanrl/dqn_jax ({'tag': ['rlops-pilot']}) │ 2.4805431196740595 │
└────────────────────────────────────────────────────────────┴────────────────────┘

https://wandb.ai/costa-huang/cleanrl/reports/Regression-Report-dqn_jax--Vmlldzo0MjUwMDM1

vercel · 2023-04-01T18:35:02Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 2, 2023 8:03pm

vwxyzjn

One small comment but otherwise LGTM. Feel free to start the RLops process.

cleanrl/dqn.py

pseudo-rnd-thoughts

@vwxyzjn Line 208 of the JAX Atari script and line 180 of the JAX classic-control script use np rather than jnp:
https://github.com/vwxyzjn/cleanrl/blob/599f9adfec89d63721578b08b75ec38ab0209372/cleanrl/dqn_jax.py#L180

I'm guessing this is a simple mistake (it shouldn't affect performance). Can we change it to jnp?

pseudo-rnd-thoughts · 2023-04-25T14:27:32Z

The error occurs because Stable Baselines3 version 2 is required.

vwxyzjn · 2023-05-03T15:04:49Z

No sign of regression, as shown in the PR description. Merging now.

ronuchit · 2023-06-05T23:54:19Z

Hi @vwxyzjn @charraut, I'm wondering what part of this change forced us to add the following line:
assert args.num_envs == 1, "vectorized envs are not supported at the moment"

Vectorization was a useful feature earlier. Thank you!
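
A minimal sketch of where such a check can live: the commit history notes that the assert was moved into the argument-parsing function, so an unsupported value fails at startup. The `parse_args` below is a trimmed, hypothetical version with a single flag, not the actual CleanRL function.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI flags; trimmed to the one flag relevant here (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-envs", type=int, default=1,
        help="the number of parallel game environments")
    args = parser.parse_args(argv)
    # Fail fast, before any environment or replay buffer is created.
    assert args.num_envs == 1, "vectorized envs are not supported at the moment"
    return args


args = parse_args([])  # the default value passes the check

try:
    parse_args(["--num-envs", "4"])
    rejected = False
except AssertionError as err:
    rejected = True
    message = str(err)
```

Note that `assert` statements are skipped when Python runs with `-O`; calling `parser.error(...)` is an alternative if the check must always run.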

vwxyzjn · 2023-06-06T00:18:08Z

@ronuchit I think this is because SB3's replay buffer doesn't support num_envs>1.

ronuchit · 2023-06-06T01:52:42Z

I believe it does, actually: https://github.com/DLR-RM/stable-baselines3/blame/master/stable_baselines3/common/buffers.py#L162

We would just need to pass in n_envs=args.num_envs when we instantiate the ReplayBuffer. Perhaps there are other issues at play here?
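
To illustrate the suggestion, here is a toy, pure-Python replay buffer that stores one row of `n_envs` transitions per `add` call. It is a hypothetical stand-in meant only to show the data layout a vectorized buffer needs; it is not SB3's `ReplayBuffer`, and names such as `ToyVecReplayBuffer` are made up for this sketch.

```python
import random


class ToyVecReplayBuffer:
    """Hypothetical ring buffer holding one row of n_envs transitions per step."""

    def __init__(self, buffer_size, n_envs=1, seed=0):
        self.n_envs = n_envs
        # Assumption for this sketch: total capacity is split across envs.
        self.rows = max(buffer_size // n_envs, 1)
        self.storage = [None] * self.rows
        self.pos = 0
        self.full = False
        self.rng = random.Random(seed)

    def add(self, obs, next_obs, actions, rewards, dones):
        # Every argument is a sequence with one entry per environment.
        fields = (obs, next_obs, actions, rewards, dones)
        assert all(len(x) == self.n_envs for x in fields)
        self.storage[self.pos] = list(zip(*fields))
        self.pos = (self.pos + 1) % self.rows
        self.full = self.full or self.pos == 0

    def __len__(self):
        # Number of individual transitions currently stored.
        return (self.rows if self.full else self.pos) * self.n_envs

    def sample(self, batch_size):
        # Pick a random stored row, then a random env within that row.
        filled = self.rows if self.full else self.pos
        return [
            self.storage[self.rng.randrange(filled)][self.rng.randrange(self.n_envs)]
            for _ in range(batch_size)
        ]


num_envs = 4  # would come from args.num_envs in the scripts
rb = ToyVecReplayBuffer(buffer_size=8, n_envs=num_envs)
for step in range(3):
    rb.add(
        obs=[step] * num_envs,
        next_obs=[step + 1] * num_envs,
        actions=[0] * num_envs,
        rewards=[1.0] * num_envs,
        dones=[False] * num_envs,
    )
batch = rb.sample(batch_size=5)
```

In the actual scripts the equivalent change would be the one described above, passing `n_envs=args.num_envs` to SB3's `ReplayBuffer`; whether the rest of the training loop handles batched transitions is a separate question.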

vwxyzjn · 2023-06-06T13:30:02Z

> I believe it does, actually: https://github.com/DLR-RM/stable-baselines3/blame/master/stable_baselines3/common/buffers.py#L162
>
> We would just need to pass in n_envs=args.num_envs when we instantiate the ReplayBuffer. Perhaps there are other issues at play here?

I see. That’s interesting. Would you be interested in making a PR that optionally supports num_envs>1?

ronuchit · 2023-06-06T21:02:46Z

sure, done: #395

Valentin Canete added 2 commits April 1, 2023 20:28

Add gymnasium dqn.py

90cd3f9

Add gymnasium support for dqn_jax.py

0bc6b8f

vercel bot deployed to Preview April 1, 2023 18:35 View deployment

vwxyzjn reviewed Apr 6, 2023

cleanrl/dqn.py (outdated)

Valentin Canete added 2 commits April 6, 2023 22:59

moved assert to parse func

2903faa

add gymnasium support for dqn atari

089c14a

vercel bot deployed to Preview April 8, 2023 22:30 View deployment

black formatting

4b1f417

vercel bot deployed to Preview April 8, 2023 22:34 View deployment

fix make_env for rendering

599f9ad

vercel bot deployed to Preview April 9, 2023 09:24 View deployment

pseudo-rnd-thoughts reviewed Apr 13, 2023

moved np to jnp

b25647b

vercel bot deployed to Preview April 19, 2023 13:34 View deployment

Merge branch 'master' into dqn-gymnasium

4ef9cd0

vercel bot deployed to Preview April 25, 2023 14:06 View deployment

vcharraut marked this pull request as ready for review April 29, 2023 20:28

pseudo-rnd-thoughts mentioned this pull request May 1, 2023

Update to support Gymnasium #277

Closed

21 tasks

add warning mesage and update dependencies

60200d2

vercel bot deployed to Preview May 2, 2023 13:44 View deployment

update test cases

3d93220

vercel bot deployed to Preview May 2, 2023 13:49 View deployment

vwxyzjn added 2 commits May 2, 2023 09:53

update test cases

27de676

pre-commit

c3b952e

vercel bot deployed to Preview May 2, 2023 13:58 View deployment

vwxyzjn added 3 commits May 2, 2023 10:39

bump shimmy version

4d50ecf

update shimmy

5d694bc

update shimmy

717c98f

vercel bot deployed to Preview May 2, 2023 14:47 View deployment

trigger CI

c39ab46

vercel bot deployed to Preview May 2, 2023 14:49 View deployment

trigger CI

f342948

vercel bot deployed to Preview May 2, 2023 15:02 View deployment

trigger CI

9aea38d

vercel bot deployed to Preview May 2, 2023 19:11 View deployment

fix poetry

264d0e6

vercel bot deployed to Preview May 2, 2023 20:03 View deployment

vwxyzjn merged commit 39670fc into vwxyzjn:master May 3, 2023

sdpkjc mentioned this pull request May 6, 2023

Bug of cleanrl_utils/evals #380

Closed

3 tasks

ronuchit mentioned this pull request Jun 6, 2023

handle num_envs > 1 in DQN #395

Open

9 tasks

vcharraut deleted the dqn-gymnasium branch July 26, 2023 19:42

sdpkjc mentioned this pull request Aug 26, 2024

Why does num_envs has to be 1 for DQN #480

Closed

3 tasks
