Update to support Gymnasium #277

arjun-kg · 2022-09-25T03:47:10Z

Description

A draft PR updating CleanRL to support Gymnasium. Closes #263

This mostly includes updating step and seed API. Tries to use gymnasium branches on the dependent packages (SB3 etc) After these are updated, will verify the changes, check the tests, and get the PR ready for review.

Costa's comment:

Thanks @arjun-kg for the PR. We look forward to supporting the next generation of gym.

It's important to identify the performance-impacting changes and non-performance-impacting changes:

non-performance-impacting changes is fine and we can just merge them straightaway
performance-impacting changes is more complicated — we would need to run benchmark experiments to ensure there is no performance regression. We need to do this because even bug fixes could make the performance worse (see Change mujoco_py bindings for mujoco Deepmind bindings openai/gym#2762 (comment) as an example).

In this PR for initial support fo v0.26.1, let's aim to make only non-performance-impacting changes. With that said, here is a todo list:

Deprecate pybullet (since the new mujoco environments are being maintained again but pybullet is not)
Temporarily remove Isaac Gym Support
re-run mojoco experiments (use the v4 environments instead of the current v2 environments)
Atari wrappers upstream fixes (Add Gym 0.26 support DLR-RM/stable-baselines3#780)
procgen

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

vercel · 2022-09-25T03:47:14Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add your feedback	Mar 27, 2023 at 6:37AM (UTC)

vwxyzjn · 2022-09-27T19:38:30Z

Related to #271

vwxyzjn

Hey @arjun-kg thanks for preparing the PR. Looking forward to using the latest gym. I left some very preliminary comments.

One important thing to note during the refactor is to see if the change could result in a performance difference (not just a simple variable renaming). For example, the current PPO scripts did not handle the time out correctly, so handling time out correctly in this PR is a performance-impacting change.

We need to be careful with the performance-impacting changes because we would need to re-run the benchmarks on those changes to ensure there is no surprise regression in the performance.

vwxyzjn · 2022-09-27T19:39:50Z

cleanrl/ppo.py

@@ -213,18 +213,18 @@ def get_action_and_value(self, x, action=None):
                    writer.add_scalar("charts/episodic_length", item["episode"]["l"], global_step)
                    break


This part may need to be changed to

if "episode" in info: for item in info["episode"]["r"]: print(f"global_step={global_step}, episodic_return={item}") writer.add_scalar("charts/episodic_return", item, global_step) break for item in info["episode"]["l"]: writer.add_scalar("charts/episodic_length", item, global_step) break

To replicate the original behavior, you probably need something like (but hopefully better-looking than!):

if "episode" in info: first_idx = info["_episode"].nonzero()[0][0] r = info["episode"]["r"][first_idx] l = info["episode"]["l"][first_idx] print(f"global_step={global_step}, episodic_return={r}") writer.add_scalar("charts/episodic_return", r, global_step) writer.add_scalar("charts/episodic_length", l, global_step)

There's no guarantee that the first index in "episode" won't just be a zero, need the mask to specify which one.

Alternatively, it might be better to track a running average using the deques built into the RecordEpisodeStatistics wrapper, though that would likely results in different performance graphs.

vwxyzjn · 2022-09-27T19:40:16Z

cleanrl/c51.py

@@ -159,12 +159,12 @@ def linear_schedule(start_e: float, end_e: float, duration: int, t: int):
        envs.single_observation_space,
        envs.single_action_space,
        device,
-        handle_timeout_termination=True,
+        handle_timeout_termination=False,


vwxyzjn · 2022-09-27T19:42:01Z

cleanrl/ppo.py

        with torch.no_grad():
            next_value = agent.get_value(next_obs).reshape(1, -1)
            if args.gae:
                advantages = torch.zeros_like(rewards).to(device)
                lastgaelam = 0
                for t in reversed(range(args.num_steps)):
                    if t == args.num_steps - 1:
-                        nextnonterminal = 1.0 - next_done
+                        nextnonterminal = 1.0 - next_terminated


A note for myself: this is a change that could impact performance. We would need to re-run the benchmark here.

vwxyzjn · 2022-10-03T20:40:27Z

Thanks @arjun-kg for the PR. We look forward to supporting the next generation of gym.

It's important to identify the performance-impacting changes and non-performance-impacting changes:

non-performance-impacting changes is fine and we can just merge them straightaway
performance-impacting changes is more complicated — we would need to run benchmark experiments to ensure there is no performance regression. We need to do this because even bug fixes could make the performance worse (see Change mujoco_py bindings for mujoco Deepmind bindings openai/gym#2762 (comment) as an example).

In this PR for initial support for v0.26.1, let's aim to make only non-performance-impacting changes. With that said, I have added a todo list in the PR description.

vwxyzjn · 2022-10-04T01:27:40Z

@arjun-kg I made the first pass of editing to make ppo.py and dqn.py to pass CI. Could you try looking at ddpg_continuous_action.py, TD3, and DDPG?

Btw the plan is to have an announcement like the following on the main page, since I expect to encounter more issues.

…v0.26

GaetanLepage · 2022-10-26T09:00:16Z

Hi !
When reading through the proposed changes, I am not sure to understand the following:
Why do you replace done with terminated and not terminated or truncated ?
I am not sure to get why the truncated return value is ignored.

vwxyzjn · 2022-11-12T02:52:45Z

@GaetanLepage yeah, we should do terminated or truncated for the moment until we properly deal with #198.

@arjun-kg I added some changes to ppo_continuous_action.py to make it work with DM control power by https://github.com/Farama-Foundation/Shimmy/blob/main/tests/test_dm_control.py

vwxyzjn · 2022-11-15T02:17:04Z

@arjun-kg we are thinking of probably supporting both gymnasium and gym simultaneously. See #318 (comment) as an example. This will give us a much smoother transition

arjun-kg · 2022-11-15T03:29:48Z

@vwxyzjn sounds good, will check it out. I'm a bit tied up this week. I'll continue work on this from next week if it's okay.

…_v0.26

…v0.26

ffelten · 2022-12-02T14:42:03Z

cleanrl/sac_continuous_action.py

-        for idx, d in enumerate(dones):
-            if d:
-                real_next_obs[idx] = infos[idx]["terminal_observation"]
-        rb.add(obs, real_next_obs, actions, rewards, dones, infos)


Hi, I guess this line has been forgotten in the migration :-).

pseudo-rnd-thoughts · 2022-12-13T17:16:32Z

We have just released Gymnasium v0.27.0, this should be backward compatible. Would it be possible to update this Pr to v0.27 and check that nothing new breaks

arjun-kg · 2023-02-26T14:35:09Z

@vwxyzjn recently SB3 supports gymnasium with a branch, but I'm not sure if some parallel work is going on to update cleanrl to gymnasium? Would you like me to update this PR to gymnasium with SB3 on the gymnasium branch?

…_v0.26

vcharraut · 2023-03-29T16:59:50Z

cleanrl/ddpg_continuous_action.py

        real_next_obs = next_obs.copy()
-        for idx, d in enumerate(dones):
+        for idx, d in enumerate(terminateds):


Should it use truncated instead of terminated here ?

With truncated, the results are identical with same seeding between the old and new implementation

Yes, this was a mistake, it should be truncated

vcharraut · 2023-03-29T17:01:37Z

cleanrl/ddpg_continuous_action.py

@@ -191,12 +190,12 @@ def forward(self, x):
                writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)
                break


Making the assumption that there will be no parrallel env, this could work:

if "final_info" in infos: info = infos["final_info"][0] print(f"global_step={global_step}, episodic_return={info['episode']['r']}") writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step) writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)

But I have seen that there is a different solution in the DQN file

vcharraut · 2023-03-29T17:02:18Z

cleanrl/ddpg_continuous_action.py

@@ -71,7 +70,7 @@ def thunk():
        if capture_video:
            if idx == 0:
                env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
-        env.seed(seed)
+
        env.action_space.seed(seed)
        env.observation_space.seed(seed)


I think env.observation_space.seed(seed) can be remove

kvrban · 2023-04-30T20:59:54Z

Just tried the PR with:

diff --git a/cleanrl/dqn.py b/cleanrl/dqn.py
 import time
 from distutils.util import strtobool
 
-import gym
+import gymnasium as gym
 import numpy as np
 import torch
 import torch.nn as nn

with stable-baselines3==2.0.0a5 and gymnasium==0.28.1

when i run

python3 cleanrl/cleanrl/dqn.py

always after 'global_step=10009' execution stop with this error:

Traceback (most recent call last):
  File "/home/kris/dev/cleanRL-Gymnasium/cleanrl/cleanrl/dqn.py", line 195, in <module>
    data = rb.sample(args.batch_size)
  File "/home/kris/.local/lib/python3.9/site-packages/stable_baselines3/common/buffers.py", line 285, in sample
    return super().sample(batch_size=batch_size, env=env)
  File "/home/kris/.local/lib/python3.9/site-packages/stable_baselines3/common/buffers.py", line 110, in sample
    batch_inds = np.random.randint(0, upper_bound, size=batch_size)
  File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

kvrban · 2023-05-01T16:44:29Z

i think it was not intended to remove the following line:

rb.add(obs, real_next_obs, actions, rewards, terminateds, infos)

This could be the fix (fixed it for me):

diff --git a/cleanrl/dqn.py b/cleanrl/dqn.py
index 14864e7..4e73a6e 100644
--- a/cleanrl/dqn.py
+++ b/cleanrl/dqn.py
@@ -156,7 +156,7 @@ if __name__ == "__main__":
     start_time = time.time()
 
     # TRY NOT TO MODIFY: start the game
-    obs = envs.reset(seed=args.seed)
+    obs, _ = envs.reset(seed=args.seed)
     for global_step in range(args.total_timesteps):
         # ALGO LOGIC: put action logic here
         epsilon = linear_schedule(args.start_e, args.end_e, args.exploration_fraction * args.total_timesteps, global_step)
@@ -185,6 +185,7 @@ if __name__ == "__main__":
             for idx, d in enumerate(infos["_final_observation"]):
                 if d:
                     real_next_obs[idx] = infos["final_observation"][idx]
+        rb.add(obs, real_next_obs, actions, rewards, terminateds, infos)
 
         # TRY NOT TO MODIFY: CRUCIAL step easy to overlook
         obs = next_obs

pseudo-rnd-thoughts · 2023-05-01T21:41:32Z

@kvrban Thanks for the comment but I think the plan is to complete this PR as several smaller PRs, see #370 and #371.

@arjun-kg or @vwxyzjn Should this PR be closed to avoid confusion?

vwxyzjn · 2023-05-01T21:47:55Z

@pseudo-rnd-thoughts absolutely. Closing this PR now.

arjun-kg added 4 commits September 20, 2022 19:06

update gym version

4c39915

update done, reset

a7c42fb

bump ale-py version, dont handle timeout in rb

0c0a647

pre-commit

35d01ee

vercel bot deployed to Preview September 25, 2022 03:47 View deployment

vwxyzjn reviewed Sep 27, 2022

View reviewed changes

vwxyzjn mentioned this pull request Oct 3, 2022

V1.0.0b2 #286

Merged

4 tasks

vwxyzjn added 2 commits October 3, 2022 16:22

cache changes

0ddcac7

Merge branch 'throwaway' into gym_v0.26

2495c5f

vercel bot deployed to Preview October 3, 2022 20:25 View deployment

pre-commit

1d64b5b

vercel bot deployed to Preview October 3, 2022 20:31 View deployment

vwxyzjn added 2 commits October 3, 2022 16:45

fix indent

d4dcc60

remove pybullet

4e8f8b8

vercel bot deployed to Preview October 3, 2022 20:52 View deployment

Fix next observation

59e727c

vercel bot deployed to Preview October 3, 2022 21:04 View deployment

fix dqn script

2ae0be5

vercel bot deployed to Preview October 3, 2022 21:10 View deployment

update some docs

ba8983f

vercel bot deployed to Preview October 3, 2022 21:12 View deployment

Test API

ed68e76

vercel bot deployed to Preview October 3, 2022 22:31 View deployment

arjun-kg added 2 commits October 7, 2022 14:33

Merge branch 'master' of https://github.com/vwxyzjn/cleanrl into gym_…

7e8f2db

…v0.26

Merge branch 'master' of https://github.com/vwxyzjn/cleanrl into gym_…

4a05385

…v0.26

vercel bot deployed to Preview October 7, 2022 09:10 View deployment

Support DM control and make backward compatibility

f8271fe

vercel bot deployed to Preview November 12, 2022 02:51 View deployment

vwxyzjn mentioned this pull request Nov 15, 2022

Implement Gymnasium-compliant PPO script #318

Closed

20 tasks

Merge branch 'master' of https://github.com/arjun-kg/cleanrl into gym…

ecffa00

…_v0.26

vercel bot deployed to Preview November 23, 2022 10:42 View deployment

Merge branch 'master' of https://github.com/vwxyzjn/cleanrl into gym_…

28fd178

…v0.26

vercel bot deployed to Preview November 29, 2022 07:31 View deployment

ffelten reviewed Dec 2, 2022

View reviewed changes

Merge branch 'master' of https://github.com/arjun-kg/cleanrl into gym…

813192d

…_v0.26

vercel bot deployed to Preview March 27, 2023 06:37 View deployment

arjun-kg changed the title ~~Update to support Gym v0.26.1~~ Update to support Gymnasium Mar 27, 2023

vcharraut reviewed Mar 29, 2023

View reviewed changes

vwxyzjn closed this May 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update to support Gymnasium #277

Update to support Gymnasium #277

arjun-kg commented Sep 25, 2022 •

edited

Loading

vercel bot commented Sep 25, 2022 •

edited

Loading

vwxyzjn commented Sep 27, 2022

vwxyzjn left a comment

vwxyzjn Sep 27, 2022

DavidSlayback Oct 3, 2022

vwxyzjn Sep 27, 2022

vwxyzjn Sep 27, 2022

vwxyzjn commented Oct 3, 2022

vwxyzjn commented Oct 4, 2022

GaetanLepage commented Oct 26, 2022

vwxyzjn commented Nov 12, 2022

vwxyzjn commented Nov 15, 2022 •

edited

Loading

arjun-kg commented Nov 15, 2022

ffelten Dec 2, 2022

pseudo-rnd-thoughts commented Dec 13, 2022

arjun-kg commented Feb 26, 2023

vcharraut Mar 29, 2023 •

edited

Loading

arjun-kg Mar 31, 2023

vcharraut Mar 29, 2023 •

edited

Loading

vcharraut Mar 29, 2023

kvrban commented Apr 30, 2023

kvrban commented May 1, 2023

pseudo-rnd-thoughts commented May 1, 2023

vwxyzjn commented May 1, 2023

		@@ -213,18 +213,18 @@ def get_action_and_value(self, x, action=None):
		writer.add_scalar("charts/episodic_length", item["episode"]["l"], global_step)
		break

		@@ -191,12 +190,12 @@ def forward(self, x):
		writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)
		break

Update to support Gymnasium #277

Update to support Gymnasium #277

Conversation

arjun-kg commented Sep 25, 2022 • edited Loading

Description

Costa's comment:

Checklist:

vercel bot commented Sep 25, 2022 • edited Loading

vwxyzjn commented Sep 27, 2022

vwxyzjn left a comment

Choose a reason for hiding this comment

vwxyzjn Sep 27, 2022

Choose a reason for hiding this comment

DavidSlayback Oct 3, 2022

Choose a reason for hiding this comment

vwxyzjn Sep 27, 2022

Choose a reason for hiding this comment

vwxyzjn Sep 27, 2022

Choose a reason for hiding this comment

vwxyzjn commented Oct 3, 2022

vwxyzjn commented Oct 4, 2022

GaetanLepage commented Oct 26, 2022

vwxyzjn commented Nov 12, 2022

vwxyzjn commented Nov 15, 2022 • edited Loading

arjun-kg commented Nov 15, 2022

ffelten Dec 2, 2022

Choose a reason for hiding this comment

pseudo-rnd-thoughts commented Dec 13, 2022

arjun-kg commented Feb 26, 2023

vcharraut Mar 29, 2023 • edited Loading

Choose a reason for hiding this comment

arjun-kg Mar 31, 2023

Choose a reason for hiding this comment

vcharraut Mar 29, 2023 • edited Loading

Choose a reason for hiding this comment

vcharraut Mar 29, 2023

Choose a reason for hiding this comment

kvrban commented Apr 30, 2023

kvrban commented May 1, 2023

pseudo-rnd-thoughts commented May 1, 2023

vwxyzjn commented May 1, 2023

arjun-kg commented Sep 25, 2022 •

edited

Loading

vercel bot commented Sep 25, 2022 •

edited

Loading

vwxyzjn commented Nov 15, 2022 •

edited

Loading

vcharraut Mar 29, 2023 •

edited

Loading

vcharraut Mar 29, 2023 •

edited

Loading