[RLlib] Remove old gym monitor code and add new API #37922

Closed
Changes from 74 commits

Commits (76)
9dd8b29
bump to 0.28.1
May 24, 2023
5e860cd
Merge branch 'master' into gymnasium
May 24, 2023
3189e30
Merge branch 'master' into gymnasium
May 25, 2023
bd070e5
fix test
May 26, 2023
fbedf59
Merge branch 'master' into gymnasium
Jun 5, 2023
7402830
Atari is now supported by gymnasium
Rohan138 Jun 5, 2023
9d2e601
Merge branch 'gymnasium' of https://github.com/Rohan138/ray into gymn…
Rohan138 Jun 5, 2023
03e3c4b
Remove all import gym calls
Rohan138 Jun 5, 2023
98506e3
Fix env instantiation for env classes
Jun 5, 2023
0f39209
fix atari wrappers
Jun 5, 2023
0affa2a
Merge branch 'master' into gymnasium
Jun 5, 2023
c925e59
fix pong notebook
Jun 6, 2023
06e78e1
fix pong notebook
Jun 6, 2023
ecce074
Merge branch 'master' into gymnasium
Jun 6, 2023
65b2eec
Add comment
Jun 6, 2023
a5bba3c
Merge branch 'master' into gymnasium
Jun 8, 2023
68f3cb5
Empty commit
Jun 8, 2023
74d4461
Remove gym from requirements
Jun 8, 2023
635ab9c
Merge branch 'master' into gymnasium
Jun 9, 2023
4f5ef24
Merge branch 'master' of github.com:ray-project/ray into gymnasium
Jun 9, 2023
f5ecaa6
merge
sven1977 Jun 19, 2023
cbc22f3
test other version combination
sven1977 Jun 20, 2023
5833a90
merge
sven1977 Jun 20, 2023
acaa683
wip
sven1977 Jun 20, 2023
829d290
wip
sven1977 Jun 20, 2023
7842ff9
wip
sven1977 Jun 20, 2023
b95a0a5
wip
sven1977 Jun 20, 2023
f926316
wip
sven1977 Jun 20, 2023
721cfc9
merge
sven1977 Jun 22, 2023
58d7b11
wip
sven1977 Jun 22, 2023
78cd2b6
wip
sven1977 Jun 22, 2023
859d999
wip
sven1977 Jun 22, 2023
4d20fa2
wip
sven1977 Jun 22, 2023
87f38b7
wip
sven1977 Jun 22, 2023
f8c4c71
wip
sven1977 Jun 22, 2023
866fec3
wip
sven1977 Jun 22, 2023
c3ce9c9
wip
sven1977 Jun 23, 2023
7c41ef6
wip
sven1977 Jun 23, 2023
5ef2b8a
wip
sven1977 Jun 23, 2023
988980d
wip
sven1977 Jun 23, 2023
00aba6c
wip
sven1977 Jun 23, 2023
606c69c
wip
sven1977 Jun 23, 2023
3dda88e
wip
sven1977 Jun 23, 2023
37fbd31
wip
sven1977 Jun 23, 2023
05085c6
wip
sven1977 Jun 23, 2023
873e307
wip
sven1977 Jun 23, 2023
e984149
wip
sven1977 Jun 23, 2023
4caebe0
wip
sven1977 Jun 23, 2023
ed5928b
wip
sven1977 Jun 23, 2023
fe49364
Merge branch 'master' of https://github.com/ray-project/ray into gymn…
sven1977 Jun 23, 2023
b8f4fad
wip
sven1977 Jul 1, 2023
f24f94c
LINT
sven1977 Jul 1, 2023
f80ceee
merge
sven1977 Jul 5, 2023
ed2ccc2
LINT
sven1977 Jul 5, 2023
18baa94
wip
sven1977 Jul 5, 2023
2b8ebef
Merge branch 'master' of https://github.com/ray-project/ray into gymn…
sven1977 Jul 6, 2023
43fa608
wip
sven1977 Jul 6, 2023
10975a2
wip
sven1977 Jul 27, 2023
59fd88d
wip
sven1977 Jul 27, 2023
80ea98e
wip
sven1977 Jul 27, 2023
d113b64
wip
sven1977 Jul 27, 2023
b8183ca
wip
sven1977 Jul 27, 2023
f094723
wip
sven1977 Jul 27, 2023
70c71e5
wip
sven1977 Jul 28, 2023
264c836
wip
sven1977 Jul 28, 2023
ab7cab2
Merge branch 'master' of https://github.com/ray-project/ray into gymn…
sven1977 Jul 28, 2023
4ce8498
wip
sven1977 Jul 28, 2023
5536b4b
wip
sven1977 Jul 28, 2023
91112ce
add recording example etc
ArturNiederfahrenhorst Jul 29, 2023
ff77a3c
Add to CI deps file
ArturNiederfahrenhorst Jul 30, 2023
c3d7b2b
kick off CI again
ArturNiederfahrenhorst Jul 31, 2023
f817776
fix deps
ArturNiederfahrenhorst Jul 31, 2023
c1c34fb
Attempt to pin decorator
ArturNiederfahrenhorst Jul 31, 2023
f893c67
merge master
ArturNiederfahrenhorst Jul 31, 2023
2ffcacd
Merge branch 'master' into removeataristuffaddrecorder
ArturNiederfahrenhorst Aug 4, 2023
0439830
test ffmpeg upgrade
ArturNiederfahrenhorst Aug 4, 2023
4 changes: 4 additions & 0 deletions ci/env/install-dependencies.sh
@@ -354,6 +354,10 @@ install_pip_packages() {
requirements_files+=("${WORKSPACE_DIR}/python/requirements/ml/rllib-test-requirements.txt")
#TODO(amogkam): Add this back to rllib-requirements.txt once mlagents no longer pins torch<1.9.0 version.
pip install --no-dependencies mlagents==0.28.0
pip install moviepy
pip install decorator==4.0.2 # Moviepy 1.0.3 will error on decorator==4.4.2 so we have to pin
sudo apt install ffmpeg -y
export IMAGEIO_FFMPEG_EXE=/usr/bin/ffmpeg

# Install MuJoCo.
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf -y
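For local runs outside CI, the same ffmpeg hint that the shell export above provides can be set from Python before moviepy is first imported. This is a hedged sketch only (the path lookup via `shutil.which` is an assumption about your system; `IMAGEIO_FFMPEG_EXE` is the variable the CI change exports):

```python
# Hedged sketch: point moviepy/imageio at the system ffmpeg from Python,
# mirroring the IMAGEIO_FFMPEG_EXE export in the CI script above.
import os
import shutil

ffmpeg_path = shutil.which("ffmpeg")  # e.g. /usr/bin/ffmpeg on the CI image
if ffmpeg_path:
    os.environ["IMAGEIO_FFMPEG_EXE"] = ffmpeg_path
```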
11 changes: 4 additions & 7 deletions doc/source/rllib/rllib-training.rst
@@ -528,18 +528,15 @@ Debugging RLlib Experiments
Gym Monitor
~~~~~~~~~~~

The ``"monitor": true`` config can be used to save Gym episode videos to the result dir. For example:
The ``"record": true`` config can be used to save videos of episodes to the result dir. For example:

.. code-block:: bash

rllib train --env=PongDeterministic-v4 \
--run=A2C --config '{"num_workers": 2, "monitor": true}'
--run=A2C --config '{"num_workers": 2, "record": true}'

Videos will be saved in the ``~/ray_results/<experiment>`` directory.

# videos will be saved in the ~/ray_results/<experiment> dir, for example
openaigym.video.0.31401.video000000.meta.json
openaigym.video.0.31401.video000000.mp4
openaigym.video.0.31403.video000000.meta.json
openaigym.video.0.31403.video000000.mp4

Eager Mode
~~~~~~~~~~
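For readers who prefer the Python API over the CLI shown in the doc change above, here is a hedged sketch of a comparable run. PPO is substituted for A2C for brevity, and the gymnasium Atari env id is an assumption; only the `record` setting itself comes from this PR:

```python
# Hedged sketch: rough Python-API analogue of the `rllib train ... "record": true` CLI call.
from ray.rllib.algorithms.ppo import PPOConfig  # PPO used in place of A2C for brevity

config = (
    PPOConfig()
    .environment(
        env="ALE/Pong-v5",                        # gymnasium Atari id (assumption)
        env_config={"render_mode": "rgb_array"},  # recording needs a renderable env
        record=True,                              # the new setting introduced in this PR
    )
    .rollouts(num_rollout_workers=2)              # analogous to "num_workers": 2
)
algo = config.build()
algo.train()
```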
8 changes: 8 additions & 0 deletions rllib/BUILD
@@ -4010,6 +4010,14 @@ py_test(
args = ["--stop-iters=2", "--num-steps-sampled-before-learning_starts=100", "--framework=tf2", "--use-tune", "--random-test-episodes=10", "--env-num-candidates=50", "--env-slate-size=2"],
)

py_test(
name = "examples/record_videos",
main = "examples/record_videos.py",
tags = ["team:rllib", "examples"],
size = "small",
srcs = ["examples/record_videos.py"],
)

py_test(
name = "examples/remote_envs_with_inference_done_on_main_node_tf",
main = "examples/remote_envs_with_inference_done_on_main_node.py",
23 changes: 20 additions & 3 deletions rllib/algorithms/algorithm_config.py
@@ -308,6 +308,9 @@ def __init__(self, algo_class=None):
self.disable_env_checking = False
self.auto_wrap_old_gym_envs = True
self.action_mask_key = "action_mask"
self.record = False
self.video_folder = os.path.expanduser("~/ray_results")
self.recording_interval = 10
# Whether this env is an atari env (for atari-specific preprocessing).
# If not specified, we will try to auto-detect this.
self._is_atari = None
@@ -455,7 +458,6 @@ def __init__(self, algo_class=None):
# have been removed.
# === Deprecated keys ===
self.simple_optimizer = DEPRECATED_VALUE
self.monitor = DEPRECATED_VALUE
self.evaluation_num_episodes = DEPRECATED_VALUE
self.metrics_smoothing_episodes = DEPRECATED_VALUE
self.timesteps_per_iteration = DEPRECATED_VALUE
@@ -533,7 +535,6 @@ def to_dict(self) -> AlgorithmConfigDict:
# Simplify: Remove all deprecated keys that have as value `DEPRECATED_VALUE`.
# These would be useless in the returned dict anyways.
for dep_k in [
"monitor",
"evaluation_num_episodes",
"metrics_smoothing_episodes",
"timesteps_per_iteration",
@@ -1334,6 +1335,9 @@ def environment(
is_atari: Optional[bool] = NotProvided,
auto_wrap_old_gym_envs: Optional[bool] = NotProvided,
action_mask_key: Optional[str] = NotProvided,
record: Optional[bool] = NotProvided,
video_folder: Optional[str] = NotProvided,
recording_interval: Optional[int] = NotProvided,
) -> "AlgorithmConfig":
"""Sets the config's RL-environment settings.

@@ -1385,9 +1389,16 @@
(gym.wrappers.EnvCompatibility). If False, RLlib will produce a
descriptive error on which steps to perform to upgrade to gymnasium
(or to switch this flag to True).
action_mask_key: If observation is a dictionary, expect the value by
the key `action_mask_key` to contain a valid actions mask (`numpy.int8`
array of zeros and ones). Defaults to "action_mask".
record: Whether to record videos of episodes. A video is recorded every
`recording_interval` episodes. Defaults to False.
video_folder: Path to the directory in which to save the recordings.
Defaults to "~/ray_results".
recording_interval: The interval (in episodes) between two recordings: a
video is recorded every `recording_interval`-th episode. Defaults to 10.

Returns:
This updated AlgorithmConfig object.
@@ -1422,6 +1433,12 @@
self.auto_wrap_old_gym_envs = auto_wrap_old_gym_envs
if action_mask_key is not NotProvided:
self.action_mask_key = action_mask_key
if record is not NotProvided:
self.record = record
if recording_interval is not NotProvided:
self.recording_interval = recording_interval
if video_folder is not NotProvided:
self.video_folder = video_folder

return self

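A minimal usage sketch of the new `environment()` settings documented above. The parameter names come from this PR; the env, folder path, and interval values are illustrative assumptions:

```python
# Hedged usage sketch of the new record/video_folder/recording_interval settings.
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment(
    env="CartPole-v1",
    env_config={"render_mode": "rgb_array"},  # recording requires a renderable env
    record=True,                              # default: False
    video_folder="/tmp/rllib_videos",         # default: ~/ray_results
    recording_interval=25,                    # record every 25th episode; default: 10
)
```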
52 changes: 0 additions & 52 deletions rllib/env/wrappers/atari_wrappers.py
@@ -46,57 +46,6 @@ def get_wrapper_by_cls(env, cls):
return None


@PublicAPI
class MonitorEnv(gym.Wrapper):
def __init__(self, env=None):
"""Record episodes stats prior to EpisodicLifeEnv, etc."""
gym.Wrapper.__init__(self, env)
self._current_reward = None
self._num_steps = None
self._total_steps = None
self._episode_rewards = []
self._episode_lengths = []
self._num_episodes = 0
self._num_returned = 0

def reset(self, **kwargs):
obs, info = self.env.reset(**kwargs)

if self._total_steps is None:
self._total_steps = sum(self._episode_lengths)

if self._current_reward is not None:
self._episode_rewards.append(self._current_reward)
self._episode_lengths.append(self._num_steps)
self._num_episodes += 1

self._current_reward = 0
self._num_steps = 0

return obs, info

def step(self, action):
obs, rew, terminated, truncated, info = self.env.step(action)
self._current_reward += rew
self._num_steps += 1
self._total_steps += 1
return obs, rew, terminated, truncated, info

def get_episode_rewards(self):
return self._episode_rewards

def get_episode_lengths(self):
return self._episode_lengths

def get_total_steps(self):
return self._total_steps

def next_episode_results(self):
for i in range(self._num_returned, len(self._episode_rewards)):
yield (self._episode_rewards[i], self._episode_lengths[i])
self._num_returned = len(self._episode_rewards)


@PublicAPI
class NoopResetEnv(gym.Wrapper):
def __init__(self, env, noop_max=30):
@@ -328,7 +277,6 @@ def wrap_deepmind(env, dim=84, framestack=True, noframeskip=False):
dim: Dimension to resize observations to (dim x dim).
framestack: Whether to framestack observations.
"""
env = MonitorEnv(env)
env = NoopResetEnv(env, noop_max=30)
if env.spec is not None and noframeskip is True:
env = MaxAndSkipEnv(env, skip=4)
26 changes: 0 additions & 26 deletions rllib/evaluation/env_runner_v2.py
@@ -7,7 +7,6 @@

from ray.rllib.env.base_env import ASYNC_RESET_RETURN, BaseEnv
from ray.rllib.env.external_env import ExternalEnvWrapper
from ray.rllib.env.wrappers.atari_wrappers import MonitorEnv, get_wrapper_by_cls
from ray.rllib.evaluation.collectors.simple_list_collector import _PolicyCollectorGroup
from ray.rllib.policy.rnn_sequencing import pad_batch_to_sequences_of_same_size
from ray.rllib.evaluation.episode_v2 import EpisodeV2
@@ -408,13 +407,6 @@ def _get_rollout_metrics(
self, episode: EpisodeV2, policy_map: Dict[str, Policy]
) -> List[RolloutMetrics]:
"""Get rollout metrics from completed episode."""
# TODO(jungong) : why do we need to handle atari metrics differently?
# Can we unify atari and normal env metrics?
atari_metrics: List[RolloutMetrics] = _fetch_atari_metrics(self._base_env)
if atari_metrics is not None:
for m in atari_metrics:
m._replace(custom_metrics=episode.custom_metrics)
return atari_metrics
# Create connector metrics
connector_metrics = {}
active_agents = episode.get_agents()
@@ -1209,24 +1201,6 @@ def _maybe_render(self):
self._perf_stats.incr("env_render_time", time.time() - t5)


def _fetch_atari_metrics(base_env: BaseEnv) -> List[RolloutMetrics]:
"""Atari games have multiple logical episodes, one per life.

However, for metrics reporting we count full episodes, all lives included.
"""
sub_environments = base_env.get_sub_environments()
if not sub_environments:
return None
atari_out = []
for sub_env in sub_environments:
monitor = get_wrapper_by_cls(sub_env, MonitorEnv)
if not monitor:
return None
for eps_rew, eps_len in monitor.next_episode_results():
atari_out.append(RolloutMetrics(eps_len, eps_rew))
return atari_out


def _get_or_raise(
mapping: Dict[PolicyID, Union[Policy, Preprocessor, Filter]], policy_id: PolicyID
) -> Union[Policy, Preprocessor, Filter]:
18 changes: 18 additions & 0 deletions rllib/evaluation/rollout_worker.py
@@ -4,6 +4,7 @@
import os
import platform
import threading
import gymnasium as gym
from collections import defaultdict
from types import FunctionType
from typing import (
@@ -449,6 +450,23 @@ def wrap(env):

# Wrap env through the correct wrapper.
self.env: EnvType = wrap(self.env)

if self.config.record:
folder = (
self.config.video_folder
if self.config.video_folder is not None
else self.log_dir + "/videos"
)
logger.info(f"Recording videos to {folder}")

self.env = gym.wrappers.RecordVideo(
env=self.env,
video_folder=folder,
# Defines when to capture an episode based on episode ID.
episode_trigger=lambda e: e % self.config.recording_interval == 0,
name_prefix=f"RolloutWorker_{self.worker_index}_",
)

# Ideally, we would use the same make_sub_env() function below
# to create self.env, but wrap(env) and self.env has a cyclic
# dependency on each other right now, so we would settle on
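The RolloutWorker change above applies `gymnasium.wrappers.RecordVideo` internally. Below is a hedged sketch of the manual equivalent, for users who want the same behavior outside RLlib (requires moviepy and ffmpeg; the env id, folder, and interval are illustrative):

```python
# Hedged sketch: manually wrap a renderable gymnasium env the way RolloutWorker now does.
import gymnasium as gym

env = gym.make("CartPole-v1", render_mode="rgb_array")
env = gym.wrappers.RecordVideo(
    env=env,
    video_folder="/tmp/rllib_videos",          # any writable directory
    episode_trigger=lambda ep: ep % 10 == 0,   # record every 10th episode
    name_prefix="manual_recording",
)
obs, info = env.reset()
```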
31 changes: 10 additions & 21 deletions rllib/evaluation/sampler.py
@@ -26,7 +26,6 @@
from ray.rllib.evaluation.collectors.simple_list_collector import SimpleListCollector
from ray.rllib.evaluation.env_runner_v2 import (
EnvRunnerV2,
_fetch_atari_metrics,
_get_or_raise,
_PerfStats,
)
@@ -994,28 +993,18 @@ def _process_observations(
# Now that all callbacks are done and users had the chance to add custom
# metrics based on the last observation in the episode, finish up metrics
# object and append to `outputs`.
atari_metrics: List[RolloutMetrics] = _fetch_atari_metrics(base_env)
if not episode.is_faulty:
if atari_metrics is not None:
for m in atari_metrics:
outputs.append(
m._replace(
custom_metrics=episode.custom_metrics,
hist_data=episode.hist_data,
)
)
Review comment (Contributor Author):

This is not the original concern of this PR, but I believe we should clear our sampler of logic that distinguishes between environment types. We are not transparent about this, and users have to deep-dive into RLlib to understand or modify what is going on here.
else:
outputs.append(
RolloutMetrics(
episode.length,
episode.total_reward,
dict(episode.agent_rewards),
episode.custom_metrics,
{},
episode.hist_data,
episode.media,
)
outputs.append(
RolloutMetrics(
episode.length,
episode.total_reward,
dict(episode.agent_rewards),
episode.custom_metrics,
{},
episode.hist_data,
episode.media,
)
)
else:
# Add metrics about a faulty episode.
outputs.append(RolloutMetrics(episode_faulty=True))
35 changes: 35 additions & 0 deletions rllib/examples/record_videos.py
@@ -0,0 +1,35 @@
"""
The following example demonstrates how to record videos of your agent's behavior.

RLlib exposes the ability of the Gymnasium API to record videos.
This is done internally by wrapping the environment with the
gymnasium.wrappers.RecordVideo wrapper. You can also wrap your environment with this
wrapper manually to record videos of your agent's behavior if RLlib's built-in
video recording does not meet your needs.

In order to run this example, note the following:
- You must have moviepy installed (pip install moviepy).
- You must have ffmpeg installed (system dependent, e.g. brew install ffmpeg).
- moviepy must be able to find ffmpeg; see https://github.com/Zulko/moviepy/issues/1158.
- An environment can only be recorded if it can be rendered. For most environments,
this can be achieved by setting the render_mode to 'rgb_array' in the environment
config. See the gymnasium API for more information.
"""

# First, we create videos with default settings:
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment(
env="CartPole-v1", record=True, env_config={"render_mode": "rgb_array"}
)

# By default, videos will be saved to your experiment logs directory under
# ~/ray_results.

algo = config.build()
algo.train()

# Secondly, we create videos every 100 episodes:
config.environment(recording_interval=100)
algo = config.build()
algo.train()