Update ray.rllib to 2.5 #2067

Merged · 17 commits · Jun 22, 2023
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -10,9 +10,17 @@ Copying and pasting the git commit messages is __NOT__ enough.

## [Unreleased]
### Added
- Added `rllib/pg_example.py` to demonstrate a simple integration with `RLlib` and `tensorflow` for policy training.
- Added `rllib/pg_pbt_example.py` to demonstrate integration with `ray.RLlib`, `tensorflow`, and `ray.tune` for scheduled policy training.
### Changed
- Updated `smarts[ray]` (`ray==2.2`) and `smarts[rllib]` (`ray[rllib]==1.4`) to use `ray~=2.5`.
- Introduced `tensorflow-probability` to `smarts[rllib]`.
- Updated `RLlibHiWayEnv` to use the `gymnasium` interface.
- Renamed `rllib/rllib.py` to `rllib/pg_pbt_example.py`.
- Loosened constraint of `gymnasium` from `==0.27.0` to `>=0.26.3`.
### Deprecated
### Fixed
- The previously missing neighborhood vehicle state `'lane_id'` is now included in the `hiway-v1` formatted observations.
- Fixed a regression where `pybullet` build-time messages reappeared.
### Removed
### Security
2 changes: 2 additions & 0 deletions README.md
@@ -44,6 +44,8 @@ Several agent control policies and agent [action types](smarts/core/controllers/
### RL Model
1. [Drive](examples/rl/drive). See [Driving SMARTS 2023.1 & 2023.2](https://smarts.readthedocs.io/en/latest/benchmarks/driving_smarts_2023_1.html) for more info.
1. [VehicleFollowing](examples/rl/platoon). See [Driving SMARTS 2023.3](https://smarts.readthedocs.io/en/latest/benchmarks/driving_smarts_2023_3.html) for more info.
1. [PG](examples/rl/rllib/pg_example.py). See [RLlib](https://smarts.readthedocs.io/en/latest/docs/ecosystem/rllib.html) for more info.
1. [PG Population Based Training](examples/rl/rllib/pg_pbt_example.py). See [RLlib](https://smarts.readthedocs.io/en/latest/docs/ecosystem/rllib.html) for more info.

### RL Environment
1. [ULTRA](https://github.com/smarts-project/smarts-project.rl/blob/master/ultra) provides a gym-based environment built upon SMARTS to tackle intersection navigation, specifically the unprotected left turn.
11 changes: 9 additions & 2 deletions docs/ecosystem/rllib.rst
@@ -8,6 +8,13 @@ RLlib
of applications. ``RLlib`` natively supports ``TensorFlow``, ``TensorFlow Eager``, and ``PyTorch``. Most of its internals are agnostic to such
deep learning frameworks.

SMARTS contains two examples using `Policy Gradients (PG) <https://docs.ray.io/en/latest/rllib-algorithms.html#policy-gradients-pg>`_.

1. ``rllib/pg_example.py``
   This example shows the basics of using RLlib with SMARTS through :class:`~smarts.env.rllib_hiway_env.RLlibHiWayEnv`.
2. ``rllib/pg_pbt_example.py``
   This example combines Policy Gradients with `Population Based Training (PBT) <https://docs.ray.io/en/latest/tune/api/doc/ray.tune.schedulers.PopulationBasedTraining.html>`_ scheduling, sketched below.
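The exact search space and scheduler settings used by ``rllib/pg_pbt_example.py`` are not reproduced here; the following is only a minimal, illustrative sketch of wiring a PBT scheduler into a Ray Tune run. The mutation values, ``num_samples``, stopping criterion, and the mostly empty ``env_config`` are assumptions for illustration.

.. code-block:: python

   from ray import tune
   from ray.air import RunConfig
   from ray.tune.schedulers import PopulationBasedTraining

   from smarts.env.rllib_hiway_env import RLlibHiWayEnv

   # Illustrative scheduler: every 300 s of training time, a trial may clone a
   # better-performing trial and perturb these hyperparameters.
   pbt = PopulationBasedTraining(
       time_attr="time_total_s",
       perturbation_interval=300,
       hyperparam_mutations={
           "lr": [1e-3, 5e-4, 1e-4],
           "train_batch_size": [1000, 2000, 4000],
       },
   )

   tuner = tune.Tuner(
       "PG",
       param_space={
           "env": RLlibHiWayEnv,
           "env_config": {},  # scenarios, agent_specs, etc. go here (placeholder)
           "lr": 1e-3,
           "train_batch_size": 2000,
       },
       tune_config=tune.TuneConfig(
           scheduler=pbt,
           num_samples=4,  # population size
           metric="episode_reward_mean",
           mode="max",
       ),
       run_config=RunConfig(stop={"time_total_s": 3600}),
   )
   results = tuner.fit()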

Recommended reads
-----------------

@@ -28,7 +35,7 @@ many docs about ``Ray`` and ``RLlib``. We recommend reading the following pages
Resume training
---------------

With respect to ``SMARTS/examples/rl/rllib`` example, if you want to continue an aborted experiment, you can set ``resume=True`` in ``tune.run``. But note that ``resume=True`` will continue to use the same configuration as was set in the original experiment.
For the ``SMARTS/examples/rl/rllib`` examples, if you want to continue an aborted experiment, you can set ``resume_training=True``. Note that ``resume_training=True`` will continue to use the same configuration that was set in the original experiment.
To make changes to a started experiment, you can edit the latest experiment file in ``./results``.

Or if you want to start a new experiment but train from an existing checkpoint, you can set ``restore=checkpoint_path`` in ``tune.run``.
Or if you want to start a new experiment but train from an existing checkpoint, you will need to look into `How to Save and Load Trial Checkpoints <https://docs.ray.io/en/latest/tune/tutorials/tune-trial-checkpoints>`_.
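For orientation only, a minimal sketch of restoring from a checkpoint in the style of ``pg_example.py`` from this PR is shown below; ``algo_config`` is assumed to be an ``AlgorithmConfig`` like the one that script builds, and the checkpoint path is a placeholder.

.. code-block:: python

   # Rebuild the algorithm from its config, then load a previously saved checkpoint.
   algo = algo_config.build()
   algo.load_checkpoint("./results/pg_results/checkpoint_3/checkpoint-3")  # placeholder path
   result = algo.train()  # training continues from the restored weights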
9 changes: 6 additions & 3 deletions docs/sim/env.rst
@@ -9,19 +9,20 @@ Base environments
The SMARTS environment module is defined in the :mod:`~smarts.env` package. Currently, SMARTS provides two kinds of training
environments, namely:

+ ``HiWayEnv`` utilizing ``gym.env`` style interface
+ ``HiWayEnv`` utilizing a ``gymnasium.Env`` interface
+ ``RLlibHiwayEnv`` customized for `RLlib <https://docs.ray.io/en/latest/rllib/index.html>`_ training

.. image:: ../_static/env.png

HiWayEnv
^^^^^^^^

``HiWayEnv`` inherits class ``gym.Env`` and supports gym APIs like ``reset``, ``step``, ``close``. An usage example is shown below.
``HiWayEnv`` inherits the ``gymnasium.Env`` class and supports gym APIs like ``reset``, ``step``, and ``close``. A usage example is shown below.
Refer to :class:`~smarts.env.hiway_env.HiWayEnv` for more details.

.. code-block:: python

import gymnasium as gym
# Make env
env = gym.make(
"smarts.env:hiway-v0", # Env entry name.
@@ -53,6 +54,7 @@ exactly matches the `env.observation_space`, and `ObservationOptions.multi_agent`

.. code-block:: python

import gymnasium as gym
# Make env
env = gym.make(
"smarts.env:hiway-v1", # Env entry name.
@@ -81,6 +83,7 @@ This can be done with :class:`~smarts.env.gymnasium.wrappers.api_reversion.Api02

.. code-block:: python

import gymnasium as gym
# Make env
env = gym.make(
"smarts.env:hiway-v1", # Env entry name.
@@ -91,7 +94,7 @@ This can be done with :class:`~smarts.env.gymnasium.wrappers.api_reversion.Api02
RLlibHiwayEnv
^^^^^^^^^^^^^

``RLlibHiwayEnv`` inherits class ``MultiAgentEnv``, which is defined in `RLlib <https://docs.ray.io/en/latest/rllib/index.html>`_. It also supports common env APIs like ``reset``,
``RLlibHiwayEnv`` inherits class ``MultiAgentEnv``, which is defined in `RLlib <https://docs.ray.io/en/latest/rllib/index.html>`_. It also supports common environment APIs like ``reset``,
``step``, and ``close``. A usage example is shown below. Refer to :class:`~smarts.env.rllib_hiway_env.RLlibHiWayEnv` for more details.

(The ``RLlibHiwayEnv`` usage code block is truncated in this diff view.)
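Since that snippet is truncated, here is a rough, non-authoritative sketch of constructing the environment directly. It assumes the same ``env_config`` keys that ``pg_example.py`` in this PR passes (``scenarios``, ``agent_specs``, ``headless``, ``seed``) and a gymnasium-style ``reset`` signature; the scenario path and ``agent_specs`` are placeholders.

.. code-block:: python

   from smarts.env.rllib_hiway_env import RLlibHiWayEnv

   # `agent_specs` is assumed to be a dict of AgentSpec objects keyed by agent id,
   # e.g. as built by the example's `rllib_agent.py` (not shown in this diff).
   env = RLlibHiWayEnv(
       {
           "scenarios": ["scenarios/sumo/loop"],  # placeholder scenario path
           "agent_specs": agent_specs,
           "headless": True,
           "seed": 42,
       }
   )
   observations, infos = env.reset()  # gymnasium-style reset, per this PR
   env.close()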
69 changes: 69 additions & 0 deletions examples/rl/rllib/configs.py
@@ -0,0 +1,69 @@
import argparse
import multiprocessing
from pathlib import Path


def gen_parser(prog: str, default_result_dir: str) -> argparse.ArgumentParser:
    """Generate the command line argument parser shared by the RLlib examples."""
    parser = argparse.ArgumentParser(prog)
    parser.add_argument(
        "scenarios",
        help="A list of scenarios. Each element can be either the scenario to "
        "run or a directory of scenarios to sample from. See `scenarios/` "
        "folder for some samples you can use.",
        type=str,
        nargs="*",
    )
    parser.add_argument(
        "--envision",
        action="store_true",
        help="Run simulation with Envision display.",
    )
    parser.add_argument(
        "--train_batch_size",
        type=int,
        default=2000,
        help="The training batch size. This value must be > 0.",
    )
    parser.add_argument(
        "--time_total_s",
        type=int,
        default=1 * 60 * 60,  # 1 hour
        help="Total time in seconds to run the simulation for. This is a rough end time as it will be checked per training batch.",
    )
    parser.add_argument(
        "--seed",
        type=int,
        default=42,
        help="The base random seed to use, intended to be mixed with --num_samples",
    )
    parser.add_argument(
        "--num_agents", type=int, default=2, help="Number of agents (one per policy)"
    )
    parser.add_argument(
        "--num_workers",
        type=int,
        default=(multiprocessing.cpu_count() // 2 + 1),
        help="Number of workers (defaults to use all system cores)",
    )
    parser.add_argument(
        "--resume_training",
        default=False,
        action="store_true",
        help="Resume an errored or 'ctrl+c' cancelled training. This does not extend a fully run original experiment.",
    )
    parser.add_argument(
        "--result_dir",
        type=str,
        default=default_result_dir,
        help="Directory containing results",
    )
    parser.add_argument(
        "--log_level",
        type=str,
        default="ERROR",
        help="Log level (DEBUG|INFO|WARN|ERROR)",
    )
    parser.add_argument(
        "--checkpoint_freq", type=int, default=3, help="Checkpoint frequency"
    )
    return parser
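For reference, the example scripts below consume this shared parser roughly as follows. This is a condensed sketch of what `pg_example.py` does, not an additional file in this PR.

```python
from pathlib import Path

from configs import gen_parser

# Build the shared parser, then extend it with script-specific flags.
default_result_dir = str(Path(__file__).resolve().parent / "results" / "pg_results")
parser = gen_parser("rllib-example", default_result_dir)
parser.add_argument("--checkpoint_num", type=int, default=None)
args = parser.parse_args()
```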
2 changes: 1 addition & 1 deletion examples/rl/rllib/model/README.md
@@ -1,3 +1,3 @@
## Model Binaries

The binaries located in this directory are the components of a trained rllib model. These are related to the `examples/rl/rllib/rllib.py` example script. Results from `examples/rl/rllib/rllib.py` are loaded and written to this directory.
The binaries located in this directory are the components of a trained rllib model. These are related to the `examples/rl/rllib/pg_pbt_example.py` example script. Results from `examples/rl/rllib/pg_pbt_example.py` are loaded and written to this directory.
Binary file removed examples/rl/rllib/model/saved_model.pb
Binary file removed examples/rl/rllib/model/variables/variables.index
218 changes: 218 additions & 0 deletions examples/rl/rllib/pg_example.py
@@ -0,0 +1,218 @@
from pathlib import Path
from pprint import pprint as print
from typing import Dict, Literal, Optional, Union

import numpy as np

try:
    from ray.rllib.algorithms.algorithm import Algorithm, AlgorithmConfig
    from ray.rllib.algorithms.callbacks import DefaultCallbacks
    from ray.rllib.algorithms.pg import PGConfig
    from ray.rllib.env.base_env import BaseEnv
    from ray.rllib.evaluation.episode import Episode
    from ray.rllib.evaluation.episode_v2 import EpisodeV2
    from ray.rllib.evaluation.rollout_worker import RolloutWorker
    from ray.rllib.policy.policy import Policy
    from ray.rllib.utils.typing import PolicyID
except Exception as e:
    from smarts.core.utils.custom_exceptions import RayException

    raise RayException.required_to("pg_example.py") from e

import smarts
from smarts.env.rllib_hiway_env import RLlibHiWayEnv
from smarts.sstudio.scenario_construction import build_scenarios

if __name__ == "__main__":
    from configs import gen_parser
    from rllib_agent import TrainingModel, rllib_agent
else:
    from .configs import gen_parser
    from .rllib_agent import TrainingModel, rllib_agent


# Add custom metrics to your tensorboard using these callbacks
# See: https://ray.readthedocs.io/en/latest/rllib-training.html#callbacks-and-custom-metrics
class Callbacks(DefaultCallbacks):
    @staticmethod
    def on_episode_start(
        worker: RolloutWorker,
        base_env: BaseEnv,
        policies: Dict[PolicyID, Policy],
        episode: Union[Episode, EpisodeV2],
        env_index: int,
        **kwargs,
    ):
        episode.user_data["ego_reward"] = []

    @staticmethod
    def on_episode_step(
        worker: RolloutWorker,
        base_env: BaseEnv,
        episode: Union[Episode, EpisodeV2],
        env_index: int,
        **kwargs,
    ):
        single_agent_id = list(episode.get_agents())[0]
        infos = episode._last_infos.get(single_agent_id)
        if infos is not None:
            episode.user_data["ego_reward"].append(infos["reward"])

    @staticmethod
    def on_episode_end(
        worker: RolloutWorker,
        base_env: BaseEnv,
        policies: Dict[PolicyID, Policy],
        episode: Union[Episode, EpisodeV2],
        env_index: int,
        **kwargs,
    ):
        mean_ego_reward = np.mean(episode.user_data["ego_reward"])
        print(
            f"ep. {episode.episode_id:<12} ended;"
            f" length={episode.length:<6}"
            f" mean_ego_reward={mean_ego_reward:.2f}"
        )
        episode.custom_metrics["mean_ego_reward"] = mean_ego_reward


def main(
    scenarios,
    envision,
    time_total_s,
    rollout_fragment_length,
    train_batch_size,
    seed,
    num_agents,
    num_workers,
    resume_training,
    result_dir,
    checkpoint_freq: int,
    checkpoint_num: Optional[int],
    log_level: Literal["DEBUG", "INFO", "WARN", "ERROR"],
):
    # One policy per agent; each agent uses the observation/action spaces and
    # custom model defined in `rllib_agent.py`.
    rllib_policies = {
        f"AGENT-{i}": (
            None,
            rllib_agent["observation_space"],
            rllib_agent["action_space"],
            {"model": {"custom_model": TrainingModel.NAME}},
        )
        for i in range(num_agents)
    }
    agent_specs = {f"AGENT-{i}": rllib_agent["agent_spec"] for i in range(num_agents)}

    smarts.core.seed(seed)
    assert len(set(rllib_policies.keys()).difference(agent_specs)) == 0
    algo_config: AlgorithmConfig = (
        PGConfig()
        .environment(
            env=RLlibHiWayEnv,
            env_config={
                "seed": seed,
                "scenarios": [
                    str(Path(scenario).expanduser().resolve().absolute())
                    for scenario in scenarios
                ],
                "headless": not envision,
                "agent_specs": agent_specs,
                "observation_options": "multi_agent",
            },
            disable_env_checking=True,
        )
        .framework(framework="tf2", eager_tracing=True)
        .rollouts(
            rollout_fragment_length=rollout_fragment_length,
            num_rollout_workers=num_workers,
            num_envs_per_worker=1,
            enable_tf1_exec_eagerly=True,
        )
        .training(
            lr_schedule=[(0, 1e-3), (1e3, 5e-4), (1e5, 1e-4), (1e7, 5e-5), (1e8, 1e-5)],
            train_batch_size=train_batch_size,
        )
        .multi_agent(
            policies=rllib_policies,
            policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: f"{agent_id}",
        )
        .callbacks(callbacks_class=Callbacks)
        .debugging(log_level=log_level)
    )

    def get_checkpoint_dir(num):
        checkpoint_dir = Path(result_dir) / f"checkpoint_{num}" / f"checkpoint-{num}"
        checkpoint_dir.mkdir(parents=True, exist_ok=True)
        return checkpoint_dir

    # Resume from a specific checkpoint if requested, otherwise from the latest one.
    if resume_training:
        checkpoint = str(get_checkpoint_dir("latest"))
        if checkpoint_num:
            checkpoint = str(get_checkpoint_dir(checkpoint_num))
    else:
        checkpoint = None

    print(f"======= Checkpointing at {str(result_dir)} =======")

    algo = algo_config.build()
    if checkpoint is not None:
        algo.load_checkpoint(checkpoint=checkpoint)
    result = {}
    current_iteration = 0
    checkpoint_iteration = checkpoint_num or 0

    try:
        while result.get("time_total_s", 0) < time_total_s:
            result = algo.train()
            print(f"======== Iteration {result['training_iteration']} ========")
            print(result, depth=1)

            if current_iteration % checkpoint_freq == 0:
                checkpoint_dir = get_checkpoint_dir(checkpoint_iteration)
                print(f"======= Saving checkpoint {checkpoint_iteration} =======")
                algo.save_checkpoint(checkpoint_dir)
                checkpoint_iteration += 1
            current_iteration += 1
        algo.save_checkpoint(get_checkpoint_dir(checkpoint_iteration))
    finally:
        algo.save_checkpoint(get_checkpoint_dir("latest"))
        algo.stop()


if __name__ == "__main__":
    default_result_dir = str(Path(__file__).resolve().parent / "results" / "pg_results")
    parser = gen_parser("rllib-example", default_result_dir)
    parser.add_argument(
        "--checkpoint_num",
        type=int,
        default=None,
        help="The checkpoint number to restart from.",
    )
    parser.add_argument(
        "--rollout_fragment_length",
        type=str,
        default="auto",
        help="Episodes are divided into fragments of this many steps for each rollout. In this example this will be ensured to be `1=<rollout_fragment_length<=train_batch_size`",
    )
    args = parser.parse_args()
    if not args.scenarios:
        args.scenarios = [
            str(Path(__file__).absolute().parents[3] / "scenarios" / "sumo" / "loop"),
        ]
    build_scenarios(scenarios=args.scenarios, clean=False, seed=args.seed)

    main(
        scenarios=args.scenarios,
        envision=args.envision,
        time_total_s=args.time_total_s,
        rollout_fragment_length=args.rollout_fragment_length,
        train_batch_size=args.train_batch_size,
        seed=args.seed,
        num_agents=args.num_agents,
        num_workers=args.num_workers,
        resume_training=args.resume_training,
        result_dir=args.result_dir,
        checkpoint_freq=max(args.checkpoint_freq, 1),
        checkpoint_num=args.checkpoint_num,
        log_level=args.log_level,
    )
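Usage note: running `python examples/rl/rllib/pg_example.py` with no positional arguments falls back to the bundled `scenarios/sumo/loop` scenario (built via `build_scenarios` above); scenario paths and the flags defined in `configs.py` (for example `--envision`, `--train_batch_size`, `--resume_training`) can be supplied on the command line. This is inferred from the argument parser above rather than from separate run instructions in this PR.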