[RLlib] Docs do-over (new API stack): Add scaling guide rst page. #49528

Merged
merged 28 commits on Jan 4, 2025

Commits
3b82172
wip
sven1977 Dec 27, 2024
ab5c579
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Dec 31, 2024
9741f40
wip
sven1977 Dec 31, 2024
1f32693
wip
sven1977 Jan 1, 2025
8cf5b79
example script working well (with planned CI options)
sven1977 Jan 1, 2025
ee499e2
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 1, 2025
c483b3a
wip
sven1977 Jan 1, 2025
85534cd
wip
sven1977 Jan 1, 2025
aa13fb7
wip
sven1977 Jan 1, 2025
74db8f1
wip
sven1977 Jan 1, 2025
8245fc6
Merge branch 'cleanup_examples_folder_41_async_vector_env' into docs_…
sven1977 Jan 1, 2025
e706ede
k
sven1977 Jan 1, 2025
2714297
Merge branch 'cleanup_examples_folder_41_async_vector_env' into docs_…
sven1977 Jan 1, 2025
226b6c2
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 1, 2025
730f514
wip
sven1977 Jan 1, 2025
8cba077
wip
sven1977 Jan 1, 2025
58cc407
wip
sven1977 Jan 1, 2025
8a26f20
merge
sven1977 Jan 3, 2025
5c45c55
fix
sven1977 Jan 3, 2025
3329af7
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 4, 2025
38d986d
fix
sven1977 Jan 4, 2025
3c326f1
wip
sven1977 Jan 4, 2025
3efaa47
Apply suggestions from code review
sven1977 Jan 4, 2025
aa6a216
fix
sven1977 Jan 4, 2025
9a3cf27
Merge remote-tracking branch 'origin/docs_redo_scaling_guide' into do…
sven1977 Jan 4, 2025
add15c5
fix
sven1977 Jan 4, 2025
9f808da
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 4, 2025
c22c792
wip
sven1977 Jan 4, 2025
2 changes: 2 additions & 0 deletions .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -2,6 +2,7 @@
# Add new terms alphabetically.
[Aa]lgos?
(APPO|appo)
[Aa]utoscal(e|ing)
boolean
[Cc]allables?
coeff
@@ -28,3 +29,4 @@ rollout
SGD
[Tt]ensor[Ff]low
timesteps?
vectorizes?
6 changes: 1 addition & 5 deletions doc/BUILD
@@ -442,16 +442,12 @@ doctest(

doctest(
name = "doctest[rllib]",
size = "enormous",
size = "large",
files = glob(
include = [
"source/rllib/**/*.rst",
"source/rllib/**/*.md",
],
exclude = [
"source/rllib/rllib-env.rst",
"source/rllib/rllib-sample-collection.rst",
],
),
data = ["//rllib:cartpole-v1_large"],
tags = ["team:rllib"],
1 change: 0 additions & 1 deletion doc/source/rllib/images/rllib-config.svg

This file was deleted.

1 change: 1 addition & 0 deletions doc/source/rllib/images/scaling_axes_overview.svg
31 changes: 31 additions & 0 deletions doc/source/rllib/index.rst
@@ -14,6 +14,37 @@ RLlib: Industry-Grade, Scalable Reinforcement Learning

.. sphinx_rllib_readme_end

.. todo (sven): redo toctree:
suggestion:
getting-started (replaces rllib-training)
key-concepts
rllib-env (single-agent)
... <- multi-agent
... <- external
... <- hierarchical
algorithm-configs
rllib-algorithms (overview of all available algos)
dev-guide (replaces user-guides)
debugging
scaling-guide
fault-tolerance
checkpoints
callbacks
metrics-logger
rllib-advanced-api
algorithm (general description of how algos work)
rllib-rlmodule
rllib-offline
single-agent-episode
multi-agent-episode
connector-v2
rllib-learner
env-runners
rllib-examples
rllib-new-api-stack <- remove?
new-api-stack-migration-guide
package_ref/index

.. toctree::
:hidden:

2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -59,7 +59,7 @@ This way, you can scale up sample collection and training, respectively, from a

.. todo: Separate out our scaling guide into its own page in new PR

See this `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
See this :ref:`scaling guide <rllib-scaling-guide>` for more details here.

You have two ways to interact with and run an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`:

5 changes: 4 additions & 1 deletion doc/source/rllib/rllib-advanced-api.rst
@@ -318,8 +318,11 @@ evaluation (for example you have an environment that sometimes crashes or stalls
you should use the following combination of settings, minimizing the negative effects
of such environment behavior:

.. todo (sven): Add link here to new fault-tolerance page, once done.
:ref:`fault tolerance settings <rllib-fault-tolerance-docs>`, such as

Note that with or without parallel evaluation, all
:ref:`fault tolerance settings <rllib-scaling-guide>`, such as
fault tolerance settings, such as
``ignore_env_runner_failures`` or ``restart_failed_env_runners`` are respected and applied
to the failed evaluation workers.

2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-env.rst
@@ -309,7 +309,7 @@ in combination.

.. tip::

See the `scaling guide <rllib-training.html#scaling-guide>`__ for more on RLlib training at scale.
See the :ref:`scaling guide <rllib-scaling-guide>` for more on RLlib training at scale.


Expensive Environments
31 changes: 0 additions & 31 deletions doc/source/rllib/rllib-training.rst
@@ -183,37 +183,6 @@ an algorithm.
`custom model classes <rllib-models.html>`__.


.. _rllib-scaling-guide:

RLlib Scaling Guide
-------------------

Here are some rules of thumb for scaling training with RLlib.

1. If the environment is slow and cannot be replicated (e.g., since it requires interaction with physical systems), then you should use a sample-efficient off-policy algorithm such as :ref:`DQN <dqn>` or :ref:`SAC <sac>`. These algorithms default to ``num_env_runners: 0`` for single-process operation. Make sure to set ``num_gpus: 1`` if you want to use a GPU. Consider also batch RL training with the `offline data <rllib-offline.html>`__ API.

2. If the environment is fast and the model is small (most models for RL are), use time-efficient algorithms such as :ref:`PPO <ppo>`, or :ref:`IMPALA <impala>`.
These can be scaled by increasing ``num_env_runners`` to add rollout workers. It may also make sense to enable `vectorization <rllib-env.html#vectorized>`__ for
inference. Make sure to set ``num_gpus: 1`` if you want to use a GPU. If the learner becomes a bottleneck, you can use multiple GPUs for learning by setting
``num_gpus > 1``.

3. If the model is compute intensive (e.g., a large deep residual network) and inference is the bottleneck, consider allocating GPUs to workers by setting ``num_gpus_per_env_runner: 1``. If you only have a single GPU, consider ``num_env_runners: 0`` to use the learner GPU for inference. For efficient use of GPU time, use a small number of GPU workers and a large number of `envs per worker <rllib-env.html#vectorized>`__.

4. Finally, if both model and environment are compute intensive, then enable `remote worker envs <rllib-env.html#vectorized>`__ with `async batching <rllib-env.html#vectorized>`__ by setting ``remote_worker_envs: True`` and optionally ``remote_env_batch_wait_ms``. This batches inference on GPUs in the rollout workers while letting envs run asynchronously in separate actors, similar to the `SEED <https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html>`__ architecture. The number of workers and number of envs per worker should be tuned to maximize GPU utilization.

In case you are using lots of workers (``num_env_runners >> 10``) and you observe worker failures for whatever reasons, which normally interrupt your RLlib training runs, consider using
the config settings ``ignore_env_runner_failures=True``, ``restart_failed_env_runners=True``, or ``restart_failed_sub_environments=True``:

``restart_failed_env_runners``: When set to True (default), your Algorithm will attempt to restart any failed EnvRunner and replace it with a newly created one. This way, your number of workers will never decrease, even if some of them fail from time to time.
``ignore_env_runner_failures``: When set to True, your Algorithm will not crash due to an EnvRunner error, but continue for as long as there is at least one functional worker remaining. This setting is ignored when ``restart_failed_env_runners=True``.
``restart_failed_sub_environments``: When set to True and there is a failure in one of the vectorized sub-environments in one of your EnvRunners, RLlib tries to recreate only the failed sub-environment and re-integrate the newly created one into your vectorized env stack on that EnvRunner.

Note that only one of ``ignore_env_runner_failures`` or ``restart_failed_env_runners`` should be set to True (they are mutually exclusive settings). However,
you can combine each of these with the ``restart_failed_sub_environments=True`` setting.
Using these options will make your training runs much more stable and more robust against occasional OOM or other similar "once in a while" errors on the EnvRunners
themselves or inside your custom environments.


Debugging RLlib Experiments
---------------------------

194 changes: 194 additions & 0 deletions doc/source/rllib/scaling-guide.rst
@@ -0,0 +1,194 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-scaling-guide:

RLlib scaling guide
===================

RLlib is a distributed and scalable RL library, based on `Ray <https://www.ray.io/>`__. An RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
uses `Ray actors <https://docs.ray.io/en/latest/ray-core/actors.html>`__ wherever parallelization of
its sub-components can speed up sample and learning throughput.

.. figure:: images/scaling_axes_overview.svg
:width: 600
:align: left

**Scalable axes in RLlib**: Three scaling axes are available across all RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` classes:

- The number of :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors in the :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`,
settable through ``config.env_runners(num_env_runners=n)``.

- The number of vectorized sub-environments on each
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actor, settable through ``config.env_runners(num_envs_per_env_runner=p)``.

- The number of :py:class:`~ray.rllib.core.learner.learner.Learner` actors in the
:py:class:`~ray.rllib.core.learner.learner_group.LearnerGroup`, settable through ``config.learners(num_learners=m)``.
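
To see where each setting lives, here's a minimal sketch (example values only, not recommendations) that touches all three axes in one config:

.. code-block:: python

from ray.rllib.algorithms.ppo import PPOConfig

config = (
PPOConfig()
.environment("CartPole-v1")
# Axis 1: number of remote EnvRunner actors for sampling.
.env_runners(
num_env_runners=4,
# Axis 2: number of vectorized sub-environments per EnvRunner.
num_envs_per_env_runner=10,
)
# Axis 3: number of remote Learner actors for training.
.learners(num_learners=2)
)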


Scaling the number of EnvRunner actors
--------------------------------------

You can control the degree of parallelism for the sampling machinery of the
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` by increasing the number of remote
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors in the :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`
through the config as follows.

.. testcode::

from ray.rllib.algorithms.ppo import PPOConfig

config = (
PPOConfig()
# Use 4 EnvRunner actors (default is 2).
.env_runners(num_env_runners=4)
)

To assign resources to each :py:class:`~ray.rllib.env.env_runner.EnvRunner`, use these config settings:

.. code-block:: python

config.env_runners(
num_cpus_per_env_runner=..,
num_gpus_per_env_runner=..,
)

See this
`example of an EnvRunner and RL environment requiring a GPU resource <https://github.com/ray-project/ray/blob/master/rllib/examples/gpus/gpus_on_env_runners.py>`__.

The number of GPUs may be fractional quantities, for example 0.5, to allocate only a fraction of a GPU per
:py:class:`~ray.rllib.env.env_runner.EnvRunner`.
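
For example, a sketch with illustrative resource amounts (the values here are assumptions, not recommendations):

.. code-block:: python

config.env_runners(
num_cpus_per_env_runner=1,    # example: 1 CPU per EnvRunner actor
num_gpus_per_env_runner=0.5,  # example: half a GPU per EnvRunner actor
)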

Note that there's always one "local" :py:class:`~ray.rllib.env.env_runner.EnvRunner` in the
:py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`.
If you only want to sample using this local :py:class:`~ray.rllib.env.env_runner.EnvRunner`,
set ``num_env_runners=0``. This local :py:class:`~ray.rllib.env.env_runner.EnvRunner` directly sits in the main
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` process.
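
For example, a minimal sketch of a purely local setup, such as for debugging:

.. code-block:: python

# Sample only on the local EnvRunner inside the main Algorithm process.
config.env_runners(num_env_runners=0)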

.. hint::
The Ray team may decide to deprecate the local :py:class:`~ray.rllib.env.env_runner.EnvRunner` at some point in the future.
It still exists for historical reasons, and whether it's useful to keep is still under debate.


Scaling the number of envs per EnvRunner actor
----------------------------------------------

RLlib vectorizes :ref:`RL environments <rllib-key-concepts-environments>` on :py:class:`~ray.rllib.env.env_runner.EnvRunner`
actors through gymnasium's `VectorEnv <https://gymnasium.farama.org/api/vector/>`__ API.
To create more than one environment copy per :py:class:`~ray.rllib.env.env_runner.EnvRunner`, set the following in your config:

.. testcode::

from ray.rllib.algorithms.ppo import PPOConfig

config = (
PPOConfig()
# Use 10 sub-environments (vector) per EnvRunner.
.env_runners(num_envs_per_env_runner=10)
)

.. note::
Unlike single-agent environments, RLlib can't vectorize multi-agent setups yet.
The Ray team is working on a solution to this restriction by utilizing the custom
vectorization feature of `gymnasium >= 1.x`.

Doing so allows the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` on the
:py:class:`~ray.rllib.env.env_runner.EnvRunner` to run inference on a batch of data and
thus compute actions for all sub-environments in parallel.

By default, the individual sub-environments in a vector ``step`` and ``reset`` in sequence; only
the action computation of the RL environment loop happens in parallel, because observations can move through the model
in a batch.
However, `gymnasium <https://gymnasium.farama.org/>`__ supports an asynchronous
vectorization setting, in which each sub-environment receives its own Python process.
This way, the vector environment can ``step`` or ``reset`` in parallel. Activate
this asynchronous vectorization behavior through:

.. testcode::

import gymnasium as gym

config.env_runners(
gym_env_vectorize_mode=gym.envs.registration.VectorizeMode.ASYNC, # default is `SYNC`
)

This setting can speed up the sampling process significantly in combination with ``num_envs_per_env_runner > 1``,
especially when your RL environment's stepping process is time consuming.

See this `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/async_gym_env_vectorization.py>`__ that demonstrates a massive speedup with async vectorization.
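
Putting the two settings together, a sketch (example values) of an EnvRunner configuration that both vectorizes and steps its sub-environments asynchronously:

.. code-block:: python

import gymnasium as gym

config.env_runners(
num_envs_per_env_runner=10,  # example: 10 sub-environments per EnvRunner
gym_env_vectorize_mode=gym.envs.registration.VectorizeMode.ASYNC,
)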


Scaling the number of Learner actors
------------------------------------

Learning updates happen in the :py:class:`~ray.rllib.core.learner.learner_group.LearnerGroup`, which manages either a single,
local :py:class:`~ray.rllib.core.learner.learner.Learner` instance or any number of remote
:py:class:`~ray.rllib.core.learner.learner.Learner` actors.

Set the number of remote :py:class:`~ray.rllib.core.learner.learner.Learner` actors through:

.. testcode::

from ray.rllib.algorithms.ppo import PPOConfig

config = (
PPOConfig()
# Use 2 remote Learner actors (default is 0) for distributed data parallelism.
# Choosing 0 creates a local Learner instance on the main Algorithm process.
.learners(num_learners=2)
)

Typically, you use as many :py:class:`~ray.rllib.core.learner.learner.Learner` actors as you have GPUs available for training.
Make sure to set the number of GPUs per :py:class:`~ray.rllib.core.learner.learner.Learner` to 1:

.. testcode::

config.learners(num_gpus_per_learner=1)

.. warning::
For some algorithms, such as IMPALA and APPO, the performance of a single remote
:py:class:`~ray.rllib.core.learner.learner.Learner` actor (``num_learners=1``) compared to a
single local :py:class:`~ray.rllib.core.learner.learner.Learner` instance (``num_learners=0``)
depends on whether you have a GPU available or not.
If exactly one GPU is available, run these two algorithms with ``num_learners=0, num_gpus_per_learner=1``;
if no GPU is available, set ``num_learners=1, num_gpus_per_learner=0``. If more than one GPU is available,
set ``num_learners=.., num_gpus_per_learner=1``.
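
As a sketch only, this rule of thumb could be automated; detecting the GPU count through PyTorch is an assumption, not an RLlib API:

.. code-block:: python

import torch  # assumption: PyTorch backend is installed

num_gpus = torch.cuda.device_count()
if num_gpus == 0:
    # No GPU: use one remote CPU-based Learner.
    config.learners(num_learners=1, num_gpus_per_learner=0)
elif num_gpus == 1:
    # Exactly one GPU: use the local Learner with that GPU.
    config.learners(num_learners=0, num_gpus_per_learner=1)
else:
    # More than one GPU: one remote Learner per GPU.
    config.learners(num_learners=num_gpus, num_gpus_per_learner=1)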

The number of GPUs may be fractional quantities, for example 0.5, to allocate only a fraction of a GPU per
:py:class:`~ray.rllib.core.learner.learner.Learner`. For example, you can pack five :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
instances onto one GPU by setting ``num_learners=1, num_gpus_per_learner=0.2``.
See this `fractional GPU example <https://github.com/ray-project/ray/blob/master/rllib/examples/gpus/fractional_gpus.py>`__
for details.
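
A sketch of that packing example, as seen from each individual Algorithm instance:

.. code-block:: python

# Five Algorithms, each configured like this, fit on a single GPU (5 x 0.2 = 1.0).
config.learners(num_learners=1, num_gpus_per_learner=0.2)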

.. note::
If you specify ``num_gpus_per_learner > 0`` and your machine doesn't have the required number of GPUs
available, the experiment may stall until the Ray autoscaler brings up enough machines to fulfill the resource request.
If your cluster has autoscaling turned off, this setting then results in a seemingly hanging experiment run.

On the other hand, if you set ``num_gpus_per_learner=0``, RLlib builds the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`
instances solely on CPUs, even if GPUs are available on the cluster.
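
For example, a CPU-only sketch (example values) that keeps all RLModule copies on CPUs:

.. code-block:: python

# No GPUs requested; RLlib builds the RLModules on CPUs.
config.learners(num_learners=2, num_gpus_per_learner=0)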


Outlook: More RLlib elements that should scale
----------------------------------------------

There are other components and aspects in RLlib that should be able to scale up.

For example, the model size is limited to whatever fits on a single GPU, due to
"distributed data parallel" (DDP) being the only way in which RLlib scales :py:class:`~ray.rllib.core.learner.learner.Learner`
actors.

The Ray team is working on closing these gaps. In particular, future areas of improvement are:

- Enable **training very large models**, such as a "large language model" (LLM). The team is actively working on a
"Reinforcement Learning from Human Feedback" (RLHF) prototype setup. The main problems to solve are the
model-parallel and tensor-parallel distribution across multiple GPUs, as well as a reasonably fast transfer of
weights between Ray actors.

- Enable training with **thousands of multi-agent policies**. A possible solution for this scaling problem
could be to split up the :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule` into
manageable groups of individual policies across the various :py:class:`~ray.rllib.env.env_runner.EnvRunner`
and :py:class:`~ray.rllib.core.learner.learner.Learner` actors.

- Enable **vector envs for multi-agent**.
9 changes: 9 additions & 0 deletions doc/source/rllib/user-guides.rst
@@ -23,6 +23,7 @@ User Guides
rllib-torch2x
rllib-fault-tolerance
rllib-dev
scaling-guide


.. _rllib-feature-guide:
@@ -97,3 +98,11 @@ RLlib Feature Guides
.. button-ref:: rllib-dev

How to contribute to RLlib?

.. grid-item-card::
:img-top: /rllib/images/rllib-logo.svg
:class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

.. button-ref:: scaling-guide

How to run RLlib experiments at scale