Commit 57a3dc2

sven1977 authored and richardsliu committed
[RLlib] Cleanup examples folder ray-project#13. Fix main examples docs page for RLlib. (ray-project#45382)
Signed-off-by: Richard Liu <[email protected]>
1 parent 09e846f commit 57a3dc2

File tree: 90 files changed, +437 / -283 lines changed


.vale/styles/config/vocabularies/RLlib/accept.txt

Lines changed: 3 additions & 0 deletions

@@ -9,9 +9,12 @@ config
 (IMPALA|impala)
 hyperparameters?
 MARLModule
+MLAgents
+multiagent
 postprocessing
 (PPO|ppo)
 [Pp]y[Tt]orch
+pragmas?
 (RL|rl)lib
 RLModule
 rollout
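
Side note (not part of the diff): entries in a Vale accept.txt vocabulary are case-sensitive regular expressions, so the three added lines accept, for example:

    MLAgents      matches "MLAgents"
    multiagent    matches "multiagent"
    pragmas?      matches "pragma" and "pragmas"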

doc/source/rllib/images/sigils/new-api-stack.svg

Lines changed: 1 addition & 0 deletions

doc/source/rllib/images/sigils/old-api-stack.svg

Lines changed: 1 addition & 0 deletions

doc/source/rllib/index.rst

Lines changed: 1 addition & 1 deletion

@@ -167,7 +167,7 @@ Feature Overview

 **RLlib Algorithms**
 ^^^
-Check out the many available RL algorithms of RLlib for model-free and model-based
+See the many available RL algorithms of RLlib for model-free and model-based
 RL, on-policy and off-policy training, multi-agent RL, and more.
 +++
 .. button-ref:: rllib-algorithms-doc

doc/source/rllib/key-concepts.rst

Lines changed: 1 addition & 1 deletion

@@ -114,7 +114,7 @@ The following figure shows *synchronous sampling*, the simplest of `these patter

 RLlib uses `Ray actors <actors.html>`__ to scale training from a single core to many thousands of cores in a cluster.
 You can `configure the parallelism <rllib-training.html#specifying-resources>`__ used for training by changing the ``num_env_runners`` parameter.
-Check out our `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
+See this `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.


 RL Modules
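
Side note on this hunk (not part of the diff): ``num_env_runners`` is set on the algorithm's config object. A minimal sketch of the scaling knob, assuming the new-API-stack ``AlgorithmConfig`` methods (exact method names vary across Ray versions):

    from ray.rllib.algorithms.ppo import PPOConfig

    # Sketch: one local EnvRunner plus 4 remote EnvRunner actors collect samples.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .env_runners(num_env_runners=4)
    )
    algo = config.build()
    result = algo.train()
    print(result["env_runners"]["episode_return_mean"])

Raising ``num_env_runners`` only adds more remote sampling actors; it doesn't change the learner side of training.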

doc/source/rllib/package_ref/evaluation.rst

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ which sit inside a :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`

 **A typical RLlib EnvRunnerGroup setup inside an RLlib Algorithm:** Each :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` contains
 exactly one local :py:class:`~ray.rllib.env.env_runner.EnvRunner` object and N ray remote
-:py:class:`~ray.rllib.env.env_runner.EnvRunner` (ray actors).
+:py:class:`~ray.rllib.env.env_runner.EnvRunner` (Ray actors).
 The workers contain a policy map (with one or more policies), and - in case a simulator
 (env) is available - a vectorized :py:class:`~ray.rllib.env.base_env.BaseEnv`
 (containing M sub-environments) and a :py:class:`~ray.rllib.evaluation.sampler.SamplerInput` (either synchronous or asynchronous) which controls

doc/source/rllib/rllib-advanced-api.rst

Lines changed: 23 additions & 79 deletions

@@ -19,87 +19,31 @@ implement `custom training workflows (example) <https://github.com/ray-project/r
 Curriculum Learning
 ~~~~~~~~~~~~~~~~~~~

-In Curriculum learning, the environment can be set to different difficulties
-(or "tasks") to allow for learning to progress through controlled phases (from easy to
-more difficult). RLlib comes with a basic curriculum learning API utilizing the
-`TaskSettableEnv <https://github.com/ray-project/ray/blob/master/rllib/env/apis/task_settable_env.py>`__ environment API.
-Your environment only needs to implement the `set_task` and `get_task` methods
-for this to work. You can then define an `env_task_fn` in your config,
-which receives the last training results and returns a new task for the env to be set to:
-
-.. TODO move to doc_code and make it use algo configs.
-.. code-block:: python
-
-    from ray.rllib.env.apis.task_settable_env import TaskSettableEnv
-
-    class MyEnv(TaskSettableEnv):
-        def get_task(self):
-            return self.current_difficulty
-
-        def set_task(self, task):
-            self.current_difficulty = task
-
-    def curriculum_fn(train_results, task_settable_env, env_ctx):
-        # Very simple curriculum function.
-        current_task = task_settable_env.get_task()
-        new_task = current_task + 1
-        return new_task
-
-    # Setup your Algorithm's config like so:
-    config = {
-        "env": MyEnv,
-        "env_task_fn": curriculum_fn,
-    }
-    # Train using `Tuner.fit()` or `Algorithm.train()` and the above config stub.
-    # ...
-
-There are two more ways to use the RLlib's other APIs to implement
-`curriculum learning <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`__.
-
-Use the Algorithm API and update the environment between calls to ``train()``.
-This example shows the algorithm being run inside a Tune function.
-This is basically the same as what the built-in `env_task_fn` API described above
-already does under the hood, but allows you to do even more customizations to your
-training loop.
-
-.. TODO move to doc_code and make it use algo configs.
-.. code-block:: python
-
-    import ray
-    from ray import train, tune
-    from ray.rllib.algorithms.ppo import PPO
-
-    def train_fn(config):
-        algo = PPO(config=config, env=YourEnv)
-        while True:
-            result = algo.train()
-            train.report(result)
-            if result["env_runners"]["episode_return_mean"] > 200:
-                task = 2
-            elif result["env_runners"]["episode_return_mean"] > 100:
-                task = 1
-            else:
-                task = 0
-            algo.workers.foreach_worker(
-                lambda ev: ev.foreach_env(
-                    lambda env: env.set_task(task)))
-
-    num_gpus = 0
-    num_env_runners = 2
+In curriculum learning, you can set the environment to different difficulties
+throughout the training process. This setting allows the algorithm to learn how to solve
+the actual and final problem incrementally, by interacting with and exploring in more and
+more difficult phases.
+Normally, such a curriculum starts with setting the environment to an easy level and
+then - as training progresses - transitions more toward a harder-to-solve difficulty.
+See the `Reverse Curriculum Generation for Reinforcement Learning Agents <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`_ blog post
+for another example of how you can do curriculum learning.
+
+RLlib's Algorithm and custom callbacks APIs allow for implementing any arbitrary
+curricula. This `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/curriculum/curriculum_learning.py>`__ introduces
+the basic concepts you need to understand.
+
+First, define some env options. This example uses the `FrozenLake-v1` environment,
+a grid world, whose map is fully customizable. Three tasks of different env difficulties
+are represented by slightly different maps that the agent has to navigate.
+
+.. literalinclude:: ../../../rllib/examples/curriculum/curriculum_learning.py
+    :language: python
+    :start-after: __curriculum_learning_example_env_options__
+    :end-before: __END_curriculum_learning_example_env_options__

-    ray.init()
-    tune.Tuner(
-        tune.with_resources(train_fn, resources=tune.PlacementGroupFactory(
-            [{"CPU": 1}, {"GPU": num_gpus}] + [{"CPU": 1}] * num_env_runners
-        ),)
-        param_space={
-            "num_gpus": num_gpus,
-            "num_env_runners": num_env_runners,
-        },
-    ).fit()
+Then, define the central piece controlling the curriculum, which is a custom callbacks class
+overriding the :py:meth:`~ray.rllib.algorithms.callbacks.Callbacks.on_train_result`.

-You could also use RLlib's callbacks API to update the environment on new training
-results:

 .. TODO move to doc_code and make it use algo configs.
 .. code-block:: python
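
The hunk is truncated here. For orientation only (not part of the diff, and not the actual contents of `curriculum_learning.py`): a callback-based curriculum of the kind the new text describes could look roughly like the following sketch, adapted from the code removed above. Names such as ``DefaultCallbacks`` and ``algorithm.workers`` are assumptions that vary across RLlib versions.

    from ray.rllib.algorithms.callbacks import DefaultCallbacks


    class CurriculumCallback(DefaultCallbacks):
        """Raise the env's task (difficulty) when the mean episode return crosses a threshold."""

        def on_train_result(self, *, algorithm, result, **kwargs):
            mean_return = result["env_runners"]["episode_return_mean"]
            if mean_return > 200:
                task = 2
            elif mean_return > 100:
                task = 1
            else:
                task = 0
            # Push the new task to every sub-environment on all EnvRunners.
            algorithm.workers.foreach_worker(
                lambda ev: ev.foreach_env(lambda env: env.set_task(task))
            )

You would register such a class through the config, for example with ``config.callbacks(CurriculumCallback)``, so that RLlib calls ``on_train_result`` after every training iteration.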

doc/source/rllib/rllib-algorithms.rst

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ Algorithms

 .. tip::

-    Check out the `environments <rllib-env.html>`__ page to learn more about different environment types.
+    See the `environments <rllib-env.html>`__ page to learn more about different environment types.

 Available Algorithms - Overview
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/rllib/rllib-env.rst

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ RLlib works with several different types of environments, including `Farama-Foun

 .. tip::

-    Not all environments work with all algorithms. Check out the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.
+    Not all environments work with all algorithms. See the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.

 .. image:: images/rllib-envs.svg

