@@ -19,87 +19,31 @@ implement `custom training workflows (example) <https://github.com/ray-project/r
 Curriculum Learning
 ~~~~~~~~~~~~~~~~~~~
 
-In curriculum learning, the environment can be set to different difficulties
-(or "tasks") to allow learning to progress through controlled phases, from easy to
-more difficult. RLlib comes with a basic curriculum learning API utilizing the
-`TaskSettableEnv <https://github.com/ray-project/ray/blob/master/rllib/env/apis/task_settable_env.py>`__ environment API.
-Your environment only needs to implement the `set_task` and `get_task` methods
-for this to work. You can then define an `env_task_fn` in your config,
-which receives the last training results and returns a new task for the env to be set to:
-
-.. TODO move to doc_code and make it use algo configs.
-.. code-block:: python
-
-    from ray.rllib.env.apis.task_settable_env import TaskSettableEnv
-
-    class MyEnv(TaskSettableEnv):
-        def get_task(self):
-            return self.current_difficulty
-
-        def set_task(self, task):
-            self.current_difficulty = task
-
-    def curriculum_fn(train_results, task_settable_env, env_ctx):
-        # Very simple curriculum function.
-        current_task = task_settable_env.get_task()
-        new_task = current_task + 1
-        return new_task
-
-    # Setup your Algorithm's config like so:
-    config = {
-        "env": MyEnv,
-        "env_task_fn": curriculum_fn,
-    }
-    # Train using `Tuner.fit()` or `Algorithm.train()` and the above config stub.
-    # ...
-
-There are two more ways to use RLlib's other APIs to implement
-`curriculum learning <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`__.
-
-Use the Algorithm API and update the environment between calls to ``train()``.
-This example shows the algorithm being run inside a Tune function.
-This is basically the same as what the built-in `env_task_fn` API described above
-already does under the hood, but allows you to do even more customizations to your
-training loop.
-
-.. TODO move to doc_code and make it use algo configs.
-.. code-block:: python
-
-    import ray
-    from ray import train, tune
-    from ray.rllib.algorithms.ppo import PPO
-
-    def train_fn(config):
-        algo = PPO(config=config, env=YourEnv)
-        while True:
-            result = algo.train()
-            train.report(result)
-            if result["env_runners"]["episode_return_mean"] > 200:
-                task = 2
-            elif result["env_runners"]["episode_return_mean"] > 100:
-                task = 1
-            else:
-                task = 0
-            algo.workers.foreach_worker(
-                lambda ev: ev.foreach_env(
-                    lambda env: env.set_task(task)))
-
-    num_gpus = 0
-    num_env_runners = 2
+In curriculum learning, you can set the environment to different difficulties
+throughout the training process. This setting allows the algorithm to learn how to solve
+the actual, final problem incrementally, by interacting with and exploring in more and
+more difficult phases.
+Normally, such a curriculum starts with setting the environment to an easy level and
+then, as training progresses, transitions toward a harder-to-solve difficulty.
+See the `Reverse Curriculum Generation for Reinforcement Learning Agents <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`_ blog post
+for another example of how you can do curriculum learning.
+
+RLlib's Algorithm and custom callbacks APIs allow for implementing any arbitrary
+curricula. This `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/curriculum/curriculum_learning.py>`__ introduces
+the basic concepts you need to understand.
+
+First, define some env options. This example uses the `FrozenLake-v1` environment,
+a grid world whose map is fully customizable. Three tasks of different env difficulties
+are represented by slightly different maps that the agent has to navigate.
+
+.. literalinclude:: ../../../rllib/examples/curriculum/curriculum_learning.py
+    :language: python
+    :start-after: __curriculum_learning_example_env_options__
+    :end-before: __END_curriculum_learning_example_env_options__
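+
+The actual option definitions live in the linked example file. As a rough sketch only,
+assuming custom map layouts and the names ``ENV_OPTIONS`` and ``ENV_MAPS`` (both are
+illustrative placeholders, not the contents of that file), such task-dependent env
+options could look like this:
+
+.. code-block:: python
+
+    # Illustrative sketch only: three FrozenLake maps of increasing difficulty.
+    # The exact layouts and variable names here are assumptions.
+    ENV_OPTIONS = {
+        "is_slippery": False,
+        # Keep episodes short so the hardest map is very difficult to solve
+        # without first learning on the easier ones.
+        "max_episode_steps": 16,
+    }
+
+    # Task 0 = easiest, task 1 = medium, task 2 = hardest.
+    ENV_MAPS = [
+        # Small 4x4 map with the goal close to the start.
+        ["SFFF", "FFFF", "FFFF", "FFFG"],
+        # Medium 6x6 map with a few holes ("H") on the way.
+        ["SFFFFF", "FFFFFF", "FFFHFF", "FFFFFF", "FHFFFF", "FFFFFG"],
+        # Large 8x8 map with many holes; hardest to navigate.
+        ["SFFFFFFF", "FFFFFFFF", "FFFHFFFF", "FFFFFHFF",
+         "FFFHFFFF", "FHHFFFHF", "FHFFHFHF", "FFFHFFFG"],
+    ]
+
+    # A map for the chosen task would then be passed to gymnasium, for example:
+    # gym.make("FrozenLake-v1", desc=ENV_MAPS[task], **ENV_OPTIONS)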
 
-    ray.init()
-    tune.Tuner(
-        tune.with_resources(train_fn, resources=tune.PlacementGroupFactory(
-            [{"CPU": 1}, {"GPU": num_gpus}] + [{"CPU": 1}] * num_env_runners
-        ),),
-        param_space={
-            "num_gpus": num_gpus,
-            "num_env_runners": num_env_runners,
-        },
-    ).fit()
+Then, define the central piece controlling the curriculum, which is a custom callbacks class
+overriding the :py:meth:`~ray.rllib.algorithms.callbacks.DefaultCallbacks.on_train_result` method.
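+
+The actual callback is part of the linked example script. The following is only a
+minimal sketch of the idea, assuming (as in the older example above) that each env
+exposes a hypothetical ``set_task()`` method and that episode returns are reported
+under ``result["env_runners"]["episode_return_mean"]``:
+
+.. code-block:: python
+
+    from ray.rllib.algorithms.callbacks import DefaultCallbacks
+
+    class MyCurriculumCallbacks(DefaultCallbacks):  # hypothetical name
+        def on_train_result(self, *, algorithm, result, **kwargs):
+            # Pick the next task based on the latest mean episode return.
+            mean_return = result["env_runners"]["episode_return_mean"]
+            if mean_return > 200:
+                task = 2
+            elif mean_return > 100:
+                task = 1
+            else:
+                task = 0
+            # Push the new task to all envs on all EnvRunners. Newer RLlib
+            # versions expose this worker set as `algorithm.env_runner_group`
+            # instead of `algorithm.workers`.
+            algorithm.workers.foreach_worker(
+                lambda ev: ev.foreach_env(lambda env: env.set_task(task))
+            )
+
+You would then register such a class on your config, for example through
+``config.callbacks(MyCurriculumCallbacks)``.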
 
-You could also use RLlib's callbacks API to update the environment on new training
-results:
 
 .. TODO move to doc_code and make it use algo configs.
 .. code-block:: python