[WIP] AveragingEpisodesController #89

Open

maotto wants to merge 3 commits into rock-learning:master from maotto:average_controller

Contributor

maotto commented Mar 21, 2019

Adds an AveragingEpisodesController.

- Accumulates and averages reward histories with a callable passed via feedback_averaging_function.
  - In many cases the default should be reasonable: sum up the reward history of each individual rollout, collect the sums in a list, and use the median of these values as the final return.
- Allows preparing the environment for each repetition (e.g. seeding) to make results repeatable.
- Does not support recording of trajectories and raw reward histories.
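For readers skimming the description, the default behaviour can be sketched roughly as follows. This is a standalone illustration, not the code from this PR: `averaged_return`, `run_rollout`, `prepare`, and the toy `noisy_rollout` environment are hypothetical names, and only `median_of_sums` mirrors a name that appears in the diff.

```python
import random
from statistics import median


def median_of_sums(reward_histories):
    """Sum each rollout's rewards, then take the median of those sums."""
    return median(sum(history) for history in reward_histories)


def averaged_return(run_rollout, n_repetitions=10, prepare=None,
                    averaging=median_of_sums):
    """Run several rollouts and merge their reward histories into one value.

    run_rollout: hypothetical callable returning one rollout's reward list.
    prepare: optional callable invoked with the repetition index before each
        rollout, e.g. to seed the environment.
    """
    histories = []
    for repetition in range(n_repetitions):
        if prepare is not None:
            prepare(repetition)
        histories.append(list(run_rollout()))
    return averaging(histories)


# Toy stand-in for a stochastic environment; the rollout length varies.
rng = random.Random()


def noisy_rollout():
    n_steps = rng.randint(3, 6)
    return [rng.gauss(-1.0, 0.5) for _ in range(n_steps)]


# Seeding with the repetition index before every rollout makes two
# evaluations of the same behavior produce the same averaged return.
first = averaged_return(noisy_rollout, prepare=rng.seed)
second = averaged_return(noisy_rollout, prepare=rng.seed)
```

Because the averaging callable receives the full list of reward histories, it also copes with rollouts that return a different number of feedbacks.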

maotto added 3 commits

March 21, 2019 14:21


          a simple hack to evaluate a behavior in 10 differently seeded environments and return the median of the returns; TODO: implement this nicely in a controller subclass

8418b3b


          AveragingEpisodesController with hardcoded merging func. and repetitions

78bd124


          add AveragingEpisodesController with documentation and defaults;

c14e40d

* does not support recording of trajectories and raw reward histories
* allows to accumulate and average reward histories by function that is
passed via feedback_averaging_function
* allows to prepare an environment for each repetition (e.g. seeding) to
make results repeatable

AlexanderFabisch reviewed

bolero/controller/controller.py

+                  See base class "Controller" for details on usage.
+                  Additional Parameters
+                  ----------

Contributor

AlexanderFabisch Mar 22, 2019

More dashes (-) are needed here so that the underline matches the length of the heading.

AlexanderFabisch reviewed

bolero/controller/controller.py

+                  Additional Parameters
+                  ----------
+                  num_repetitions_to_average : int, optional (default: 10)

Contributor

AlexanderFabisch Mar 22, 2019

We usually try to use n_ as an abbreviation for "number of".

AlexanderFabisch reviewed

bolero/controller/controller.py

+                      if the environment is stochastic or specifically prepared via the
+                      argument environment_preparation_function
+                  feedback_averaging_function : function, optional (default: median_of_sums)

Contributor

AlexanderFabisch Mar 22, 2019

It is a callback, not just a function. It also does not have to be a function, it can be any callable.
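To illustrate the point about callables, here is a small sketch that does not depend on bolero. It assumes the callback receives a list of per-rollout reward histories, as the PR description suggests; `scaled_mean_of_sums` and `WorstCaseOfSums` are hypothetical names invented for this example.

```python
from functools import partial
from statistics import mean, median


def median_of_sums(reward_histories):
    """Plain function: median of the per-rollout reward sums."""
    return median(sum(h) for h in reward_histories)


def scaled_mean_of_sums(reward_histories, scale):
    """Needs an extra argument, so it is bound with functools.partial."""
    return scale * mean(sum(h) for h in reward_histories)


class WorstCaseOfSums:
    """Callable object: returns the lowest per-rollout reward sum."""

    def __call__(self, reward_histories):
        return min(sum(h) for h in reward_histories)


histories = [[1.0, 2.0], [4.0], [0.5, 0.5, 0.5]]
callbacks = {
    "function": median_of_sums,
    "partial": partial(scaled_mean_of_sums, scale=0.5),
    "callable object": WorstCaseOfSums(),
}

# Every entry satisfies callable() and can be invoked the same way.
results = {name: cb(histories) for name, cb in callbacks.items()}
```

Documenting the parameter type as "callable" rather than "function" covers all three variants.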

AlexanderFabisch reviewed

bolero/controller/controller.py

+                      Note that the number of feedbacks per rollout may vary.
+                      See AveragingEpisodesController.median_of_sums (default) for an example
+                  environment_preparation_function : function, optional (default: None)

Contributor

AlexanderFabisch Mar 22, 2019

The same applies here: this is a callback and can be any callable.

AlexanderFabisch reviewed

bolero/controller/controller.py

+                      self.record_inputs = False
+                      self.record_outputs = False
+                      self.record_feedbacks = False
+                      self.accumulate_feedbacks = False  # see feedback_averaging_function

Contributor

AlexanderFabisch Mar 22, 2019

This comment does not really help.

AlexanderFabisch changed the title ~~AveragingEpisodesController~~ [WIP] AveragingEpisodesController

Contributor

AlexanderFabisch commented May 27, 2019

@maotto any progress?
