[WIP] AveragingEpisodesController #89

Open
wants to merge 3 commits into
base: master
from

Conversation

maotto (Contributor) commented Mar 21, 2019

Added an AveragingEpisodesController:

  • accumulates and averages reward histories with a function that is passed via feedback_averaging_function
    • in many cases the default should be reasonable: sum up the reward history of each individual rollout, collect the sums in a list, and use the median of these values as the final return
  • can prepare the environment before each repetition (e.g. by seeding it) to make results repeatable
  • does not support recording of trajectories or raw reward histories
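The default averaging and the per-repetition preparation described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from this PR: `run_repetitions`, `seed_environment`, and `noisy_rollout` are hypothetical names, and the rollout is a stand-in that returns a list of rewards instead of using a real environment.

```python
import random
from statistics import median


def median_of_sums(reward_histories):
    """Sum the rewards of each rollout, then return the median of the sums.

    Each rollout is summed on its own, so the number of rewards per
    rollout may differ.
    """
    returns = [sum(history) for history in reward_histories]
    return median(returns)


def run_repetitions(rollout, n_repetitions=10, prepare=None,
                    averaging=median_of_sums):
    """Hypothetical helper: repeat a rollout and aggregate the results.

    rollout   -- callable without arguments that returns a reward history
    prepare   -- optional callable that receives the repetition index,
                 e.g. to seed the environment
    averaging -- callable that maps a list of reward histories to a scalar
    """
    histories = []
    for repetition in range(n_repetitions):
        if prepare is not None:
            prepare(repetition)
        histories.append(list(rollout()))
    return averaging(histories)


# Stand-in for a stochastic environment (assumption for this sketch).
rng = random.Random()


def seed_environment(repetition):
    # Seeding with the repetition index makes every repetition repeatable.
    rng.seed(repetition)


def noisy_rollout():
    # The number of rewards varies from rollout to rollout.
    n_steps = rng.randint(1, 5)
    return [rng.random() for _ in range(n_steps)]
```

With the seeding hook, two calls such as `run_repetitions(noisy_rollout, 5, seed_environment)` produce the same value, because each repetition starts from the same seed in both calls.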

maotto added 3 commits March 21, 2019 14:21
…ments and return the median of the returns; TODO: implement this nicely in a controller subclass
* does not support recording of trajectories and raw reward histories
* allows to accumulate and average reward histories by function that is
passed via feedback_averaging_function
* allows to prepare an environment for each repetition (e.g. seeding) to
make results repeatable
See base class "Controller" for details on usage.

Additional Parameters
----------

More dashes: the underline should be as long as the heading above it.


Additional Parameters
----------
num_repetitions_to_average : int, optional (default: 10)

We usually use the prefix n_ as an abbreviation for "number".

if the environment is stochastic or specifically prepared via the
argument environment_preparation_function

feedback_averaging_function : function, optional (default: median_of_sums)

It is a callback, not just a function. It also does not have to be a function; it can be any callable.
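To illustrate what "any callable" covers, here is a minimal sketch. The class `TrimmedMeanOfSums` and the helper `check_callback` are hypothetical and not part of this PR; they only show that an object with `__call__` works as a callback just like a plain function, and that `callable()` is the appropriate check.

```python
from statistics import mean


class TrimmedMeanOfSums:
    """Hypothetical callback object that carries its own configuration."""

    def __init__(self, n_trim=1):
        self.n_trim = n_trim

    def __call__(self, reward_histories):
        # Sum each rollout, drop the n_trim smallest and largest returns,
        # and average the rest.
        returns = sorted(sum(history) for history in reward_histories)
        trimmed = returns[self.n_trim:len(returns) - self.n_trim]
        if not trimmed:
            raise ValueError("not enough rollouts left after trimming")
        return mean(trimmed)


def check_callback(callback):
    """Accept functions, lambdas, partials, and objects with __call__."""
    if not callable(callback):
        raise TypeError("callback must be callable, got %r" % (callback,))
    return callback
```

An instance such as `TrimmedMeanOfSums(n_trim=1)` could then be passed wherever a plain averaging function is expected.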

Note that the number of feedbacks per rollout may vary.
See AveragingEpisodesController.median_of_sums (default) for an example

environment_preparation_function : function, optional (default: None)

The same applies here.

self.record_inputs = False
self.record_outputs = False
self.record_feedbacks = False
self.accumulate_feedbacks = False # see feedback_averaging_function

This comment does not really help.

@AlexanderFabisch changed the title from "AveragingEpisodesController" to "[WIP] AveragingEpisodesController" on Mar 26, 2019
AlexanderFabisch (Contributor) commented

@maotto any progress?
