Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom action distributions #5164

Merged
merged 22 commits into from
Aug 6, 2019
Merged

Custom action distributions #5164

merged 22 commits into from
Aug 6, 2019

Conversation

mawright
Copy link
Contributor

What do these changes do?

Adds support for custom action distributions. They are registered to and looked up from the same global "ModelCatalog" class that are currently used for custom models and preprocessors.

There remain some issues with the handling of the action tensors in preexisting code that I will mention in a new Github issue.

Related issue number

#4895

Linter

  • [ x] I've run scripts/format.sh to lint the changes in this PR.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1592/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15282/
Test PASSed.

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I think two more additions would make this feature more discoverable by users:

  • Add an end-to-end runnable example in rllib/examples/custom_action_dist.py
  • Update rllib-models.rst to include a section on custom action distributions, and also update rllib-examples.rst to link to the example script.

@@ -69,6 +72,25 @@ def sampled_action_prob(self):
"""Returns the log probability of the sampled action."""
return tf.exp(self.logp(self.sample_op))

@DeveloperAPI
@staticmethod
def parameter_shape_for_action_space(action_space, model_config=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about required_model_output_size?

@@ -69,6 +72,25 @@ def sampled_action_prob(self):
"""Returns the log probability of the sampled action."""
return tf.exp(self.logp(self.sample_op))

@DeveloperAPI
@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@staticmethod
@classmethod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this suggestion. Why do you think this should be a class method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I guess staticmethod is fine, since you don't really need the class.

model_config (dict): Model's config dict (as defined in catalog.py)

Returns:
dist_dim (int or np.ndarray of ints): size of the required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dist_dim (int or np.ndarray of ints): size of the required
model_output_size (int or np.ndarray of ints): size of the required

"""

@DeveloperAPI
def __init__(self, inputs):
def __init__(self, inputs, model_config=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid accidents where we forget to pass this, consider making it a required argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards making it required but this may break other users' code.

Copy link
Contributor

@ericl ericl Jul 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is DeveloperAPI I would say it's ok to err on the side of avoiding bugs vs backwards compatibility.

@staticmethod
@override(ActionDistribution)
def parameter_shape_for_action_space(action_space, model_config=None):
return action_space.shape[0] * 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.product(action_space.shape)? here and elsewhere?

return partial(MultiCategorical, input_lens=action_space.nvec), \
int(sum(action_space.nvec))

return dist, dist.parameter_shape_for_action_space(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like now you could simplify this to return just dist (or this cleanup can be done later).

@ericl ericl self-assigned this Jul 10, 2019
@mawright
Copy link
Contributor Author

Will probably be another day or two before I get a chance to work on the stuff you mentioned.

@ericl
Copy link
Contributor

ericl commented Jul 24, 2019

@mawright any updates? Let us know if you need help.

@mawright
Copy link
Contributor Author

@mawright any updates? Let us know if you need help.

Sorry, have been busy finishing my dissertation the last few weeks and have a busy week ahead of me with neurips reviewer responses. This is still on my to do list.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15779/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15892/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15893/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15898/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15912/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15923/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15926/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15943/
Test FAILed.

@mawright
Copy link
Contributor Author

mawright commented Aug 4, 2019

The Travis check said this commit errored on a Python 2.7 build but I can't figure out why. From what I can tell the error is occurring in a C++ file: https://travis-ci.com/ray-project/ray/jobs/222169300#L1427

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ericl
Copy link
Contributor

ericl commented Aug 5, 2019

The travis tests look ok to me, but not sure if jenkins is fully passing still. Going to wait for the latest build before merging.

@ericl
Copy link
Contributor

ericl commented Aug 5, 2019

jenkins retest this please

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16008/
Test PASSed.

@ericl ericl merged commit e3c9f7e into ray-project:master Aug 6, 2019
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16042/
Test PASSed.

edoakes pushed a commit to edoakes/ray that referenced this pull request Aug 9, 2019
* custom action dist wip

* Test case for custom action dist

* ActionDistribution.get_parameter_shape_for_action_space pattern

* Edit exception message to also suggest using a custom action distribution

* Clean up ModelCatalog.get_action_dist

* Pass model config to ActionDistribution constructors

* Update custom action distribution test case

* Name fix

* Autoformatter

* parameter shape static methods for torch distributions

* Fix docstring

* Generalize fake array for graph initialization

* Fix action dist constructors

* Correct parameter shape static methods for multicategorical and gaussian

* Make suggested changes to custom action dist's

* Correct instances of not passing model config to action dist

* Autoformatter

* fix tuple distribution constructor

* bugfix
@mawright mawright deleted the action_dist branch August 22, 2019 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants