-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rllib feature request: Custom action distributions #4895
Comments
This sounds like a good idea to me. Would you have time to help add this? |
I wrote something in this branch: master...mawright:action_dist but there are some other hardcoded limitations that look like they get in the way of a truly general output distribution; in particular the choice to limit the action space to a single tensor dim here: ray/python/ray/rllib/models/catalog.py Lines 112 to 117 in b674c4a
ray/python/ray/rllib/models/catalog.py Lines 170 to 171 in b674c4a
return tf.placeholder(
tf.float32, shape=(None, action_space.shape[0]), name="action") be replaced with return tf.placeholder(
tf.float32, shape=(None, *action_space.shape), name="action") Not restricting the input to the action distribution to have a single tensor ndim may also help interpretability, e.g., by having the input to the diagonal N-D Gaussian be [None, N, 2] so each's random variable's mean and variance are put in their own tensor slice. Also, is there anything preventing allowing a neural net model to return more than one output tensor if the action distribution that gets the model's outputs is able to handle receiving either a single tensor or a list/tuple of tensors? |
I don't think there's anything fundamental about not allowing Box action spaces, especially with a custom action distribution. In the first case (catalog.py:170), that error could be skipped in the case of a custom action distribution which could implement a Re: the placeholder limitation, I think this one is a bit harder to remove. RLlib currently flattens the action representation to an array to simplify feeding into TensorFlow. Consider for example an action space of Tuple(Tuple(Discrete(1), Discrete(2)), Discrete(3)) -- there's no simple way to represent this other than flattening this into a tensor of shape (6,). In summary, I think yes, multi-dimensional spaces can be supported, but the input tensor for the distribution has to be a flat array. The action dist class can internally unflatten / reshape as needed though. Thanks for looking at this btw! |
I think that a Tuple action space like you mentioned could have its parameters and outputs represented just by a tuple/list of tensors, and then numpy outputs of sess.run's of both can be fed back in with something like feed_dict = dict(zip(placeholders, ndarrays)) Take a look at this notebook for some examples of what I am thinking with an example of a Tuple action space with wildly different ndims of each component: https://github.com/mawright/ray/blob/temp/action_spaces.ipynb |
Sure. I think it would be quite a large change to RLlib though, and should
be something considered separately from custom action distributions.
…On Tue, Jun 4, 2019, 2:12 PM Matthew A. Wright ***@***.***> wrote:
I think that a Tuple action space like you mentioned could have its
parameters and outputs represented just by a tuple/list of tensors, and
then numpy outputs of sess.run's of both can be fed back in with something
like
feed_dict = dict(zip(placeholders, ndarrays))
Take a look at this notebook for some examples of what I am thinking:
https://github.com/mawright/ray/blob/temp/action_spaces.ipynb
Sorry if it's a little sloppy let me know if it doesn't make sense.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4895?email_source=notifications&email_token=AAADUSTC243CBG5BTALHA7DPYYBVFA5CNFSM4HQ23LV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODW3QY2Q#issuecomment-498535530>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAADUSWVLHWKOTNSGTDDWZLPYYBVFANCNFSM4HQ23LVQ>
.
|
Here's an idea which may be simpler to implement: the action dist or model could return a list of tensors instead of a single one. When rllib sees this it automatically concatenates them. |
Concatting multiple-tensor outputs would also need to make sure that the information of their original shapes and dtypes are also sent downstream so the action dist could know how to unpack it. I have a use case where a distribution is valid for different shapes of the input parameters, but needs to behave differently for different shapes. I don't want to be more explicit about it on here because the application is currently under anonymous review, maybe we can meet in person when you are around campus? I actually happened to be meeting with Joey yesterday and he offered to introduce us but you didn't happen to be around at the time. |
Sounds good, I followed up offline. I don't quite get the unpacking thing though, isn't the action space sufficient to define the tensor shapes? |
sometimes its good to use a variable action space bc box fails to initialize with shapes like (None, 16)
this might screw up models which infer their shape from the obs/action space |
Hm, I'm not sure how we can support variable-length observation shapes. This may be possible with ragged tensors but seems like it could be quite involved. The best you can do right now is choose a big enough size and add padding as needed. btw, @mawright any updates? |
Sorry for the delay, I've been busy with some other items but I will get back to finishing the pull request soon. |
Question about how the new ModelV2 is structured. Is the ray/python/ray/rllib/policy/dynamic_tf_policy.py Lines 72 to 75 in eeb67db
|
The intent was to support both, so it can be either configured or overriden
with code.
…On Mon, Jul 8, 2019, 4:54 PM Matthew A. Wright ***@***.***> wrote:
Question about how the new ModelV2 is structured. Is the make_model
function discussed here
https://github.com/ray-project/ray/blob/eeb67db861e3f3c30c04301cd8ceaea2bef33342/python/ray/rllib/policy/dynamic_tf_policy.py#L72-L75
meant to supersede the "custom_model" config option?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4895?email_source=notifications&email_token=AAADUSWKITKD25DPYYGFXN3P6PHSJA5CNFSM4HQ23LV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZOVUOY#issuecomment-509434427>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAADUSVE5N54RM66IXNWR4LP6PHSJANCNFSM4HQ23LVQ>
.
|
In python/ray/rllib/models/catalog.py, we are able to use functions to register custom neural net models and preprocessors.
I'd like to also request the functionality to register custom action distributions that can be retrieved later during calls to get_action_dist with a "custom_action_dist" (for example) key in the config dict similar to the existing "custom_model" and "custom_preprocessor" keys.
The text was updated successfully, but these errors were encountered: