
rllib feature request: Custom action distributions #4895

Closed
mawright opened this issue May 30, 2019 · 13 comments

Comments

@mawright
Contributor

In python/ray/rllib/models/catalog.py, we are able to use functions to register custom neural net models and preprocessors.
I'd also like to request the ability to register custom action distributions that can be retrieved later during calls to get_action_dist, via a "custom_action_dist" (for example) key in the config dict, similar to the existing "custom_model" and "custom_preprocessor" keys.
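Something like this sketch of the requested API (the registration call is hypothetical, mirroring register_custom_model; import paths assume the current 0.7-era module layout):

from ray.rllib.models import ModelCatalog
from ray.rllib.models.action_dist import Categorical  # stand-in base class

class MyActionDist(Categorical):
    """Stand-in custom distribution; a real one would override sampling/logp."""
    pass

# Hypothetical registration call, mirroring register_custom_model:
ModelCatalog.register_custom_action_dist("my_dist", MyActionDist)

# Later retrieved inside get_action_dist via the config dict:
config = {"model": {"custom_action_dist": "my_dist"}}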

@ericl
Contributor

ericl commented May 30, 2019

This sounds like a good idea to me. Would you have time to help add this?

@mawright
Contributor Author

mawright commented Jun 3, 2019

I wrote something in this branch: master...mawright:action_dist, but there are some other hardcoded limitations that look like they get in the way of a truly general output distribution; in particular, the choice to limit the action space to a single tensor dimension here:

if len(action_space.shape) > 1:
    raise ValueError(
        "Action space has multiple dimensions "
        "{}. ".format(action_space.shape) +
        "Consider reshaping this into a single dimension, "
        "using a Tuple action space, or the multi-agent API.")
and here:

return tf.placeholder(
    tf.float32, shape=(None, action_space.shape[0]), name="action")

Are these hard constraints? For example, could

return tf.placeholder(
    tf.float32, shape=(None, action_space.shape[0]), name="action")

be replaced with the following?

return tf.placeholder(
    tf.float32, shape=(None, *action_space.shape), name="action")

Not restricting the input to the action distribution to a single tensor ndim may also help interpretability; e.g., the input to a diagonal N-D Gaussian could have shape [None, N, 2], so that each random variable's mean and variance sit in their own tensor slice.
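For instance (a minimal TF1 sketch; N = 4 is illustrative, and the variance slice is assumed to be kept positive upstream):

import tensorflow as tf

# Parameterize a diagonal N-D Gaussian from an input of shape [None, N, 2]
# instead of a flat [None, 2*N]: slice [..., 0] holds means, [..., 1] variances.
N = 4
dist_inputs = tf.placeholder(tf.float32, shape=(None, N, 2), name="dist_inputs")
mean, variance = dist_inputs[..., 0], dist_inputs[..., 1]  # each [None, N]
sample = mean + tf.sqrt(variance) * tf.random_normal(tf.shape(mean))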

Also, is there anything preventing a neural net model from returning more than one output tensor, provided the action distribution that receives the model's outputs can handle either a single tensor or a list/tuple of tensors?

@ericl
Contributor

ericl commented Jun 3, 2019

I don't think there's anything fundamental about disallowing multi-dimensional Box action spaces, especially with a custom action distribution. In the first case (catalog.py:170), that error could be skipped when a custom action distribution is given, since such a distribution could implement a Box space. We could also directly interpret a 2D Box space as a Tuple space without too much trouble.

Re: the placeholder limitation, I think this one is a bit harder to remove. RLlib currently flattens the action representation to an array to simplify feeding into TensorFlow. Consider for example an action space of Tuple(Tuple(Discrete(1), Discrete(2)), Discrete(3)) -- there's no simple way to represent this other than flattening this into a tensor of shape (6,).

In summary, I think yes, multi-dimensional spaces can be supported, but the input tensor for the distribution has to be a flat array. The action dist class can internally unflatten / reshape as needed though.
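Concretely (a sketch, assuming each Discrete(n) component is parameterized by n logits, so the flat width is 1 + 2 + 3 = 6):

import tensorflow as tf

# Flat distribution input for Tuple(Tuple(Discrete(1), Discrete(2)), Discrete(3)):
flat_inputs = tf.placeholder(tf.float32, shape=(None, 6), name="dist_inputs")

# The action dist can internally re-split the flat array per component:
component_logits = tf.split(flat_inputs, [1, 2, 3], axis=1)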

Thanks for looking at this btw!

@mawright
Contributor Author

mawright commented Jun 4, 2019

I think that a Tuple action space like you mentioned could have its parameters and outputs represented by a tuple/list of tensors, and the numpy outputs of sess.run calls on both could then be fed back in with something like

feed_dict = dict(zip(placeholders, ndarrays))

Take a look at this notebook for some examples of what I'm thinking, including a Tuple action space whose components have wildly different ndims: https://github.com/mawright/ray/blob/temp/action_spaces.ipynb
Sorry if it's a little sloppy; let me know if it doesn't make sense.
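For instance (a minimal TF1 sketch of the zip-based feeding, with made-up shapes):

import numpy as np
import tensorflow as tf

# Two action components with very different ndims, kept as separate placeholders:
placeholders = (
    tf.placeholder(tf.float32, shape=(None, 3), name="act0"),
    tf.placeholder(tf.float32, shape=(None, 2, 4, 5), name="act1"),
)
# e.g. numpy outputs of an earlier sess.run of the sampled actions:
ndarrays = (np.zeros((8, 3), np.float32), np.zeros((8, 2, 4, 5), np.float32))

feed_dict = dict(zip(placeholders, ndarrays))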

@ericl
Contributor

ericl commented Jun 4, 2019 via email

@ericl
Contributor

ericl commented Jun 6, 2019

Here's an idea which may be simpler to implement: the action dist or model could return a list of tensors instead of a single one. When rllib sees this it automatically concatenates them.

@mawright
Contributor Author

mawright commented Jun 7, 2019

Concatenating multiple-tensor outputs would also need to preserve the tensors' original shapes and dtypes downstream, so the action dist knows how to unpack them.
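A sketch of that bookkeeping (pack/unpack are illustrative helpers, not existing RLlib code; dtypes are assumed uniform here, though real code would track them too):

import numpy as np
import tensorflow as tf

def pack(tensors):
    """Flatten each tensor to [batch, -1], concat, and record per-sample shapes."""
    shapes = [t.shape.as_list()[1:] for t in tensors]
    flat = tf.concat(
        [tf.reshape(t, [tf.shape(t)[0], -1]) for t in tensors], axis=1)
    return flat, shapes

def unpack(flat, shapes):
    """Reverse pack() inside the action dist using the recorded shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    parts = tf.split(flat, sizes, axis=1)
    return [tf.reshape(p, [-1] + list(s)) for p, s in zip(parts, shapes)]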

I have a use case where a distribution is valid for different shapes of the input parameters but needs to behave differently for each shape. I don't want to be more explicit here because the application is currently under anonymous review; maybe we can meet in person when you're around campus? I actually happened to be meeting with Joey yesterday and he offered to introduce us, but you weren't around at the time.

@ericl
Contributor

ericl commented Jun 7, 2019

Sounds good, I followed up offline. I don't quite get the unpacking thing, though; isn't the action space sufficient to define the tensor shapes?

@bionicles

bionicles commented Jun 18, 2019

Sometimes it's good to use a variable action space, because Box fails to initialize with shapes like (None, 16):

import gym
import numpy as np


class Array(gym.Space):
    def __init__(
            self,
            shape,
            variance=1.,
            mean=0.,
            high=None,
            low=None,
            dtype=np.float32,
            ):
        self.shape = shape
        self.dtype = dtype
        # sampling parameters (note: variance is passed as the scale argument
        # of np.random.normal below, so it acts as a standard deviation)
        self.variance = variance
        self.mean = mean
        # constraints
        self.high = high
        self.low = low

    def sample(self):
        if self.shape is not None and None in self.shape:
            raise ValueError("cannot sample arrays with shape None")
        result = np.random.normal(self.mean, self.variance, self.shape)
        if self.high is not None or self.low is not None:
            result = np.clip(result, self.low, self.high)
        return result

This might break models that infer their shape from the obs/action space, and might also break algorithms that require constant shapes, but we have layers like (None, 16) in our model and it works OK with the Adam optimizer.
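A quick usage check of the sketch above:

space = Array(shape=(None, 16))        # variable leading dim: constructs fine...
fixed = Array(shape=(3,), low=-1., high=1.)
print(fixed.sample())                  # ...but only fully-specified shapes can sample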

@ericl
Contributor

ericl commented Jun 28, 2019

Hm, I'm not sure how we can support variable-length observation shapes. It may be possible with ragged tensors, but that seems quite involved. The best you can do right now is choose a large enough size and add padding as needed.
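A minimal sketch of the padding approach (pad_obs and MAX_LEN are illustrative names, not RLlib API):

import numpy as np

MAX_LEN = 32  # chosen upper bound on the variable dimension

def pad_obs(obs, max_len=MAX_LEN):
    """Zero-pad a variable-length observation to a fixed shape, plus a validity mask."""
    padded = np.zeros((max_len,) + obs.shape[1:], dtype=obs.dtype)
    padded[:len(obs)] = obs
    mask = np.zeros(max_len, dtype=np.float32)
    mask[:len(obs)] = 1.0
    return padded, mask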

btw, @mawright any updates?

@mawright
Contributor Author

Sorry for the delay, I've been busy with some other items but I will get back to finishing the pull request soon.

@mawright
Contributor Author

mawright commented Jul 8, 2019

Question about how the new ModelV2 is structured. Is the make_model function discussed here

make_model (func): optional function that returns a ModelV2 object
    given (policy, obs_space, action_space, config).
    All policy variables should be created in this function. If not
    specified, a default model will be created.
meant to supersede the "custom_model" config option?
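For reference, a sketch of the override being asked about (MyModelV2 is a hypothetical ModelV2 subclass; num_outputs would come from the action distribution's required input size):

def make_model(policy, obs_space, action_space, config):
    # Per the docstring: create all policy variables here and return a ModelV2.
    return MyModelV2(obs_space, action_space, num_outputs=2,
                     model_config=config["model"], name="my_model")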

@ericl
Contributor

ericl commented Jul 9, 2019 via email
