
rllib feature request: Custom action distributions #4895

Closed
mawright opened this issue May 30, 2019 · 13 comments

Comments

@mawright
Contributor

In python/ray/rllib/models/catalog.py, we are able to use functions to register custom neural net models and preprocessors.
I'd also like to request the ability to register custom action distributions that can be retrieved later during calls to get_action_dist, via a "custom_action_dist" (for example) key in the config dict, similar to the existing "custom_model" and "custom_preprocessor" keys.
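Something like this sketch of the requested API (the registration call is hypothetical, mirroring register_custom_model; import paths assume the current 0.7-era module layout):

from ray.rllib.models import ModelCatalog
from ray.rllib.models.action_dist import Categorical  # stand-in base class

class MyActionDist(Categorical):
    """Stand-in custom distribution; a real one would override sampling/logp."""
    pass

# Hypothetical registration call, mirroring register_custom_model:
ModelCatalog.register_custom_action_dist("my_dist", MyActionDist)

# Later retrieved inside get_action_dist via the config dict:
config = {"model": {"custom_action_dist": "my_dist"}}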

@ericl
Contributor

ericl commented May 30, 2019

This sounds like a good idea to me. Would you have time to help add this?

@mawright
Contributor Author

mawright commented Jun 3, 2019

I wrote something in this branch: master...mawright:action_dist, but there are some other hardcoded limitations that look like they get in the way of a truly general output distribution; in particular, the choice to limit the action space to a single tensor dimension here:

if len(action_space.shape) > 1:
    raise ValueError(
        "Action space has multiple dimensions "
        "{}. ".format(action_space.shape) +
        "Consider reshaping this into a single dimension, "
        "using a Tuple action space, or the multi-agent API.")
and here:

return tf.placeholder(
    tf.float32, shape=(None, action_space.shape[0]), name="action")

Are these hard constraints? For example, could

return tf.placeholder(
    tf.float32, shape=(None, action_space.shape[0]), name="action")

be replaced with the following?

return tf.placeholder(
    tf.float32, shape=(None, *action_space.shape), name="action")

Not restricting the input to the action distribution to a single tensor ndim may also help interpretability; e.g., the input to a diagonal N-D Gaussian could have shape [None, N, 2], so that each random variable's mean and variance sit in their own tensor slice.
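For instance (a minimal TF1 sketch; N = 4 is illustrative, and the variance slice is assumed to be kept positive upstream):

import tensorflow as tf

# Parameterize a diagonal N-D Gaussian from an input of shape [None, N, 2]
# instead of a flat [None, 2*N]: slice [..., 0] holds means, [..., 1] variances.
N = 4
dist_inputs = tf.placeholder(tf.float32, shape=(None, N, 2), name="dist_inputs")
mean, variance = dist_inputs[..., 0], dist_inputs[..., 1]  # each [None, N]
sample = mean + tf.sqrt(variance) * tf.random_normal(tf.shape(mean))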

Also, is there anything preventing a neural net model from returning more than one output tensor, provided the action distribution that receives the model's outputs can handle either a single tensor or a list/tuple of tensors?

@ericl
Contributor

ericl commented Jun 3, 2019

I don't think there's anything fundamental about disallowing multi-dimensional Box action spaces, especially with a custom action distribution. In the first case (catalog.py:170), that error could be skipped when a custom action distribution is given, since such a distribution could implement a Box space. We could also directly interpret a 2D Box space as a Tuple space without too much trouble.

Re: the placeholder limitation, I think this one is a bit harder to remove. RLlib currently flattens the action representation to an array to simplify feeding into TensorFlow. Consider for example an action space of Tuple(Tuple(Discrete(1), Discrete(2)), Discrete(3)) -- there's no simple way to represent this other than flattening this into a tensor of shape (6,).

In summary, I think yes, multi-dimensional spaces can be supported, but the input tensor for the distribution has to be a flat array. The action dist class can internally unflatten / reshape as needed though.
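Concretely (a sketch, assuming each Discrete(n) component is parameterized by n logits, so the flat width is 1 + 2 + 3 = 6):

import tensorflow as tf

# Flat distribution input for Tuple(Tuple(Discrete(1), Discrete(2)), Discrete(3)):
flat_inputs = tf.placeholder(tf.float32, shape=(None, 6), name="dist_inputs")

# The action dist can internally re-split the flat array per component:
component_logits = tf.split(flat_inputs, [1, 2, 3], axis=1)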

Thanks for looking at this btw!

@mawright
Contributor Author

mawright commented Jun 4, 2019

I think that a Tuple action space like you mentioned could have its parameters and outputs represented by a tuple/list of tensors, and the numpy outputs of sess.run calls on both could then be fed back in with something like

feed_dict = dict(zip(placeholders, ndarrays))

Take a look at this notebook for some examples of what I'm thinking, including a Tuple action space whose components have wildly different ndims: https://github.com/mawright/ray/blob/temp/action_spaces.ipynb
Sorry if it's a little sloppy; let me know if it doesn't make sense.
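For instance (a minimal TF1 sketch of the zip-based feeding, with made-up shapes):

import numpy as np
import tensorflow as tf

# Two action components with very different ndims, kept as separate placeholders:
placeholders = (
    tf.placeholder(tf.float32, shape=(None, 3), name="act0"),
    tf.placeholder(tf.float32, shape=(None, 2, 4, 5), name="act1"),
)
# e.g. numpy outputs of an earlier sess.run of the sampled actions:
ndarrays = (np.zeros((8, 3), np.float32), np.zeros((8, 2, 4, 5), np.float32))

feed_dict = dict(zip(placeholders, ndarrays))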

@ericl
Contributor

ericl commented Jun 4, 2019 via email

@ericl
Contributor

ericl commented Jun 6, 2019

Here's an idea which may be simpler to implement: the action dist or model could return a list of tensors instead of a single one. When rllib sees this it automatically concatenates them.

@mawright
Contributor Author

mawright commented Jun 7, 2019

Concatenating multiple-tensor outputs would also need to preserve the tensors' original shapes and dtypes downstream, so the action dist knows how to unpack them.
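A sketch of that bookkeeping (pack/unpack are illustrative helpers, not existing RLlib code; dtypes are assumed uniform here, though real code would track them too):

import numpy as np
import tensorflow as tf

def pack(tensors):
    """Flatten each tensor to [batch, -1], concat, and record per-sample shapes."""
    shapes = [t.shape.as_list()[1:] for t in tensors]
    flat = tf.concat(
        [tf.reshape(t, [tf.shape(t)[0], -1]) for t in tensors], axis=1)
    return flat, shapes

def unpack(flat, shapes):
    """Reverse pack() inside the action dist using the recorded shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    parts = tf.split(flat, sizes, axis=1)
    return [tf.reshape(p, [-1] + list(s)) for p, s in zip(parts, shapes)]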

I have a use case where a distribution is valid for different shapes of the input parameters but needs to behave differently for each shape. I don't want to be more explicit here because the application is currently under anonymous review; maybe we can meet in person when you're around campus? I actually happened to be meeting with Joey yesterday and he offered to introduce us, but you weren't around at the time.

@ericl
Contributor

ericl commented Jun 7, 2019

Sounds good, I followed up offline. I don't quite get the unpacking thing, though; isn't the action space sufficient to define the tensor shapes?

@bionicles

bionicles commented Jun 18, 2019

Sometimes it's good to use a variable action space, because Box fails to initialize with shapes like (None, 16):

import gym
import numpy as np


class Array(gym.Space):
    def __init__(
            self,
            shape,
            variance=1.,
            mean=0.,
            high=None,
            low=None,
            dtype=np.float32,
            ):
        self.shape = shape
        self.dtype = dtype
        # sampling parameters (note: variance is passed as the scale argument
        # of np.random.normal below, so it acts as a standard deviation)
        self.variance = variance
        self.mean = mean
        # constraints
        self.high = high
        self.low = low

    def sample(self):
        if self.shape is not None and None in self.shape:
            raise ValueError("cannot sample arrays with shape None")
        result = np.random.normal(self.mean, self.variance, self.shape)
        if self.high is not None or self.low is not None:
            result = np.clip(result, self.low, self.high)
        return result

This might break models that infer their shape from the obs/action space, and might also break algorithms that require constant shapes, but we have layers like (None, 16) in our model and it works OK with the Adam optimizer.
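A quick usage check of the sketch above:

space = Array(shape=(None, 16))        # variable leading dim: constructs fine...
fixed = Array(shape=(3,), low=-1., high=1.)
print(fixed.sample())                  # ...but only fully-specified shapes can sample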

@ericl
Contributor

ericl commented Jun 28, 2019

Hm, I'm not sure how we can support variable-length observation shapes. It may be possible with ragged tensors, but that seems quite involved. The best you can do right now is choose a large enough size and add padding as needed.
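A minimal sketch of the padding approach (pad_obs and MAX_LEN are illustrative names, not RLlib API):

import numpy as np

MAX_LEN = 32  # chosen upper bound on the variable dimension

def pad_obs(obs, max_len=MAX_LEN):
    """Zero-pad a variable-length observation to a fixed shape, plus a validity mask."""
    padded = np.zeros((max_len,) + obs.shape[1:], dtype=obs.dtype)
    padded[:len(obs)] = obs
    mask = np.zeros(max_len, dtype=np.float32)
    mask[:len(obs)] = 1.0
    return padded, mask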

btw, @mawright any updates?

@mawright
Contributor Author

Sorry for the delay, I've been busy with some other items but I will get back to finishing the pull request soon.

@mawright
Contributor Author

mawright commented Jul 8, 2019

Question about how the new ModelV2 is structured. Is the make_model function discussed here

make_model (func): optional function that returns a ModelV2 object
    given (policy, obs_space, action_space, config).
    All policy variables should be created in this function. If not
    specified, a default model will be created.
meant to supersede the "custom_model" config option?
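For reference, a sketch of the override being asked about (MyModelV2 is a hypothetical ModelV2 subclass; num_outputs would come from the action distribution's required input size):

def make_model(policy, obs_space, action_space, config):
    # Per the docstring: create all policy variables here and return a ModelV2.
    return MyModelV2(obs_space, action_space, num_outputs=2,
                     model_config=config["model"], name="my_model")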

@ericl
Contributor

ericl commented Jul 9, 2019 via email
