Add support for MultiDiscrete/MultiBinary action spaces #5
Comments
Flattened seems the easiest and cleanest way, no?

Hmm, where did the question disappear? o: I would like to comment on this after taking a good look at how it could be done. A multi-discrete distribution returning flattened output seems a bit convoluted at first sight, but I will comment better when there is example code to look at.

@Miffyli deleted it before @araffin answered, as I thought it was obvious =) here is the suggestion:
Progress so far (tested):

```python
from typing import List, Tuple

import numpy as np
import torch as th
from torch import nn
from torch.distributions import Categorical


class MultiCategoricalDistribution(Distribution):
    """
    MultiCategorical distribution for multi discrete actions.

    :param action_dims: (List[int]) List of sizes of discrete action spaces.
    """

    def __init__(self, action_dims: List[int]):
        super(MultiCategoricalDistribution, self).__init__()
        self.action_dims = action_dims
        self.distributions = None

    def proba_distribution_net(self, latent_dim: int) -> nn.Module:
        """
        Create the layer that represents the distribution:
        it will be the logits of the Categorical distribution.
        You can then get probabilities using a softmax.

        :param latent_dim: (int) Dimension of the last layer
            of the policy network (before the action layer)
        :return: (nn.Linear)
        """
        action_logits = nn.Linear(latent_dim, int(np.sum(self.action_dims)))
        return action_logits

    def proba_distribution(self, action_logits: th.Tensor) -> 'MultiCategoricalDistribution':
        # Split the flat logits into one chunk per sub action space
        reshaped_logits = th.split(action_logits, self.action_dims, dim=1)
        self.distributions = [Categorical(logits=logits) for logits in reshaped_logits]
        return self

    def mode(self) -> th.Tensor:
        return th.stack([th.argmax(d.probs, dim=1) for d in self.distributions])

    def sample(self) -> th.Tensor:
        return th.stack([d.sample() for d in self.distributions])

    def entropy(self) -> th.Tensor:
        # Sum of the entropies of the independent sub-distributions
        return sum(d.entropy() for d in self.distributions)

    def actions_from_params(self, action_logits: th.Tensor,
                            deterministic: bool = False) -> th.Tensor:
        # Update the proba distribution
        self.proba_distribution(action_logits)
        return self.get_actions(deterministic=deterministic)

    def log_prob_from_params(self, action_logits: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
        actions = self.actions_from_params(action_logits)
        log_prob = self.log_prob(actions)
        return actions, log_prob

    def log_prob(self, actions: th.Tensor) -> th.Tensor:
        # Sum of the log-probs of the independent sub-distributions
        return sum(d.log_prob(x) for d, x in zip(self.distributions, th.unbind(actions)))
```

Let me know if you have any design suggestions @Miffyli @araffin
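To make the flattened-logits idea concrete, here is a minimal standalone sketch of what the class above does internally (the dimensions and batch size are made up for illustration):

```python
import numpy as np
import torch as th
from torch import nn
from torch.distributions import Categorical

# e.g. MultiDiscrete([3, 4]): two sub action spaces of sizes 3 and 4
action_dims = [3, 4]
latent_dim = 64

# One linear layer produces all logits, flattened side by side
net = nn.Linear(latent_dim, int(np.sum(action_dims)))
latent = th.randn(8, latent_dim)  # batch of 8
logits = net(latent)              # shape (8, 7)

# Split the flat logits back into one Categorical per sub-space
split_logits = th.split(logits, action_dims, dim=1)
dists = [Categorical(logits=l) for l in split_logits]

actions = th.stack([d.sample() for d in dists])  # shape (2, 8): one row per sub-space
log_prob = sum(d.log_prob(a) for d, a in zip(dists, th.unbind(actions)))  # shape (8,)
```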
I think it would be better if you open a draft pull request ;)

@araffin can we just use the .shape attribute for multi spaces here?
@rolandgvc this is for another issue (this one: #4), no? Looking at the source, this won't work, as obs.shape is not defined for MultiBinary, and the same goes for MultiDiscrete. As Gym is not documented, I really recommend reading the source code.
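For reference, a quick way to check what the multi spaces actually expose (a hypothetical snippet, assuming `gym` is installed; attribute availability can vary across gym versions):

```python
from gym import spaces

md = spaces.MultiDiscrete([3, 4])
mb = spaces.MultiBinary(5)

# MultiDiscrete exposes the per-dimension sizes via .nvec
assert list(md.nvec) == [3, 4]
# MultiBinary exposes the number of binary dimensions via .n
assert mb.n == 5
```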
- distributions.py needs to be updated (and maybe ppo/a2c) with MultiCategorical and Bernoulli distributions
- the envs from identity_env.py should help to create tests
- @rolandgvc is working on it
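For the MultiBinary case mentioned above, the analogous distribution would use independent Bernoulli variables. A hypothetical sketch (dimensions made up; not the final implementation):

```python
import torch as th
from torch import nn
from torch.distributions import Bernoulli

n_binary = 5    # e.g. MultiBinary(5)
latent_dim = 64

# One logit per binary action dimension
net = nn.Linear(latent_dim, n_binary)
logits = net(th.randn(8, latent_dim))  # batch of 8

dist = Bernoulli(logits=logits)
actions = dist.sample()                       # shape (8, 5), entries in {0., 1.}
log_prob = dist.log_prob(actions).sum(dim=1)  # independent dims -> sum of log-probs
```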