Add support for MultiDiscrete/MultiBinary action spaces #5
Comments
Flattened seems the easiest and cleanest way, no?

Hmm, where did the question disappear? o: I would like to comment on this after taking a good look at how it could be done. A multi-discrete distribution returning flattened output seems a bit convoluted at first sight, but I will comment better when there is example code to look at.

@Miffyli deleted it before @araffin answered, as I thought it was obvious =) here is the suggestion:
Progress so far (tested):

```python
from typing import List, Tuple

import numpy as np
import torch as th
from torch import nn
from torch.distributions import Categorical


class MultiCategoricalDistribution(Distribution):
    """
    MultiCategorical distribution for multi discrete actions.

    :param action_dims: (List[int]) List of sizes of discrete action spaces.
    """

    def __init__(self, action_dims: List[int]):
        super(MultiCategoricalDistribution, self).__init__()
        self.action_dims = action_dims
        self.distributions = None

    def proba_distribution_net(self, latent_dim: int) -> nn.Module:
        """
        Create the layer that represents the distribution:
        it will be the logits of the Categorical distribution.
        You can then get probabilities using a softmax.

        :param latent_dim: (int) Dimension of the last layer
            of the policy network (before the action layer)
        :return: (nn.Linear)
        """
        action_logits = nn.Linear(latent_dim, int(np.sum(self.action_dims)))
        return action_logits

    def proba_distribution(self, action_logits: th.Tensor) -> 'MultiCategoricalDistribution':
        # Split the flat logits into one chunk per sub action space
        reshaped_logits = th.split(action_logits, self.action_dims, dim=1)
        self.distributions = [Categorical(logits=logits) for logits in reshaped_logits]
        return self

    def mode(self) -> th.Tensor:
        return th.stack([th.argmax(d.probs, dim=1) for d in self.distributions])

    def sample(self) -> th.Tensor:
        return th.stack([d.sample() for d in self.distributions])

    def entropy(self) -> th.Tensor:
        # Sum of the entropies of the independent sub-distributions
        return sum(d.entropy() for d in self.distributions)

    def actions_from_params(self, action_logits: th.Tensor,
                            deterministic: bool = False) -> th.Tensor:
        # Update the proba distribution
        self.proba_distribution(action_logits)
        return self.get_actions(deterministic=deterministic)

    def log_prob_from_params(self, action_logits: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
        actions = self.actions_from_params(action_logits)
        log_prob = self.log_prob(actions)
        return actions, log_prob

    def log_prob(self, actions: th.Tensor) -> th.Tensor:
        # Sum of the log-probs of the independent sub-distributions
        return sum(d.log_prob(x) for d, x in zip(self.distributions, th.unbind(actions)))
```

Let me know if you have any design suggestions @Miffyli @araffin
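To make the flattened-logits idea concrete, here is a minimal standalone sketch of what the class above does internally (the dimensions and batch size are made up for illustration):

```python
import numpy as np
import torch as th
from torch import nn
from torch.distributions import Categorical

# e.g. MultiDiscrete([3, 4]): two sub action spaces of sizes 3 and 4
action_dims = [3, 4]
latent_dim = 64

# One linear layer produces all logits, flattened side by side
net = nn.Linear(latent_dim, int(np.sum(action_dims)))
latent = th.randn(8, latent_dim)  # batch of 8
logits = net(latent)              # shape (8, 7)

# Split the flat logits back into one Categorical per sub-space
split_logits = th.split(logits, action_dims, dim=1)
dists = [Categorical(logits=l) for l in split_logits]

actions = th.stack([d.sample() for d in dists])  # shape (2, 8): one row per sub-space
log_prob = sum(d.log_prob(a) for d, a in zip(dists, th.unbind(actions)))  # shape (8,)
```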
I think it would be better if you open a draft pull request ;)

@araffin can we just use the .shape attribute for multi spaces here?
@rolandgvc this is for another issue (this one: #4), no? Looking at the source, this won't work, as obs.shape is not defined for MultiBinary, and the same goes for MultiDiscrete. As Gym is not documented, I really recommend reading the source code.
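For reference, a quick way to check what the multi spaces actually expose (a hypothetical snippet, assuming `gym` is installed; attribute availability can vary across gym versions):

```python
from gym import spaces

md = spaces.MultiDiscrete([3, 4])
mb = spaces.MultiBinary(5)

# MultiDiscrete exposes the per-dimension sizes via .nvec
assert list(md.nvec) == [3, 4]
# MultiBinary exposes the number of binary dimensions via .n
assert mb.n == 5
```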
- distributions.py needs to be updated (and maybe ppo/a2c) with MultiCategorical and Bernoulli distributions
- the envs from identity_env.py should help to create tests
- @rolandgvc is working on it
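For the MultiBinary case mentioned above, the analogous distribution would use independent Bernoulli variables. A hypothetical sketch (dimensions made up; not the final implementation):

```python
import torch as th
from torch import nn
from torch.distributions import Bernoulli

n_binary = 5    # e.g. MultiBinary(5)
latent_dim = 64

# One logit per binary action dimension
net = nn.Linear(latent_dim, n_binary)
logits = net(th.randn(8, latent_dim))  # batch of 8

dist = Bernoulli(logits=logits)
actions = dist.sample()                       # shape (8, 5), entries in {0., 1.}
log_prob = dist.log_prob(actions).sum(dim=1)  # independent dims -> sum of log-probs
```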