I found a possible bug/unwanted behaviour while training a PPOAgent on TicTacToe with action masking.
In the file agents/ppo/ppo_policy.py, line 237, time_step is first normalized, and the normalized observation is then fed into the _actor_network. However, if the actual ActorDistributionNetwork is wrapped inside a MaskSplitterNetwork, the observation at the point of normalization still contains the mask, so the mask gets normalized as well. Since the mask's dtype is int or bool, the normalization and the subsequent rounding/casting produce wrong masks at the ActorDistributionNetwork.
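To illustrate the effect, here is a toy numpy sketch (not tf_agents code; the running statistics below are made up to stand in for an updated normalizer):

```python
import numpy as np

# A 9-cell TicTacToe legal-move mask as the normalizer would see it after
# the observation nest (including the mask) has been cast to float.
mask = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1], dtype=np.float32)

# A streaming normalizer keeps running mean/std statistics per element;
# after a training step these are no longer (0, 1). Values are hypothetical.
running_mean = np.full_like(mask, 0.6)
running_std = np.full_like(mask, 0.5)

normalized = (mask - running_mean) / running_std
print(normalized)               # [ 0.8  0.8 -1.2  0.8 -1.2 -1.2  0.8  0.8  0.8]

# Casting back to bool to apply the mask no longer recovers {0, 1}: every
# nonzero entry becomes True, so previously illegal moves (-1.2) are now
# treated as legal.
print(normalized.astype(bool))  # [ True  True  True ...  True]
```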
In the case of my TicTacToe environment, the masking was simply wrong as soon as a single training step had been applied and the normalization statistics had been updated.
This should
either be properly documented if this is intended behaviour (namely that masks cannot be used while observation normalization is enabled in the PPOAgent, which it is by default),
or fixed so that the mask is excluded from the normalization (a workaround sketch is given below).
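In the meantime, disabling observation normalization avoids the problem. Here is a minimal sketch of such a setup, assuming the usual MaskSplitterNetwork wiring; the specs, the splitter_fn, and the dict observation layout are illustrative placeholders for a TicTacToe-style environment, not taken from this issue:

```python
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.networks import (actor_distribution_network,
                                mask_splitter_network, value_network)
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# Hypothetical TicTacToe specs: a 9-cell board plus a 9-bit legal-move mask.
obs_spec = {
    'observation': tensor_spec.BoundedTensorSpec((9,), tf.float32, 0, 2),
    'mask': tensor_spec.BoundedTensorSpec((9,), tf.int32, 0, 1),
}
action_spec = tensor_spec.BoundedTensorSpec((), tf.int32, 0, 8)
time_step_spec = ts.time_step_spec(obs_spec)

def splitter_fn(observation):
    # Split the full observation into (network input, mask).
    return observation['observation'], observation['mask']

actor_net = mask_splitter_network.MaskSplitterNetwork(
    splitter_fn,
    actor_distribution_network.ActorDistributionNetwork(
        obs_spec['observation'], action_spec),
    passthrough_mask=True,        # forward the mask to the wrapped network
    input_tensor_spec=obs_spec)

value_net = mask_splitter_network.MaskSplitterNetwork(
    splitter_fn,
    value_network.ValueNetwork(obs_spec['observation']),
    input_tensor_spec=obs_spec)   # the value net only needs the board

agent = ppo_agent.PPOAgent(
    time_step_spec,
    action_spec,
    optimizer=tf.keras.optimizers.Adam(1e-3),
    actor_net=actor_net,
    value_net=value_net,
    # Workaround: with normalization on (the default), the mask inside the
    # observation nest is normalized too and ends up corrupted.
    normalize_observations=False)
```

The trade-off is that the board observation is no longer normalized either, which may slow PPO training; the proper fix would be to exclude only the mask from the normalizer inside ppo_policy.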