
[Environment] Petting zoo #1471

Merged: 73 commits merged into pytorch:main on Sep 14, 2023

Conversation

@matteobettini (Contributor) commented Aug 22, 2023

Depends on #1421 and #1462

PettingZoo environments wrapper

This PR proposes a wrapper for PettingZoo environments.
In PettingZoo there are two types of environments:

  • turn-based AECEnv, where only one agent acts at each step
  • parallel ParallelEnv, where all agents act at each step

In this PR we try to wrap both cases under the same class.

Users construct it from the task name and set the parallel argument according to the version of the environment they want to wrap.

For example:

>>> env = PettingZooEnv(
...     task="pistonball_v6",
...     parallel=True,  # True if you want the parallel version
...     **kwargs,
... )

cc @MarkHaoxiang

vmoens and others added 27 commits July 27, 2023 16:00
@facebook-github-bot added the "CLA Signed" label Aug 22, 2023
matteobettini and others added 2 commits August 30, 2023 17:54
@vmoens added the "enhancement" (new feature or request) label Aug 30, 2023
@vmoens (Contributor) left a comment
Great stuff, some early comments while waiting for ActionMask to be integrated

Comment on lines 17 to 26
IMPORT_ERR = None
try:
    import pettingzoo

    _has_pettingzoo = True

except ImportError as err:
    _has_pettingzoo = False
    IMPORT_ERR = err

@vmoens:

We should use local imports (if possible) and do _has_pettingzoo = importlib.util.find_spec("pettingzoo") is not None to reduce the time it takes to load the lib
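For reference, a minimal sketch of the suggested pattern; the wrapper class body and error message here are illustrative assumptions, not the actual torchrl code:

import importlib.util

# Cheap check: looks the package up on the path without importing it.
_has_pettingzoo = importlib.util.find_spec("pettingzoo") is not None


class PettingZooEnv:  # illustrative stand-in for the wrapper class
    def __init__(self, *args, **kwargs):
        if not _has_pettingzoo:
            raise ImportError("pettingzoo is not installed")  # assumed message
        # Local import: pettingzoo is only loaded when an env is built.
        import pettingzoo  # noqa: F401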

@matteobettini (Author):
What do you mean by local imports?
I have updated to the spec-finding code proposed.

@matteobettini (Author):

Thanks for reviewing; I have applied the requested changes.

One thing I wanted to discuss in this PR is masking.
Here I am using the "action_mask" for essentially two functions:

  • reading the "action_mask" coming from PettingZoo, which reports the available actions. In this case I also call update_mask on the spec (since it merely limits the available actions, and some actions remain available)
  • setting it to all False for agents that are not supposed to act. In this case I am NOT feeding it to the spec (since an action mask of all False is not permitted)

So, essentially, when you use a turn-based (AEC) env from PettingZoo for training, you will have a key "action_mask" which can be all False for non-acting agents, or partially False for the acting agent when its actions are limited.

Now, my question is: how do we deal with this in training? If you feed it straight to a masked distribution, it will cause problems for non-acting agents, as the sketch below shows.
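To illustrate the problem (a hedged sketch with plain torch, not torchrl's masked distribution): masking logits to -inf with an all-False mask makes the softmax undefined for the non-acting agent.

import torch

logits = torch.zeros(2, 9)  # two agents, nine tic-tac-toe actions
mask = torch.tensor([[True] * 9,    # acting agent: all actions available
                     [False] * 9])  # non-acting agent: nothing is valid
masked = logits.masked_fill(~mask, float("-inf"))
probs = masked.softmax(dim=-1)
print(probs[0])  # uniform over the 9 valid actions
print(probs[1])  # all NaN: softmax over a row of -inf is undefined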

Note also that grouping is still available in AEC envs, so you can have:

env = PettingZooEnv(
    task="tictactoe_v3",
    parallel=False,
    use_action_mask=True,  # Must use it since one player plays at a time
)
env.rollout(10)
TensorDict(
    fields={
        next: TensorDict(
            fields={
                player: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([9, 2, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                        done: Tensor(shape=torch.Size([9, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                        observation: TensorDict(
                            fields={
                                observation: Tensor(shape=torch.Size([9, 2, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                            batch_size=torch.Size([9, 2]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([9, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([9, 2]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([9]),
            device=cpu,
            is_shared=False),
        player: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([9, 2, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                action_mask: Tensor(shape=torch.Size([9, 2, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([9, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([9, 2, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([9, 2]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([9, 2]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([9]),
    device=cpu,
    is_shared=False)

where, by default, player_1 and player_2 are grouped together under "player". Here, if you index along the agent dimension of the action mask, you will see that when one agent has some True entries, the other is all False.
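A quick way to see this (a sketch assuming the grouped rollout above):

td = env.rollout(10)
mask = td["player", "action_mask"]  # shape [steps, 2, 9]
# For each step, whether each grouped player has any valid action;
# the rows alternate like [True, False], [False, True] as turns pass.
print(mask.any(dim=-1))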

Or you can have:

env = PettingZooEnv(
    task="tictactoe_v3",
    parallel=False,
    use_action_mask=True,  # Must use it since one player plays at a time
    group_map=MarlGroupMapType.ONE_GROUP_PER_AGENT,
    # This time let's split the players (even though, since they are both
    # named "player_i", the default would group them together)
)
env.rollout(10)
TensorDict(
    fields={
        next: TensorDict(
            fields={
                player_1: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                        done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                        observation: TensorDict(
                            fields={
                                observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                            batch_size=torch.Size([8]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False),
                player_2: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                        done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                        observation: TensorDict(
                            fields={
                                observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                            batch_size=torch.Size([8]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False),
        player_1: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False),
        player_2: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([8]),
    device=cpu,
    is_shared=False)

where now the action_mask in each group refers to a single agent.

The question is: what does a training script look like for these envs, given that the mask can be all False for some agents?

@vmoens (Contributor) commented Sep 1, 2023

I'm not sure how to answer this without assuming anything about what the loss module does. For instance, with the last tensordict you displayed, what should we assume the model looks like?

IIUC, PettingZoo can give you action masks that are all False, meaning no action should be taken, right?
From my point of view there is a confusion between masking actions and masking agents. The action_mask is used as a proxy for the latter, but maybe we should explicitly register an entry that says whether an agent is valid or not:

TensorDict(
    fields={
        next: TensorDict(...),
        player_1: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                mask: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False), <-- Here, same shape as TD
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False),
        player_2: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                mask: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False), <-- Here, same shape as TD
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([8]),
    device=cpu,
    is_shared=False)
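For what it's worth, a hedged sketch of how a training script could consume such a per-agent "mask" entry; the function and argument names below are assumptions, not an existing torchrl loss:

import torch


def masked_policy_loss(log_probs, advantages, agent_mask):
    # log_probs, advantages: shape [T]; agent_mask: shape [T], bool
    # (the proposed entry). Only steps where the agent actually acted
    # contribute to the objective.
    per_step = -(log_probs * advantages)
    return per_step[agent_mask].mean()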

Another thing we could consider is that env.step does not always return a full tensordict, but only the leaves that are meaningful. fake_tensordict will still return the full data and can be used to build your replay buffer.

@matteobettini (Author) commented Sep 1, 2023

OK, makes sense: we can separate agent masks from action masks to improve clarity and differentiate semantics. I'll do that.

Regarding the step outputting partial data, I would prefer not to do that, since a lot of components rely on the data not changing structure over time.

Essentially, we can do what I am doing now and build outputs from zeroed specs to make sure the structure is consistent; see the sketch below.
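A minimal sketch of that approach, assuming torchrl's spec.zero() (which builds zeros matching the spec's shapes and dtypes) and the env from the examples above:

# For a step where an agent produced no data, fill its slot with zeros
# drawn from the spec, so the rollout structure never changes.
placeholder = env.observation_spec.zero()
print(placeholder)  # a tensordict of zeros shaped like the spec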

@vmoens (Contributor) left a comment

LGTM. Let's remove the global imports.

@matteobettini (Author):
done

@vmoens vmoens merged commit a73428b into pytorch:main Sep 14, 2023
@matteobettini matteobettini deleted the petting-zoo branch September 14, 2023 14:19
albertbou92 pushed a commit to PyTorchRL/rl that referenced this pull request Sep 18, 2023
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023