
[Environment] Petting zoo #1471

Merged: 73 commits merged into pytorch:main on Sep 14, 2023

Conversation

@matteobettini (Contributor) commented Aug 22, 2023

Depends on #1421 and #1462

PettingZoo environments wrapper

This PR proposes a wrapper for PettingZoo environments.
In PettingZoo there are two types of environments:

  • turn-based AECEnv, where only one agent acts at each step
  • parallel ParallelEnv, where all agents act at each step

In this PR we try to wrap both cases under the same class.

Users construct it from the task name and set the parallel argument according to the version of the environment they want to wrap.

For example:

>>> env = PettingZooEnv(
...     task="pistonball_v6",
...     parallel=True,  # True if you want the parallel version
...     **kwargs,
... )

cc @MarkHaoxiang

vmoens and others added 27 commits July 27, 2023 16:00
@facebook-github-bot added the "CLA Signed" label Aug 22, 2023
matteobettini and others added 2 commits August 30, 2023 17:54
@vmoens added the "enhancement" (new feature or request) label Aug 30, 2023
@vmoens (Contributor) left a comment
Great stuff, some early comments while waiting for ActionMask to be integrated

Comment on lines 17 to 26
IMPORT_ERR = None
try:
    import pettingzoo

    _has_pettingzoo = True

except ImportError as err:
    _has_pettingzoo = False
    IMPORT_ERR = err

@vmoens:

We should use local imports (if possible) and do _has_pettingzoo = importlib.util.find_spec("pettingzoo") is not None to reduce the time it takes to load the lib
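For reference, a minimal sketch of the suggested pattern; the wrapper class body and error message here are illustrative assumptions, not the actual torchrl code:

import importlib.util

# Cheap check: looks the package up on the path without importing it.
_has_pettingzoo = importlib.util.find_spec("pettingzoo") is not None


class PettingZooEnv:  # illustrative stand-in for the wrapper class
    def __init__(self, *args, **kwargs):
        if not _has_pettingzoo:
            raise ImportError("pettingzoo is not installed")  # assumed message
        # Local import: pettingzoo is only loaded when an env is built.
        import pettingzoo  # noqa: F401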

@matteobettini (Author):
What do you mean by local imports?
I have updated to the spec-finding code proposed.

@matteobettini (Author):

Thanks for reviewing; I have applied the requested changes.

One thing I wanted to discuss in this PR is masking.
Here I am using the "action_mask" for essentially two functions:

  • reading the "action_mask" coming from PettingZoo, which reports the available actions. In this case I also call update_mask on the spec (since it merely limits the available actions, and some actions remain available)
  • setting it to all False for agents that are not supposed to act. In this case I am NOT feeding it to the spec (since an action mask of all False is not permitted)

So, essentially, when you use a turn-based (AEC) env from PettingZoo for training, you will have a key "action_mask" which can be all False for non-acting agents, or partially False for the acting agent when its actions are limited.

Now, my question is: how do we deal with this in training? If you feed it straight to a masked distribution, it will cause problems for non-acting agents, as the sketch below shows.
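To illustrate the problem (a hedged sketch with plain torch, not torchrl's masked distribution): masking logits to -inf with an all-False mask makes the softmax undefined for the non-acting agent.

import torch

logits = torch.zeros(2, 9)  # two agents, nine tic-tac-toe actions
mask = torch.tensor([[True] * 9,    # acting agent: all actions available
                     [False] * 9])  # non-acting agent: nothing is valid
masked = logits.masked_fill(~mask, float("-inf"))
probs = masked.softmax(dim=-1)
print(probs[0])  # uniform over the 9 valid actions
print(probs[1])  # all NaN: softmax over a row of -inf is undefined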

Note also that grouping is still available in AEC envs, so you can have:

env = PettingZooEnv(
    task="tictactoe_v3",
    parallel=False,
    use_action_mask=True,  # Must use it since one player plays at a time
)
env.rollout(10)
TensorDict(
    fields={
        next: TensorDict(
            fields={
                player: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([9, 2, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                        done: Tensor(shape=torch.Size([9, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                        observation: TensorDict(
                            fields={
                                observation: Tensor(shape=torch.Size([9, 2, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                            batch_size=torch.Size([9, 2]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([9, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([9, 2]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([9]),
            device=cpu,
            is_shared=False),
        player: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([9, 2, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                action_mask: Tensor(shape=torch.Size([9, 2, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([9, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([9, 2, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([9, 2]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([9, 2]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([9]),
    device=cpu,
    is_shared=False)

where, by default, player_1 and player_2 are grouped together under "player". Here, if you index along the agent dimension of the action mask, you will see that when one agent has some True entries, the other is all False.
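A quick way to see this (a sketch assuming the grouped rollout above):

td = env.rollout(10)
mask = td["player", "action_mask"]  # shape [steps, 2, 9]
# For each step, whether each grouped player has any valid action;
# the rows alternate like [True, False], [False, True] as turns pass.
print(mask.any(dim=-1))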

Or you can have:

env = PettingZooEnv(
    task="tictactoe_v3",
    parallel=False,
    use_action_mask=True,  # Must use it since one player plays at a time
    group_map=MarlGroupMapType.ONE_GROUP_PER_AGENT,
    # This time let's split the players (even though, since they are both
    # named "player_i", the default would group them together)
)
env.rollout(10)
TensorDict(
    fields={
        next: TensorDict(
            fields={
                player_1: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                        done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                        observation: TensorDict(
                            fields={
                                observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                            batch_size=torch.Size([8]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False),
                player_2: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                        done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                        observation: TensorDict(
                            fields={
                                observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                            batch_size=torch.Size([8]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False),
        player_1: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False),
        player_2: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([8]),
    device=cpu,
    is_shared=False)

where now the action_mask in each group refers to a single agent.

The question is: what does a training script look like for these envs, given that the mask can be all False for some agents?

@vmoens (Contributor) commented Sep 1, 2023

I'm not sure how to answer this without assuming anything about what the loss module does. For instance, with the last tensordict you displayed, what should we assume the model looks like?

IIUC, PettingZoo can give you action masks that are all False, meaning no action should be taken, right?
From my point of view there is a confusion between masking actions and masking agents. The action_mask is used as a proxy for the latter, but maybe we should explicitly register an entry that says whether an agent is valid or not:

TensorDict(
    fields={
        next: TensorDict(...),
        player_1: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                mask: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False), <-- Here, same shape as TD
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False),
        player_2: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
                mask: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False), <-- Here, same shape as TD
                action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
                    batch_size=torch.Size([8]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([8]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([8]),
    device=cpu,
    is_shared=False)
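For what it's worth, a hedged sketch of how a training script could consume such a per-agent "mask" entry; the function and argument names below are assumptions, not an existing torchrl loss:

import torch


def masked_policy_loss(log_probs, advantages, agent_mask):
    # log_probs, advantages: shape [T]; agent_mask: shape [T], bool
    # (the proposed entry). Only steps where the agent actually acted
    # contribute to the objective.
    per_step = -(log_probs * advantages)
    return per_step[agent_mask].mean()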

Another thing we could consider is that env.step does not always return a full tensordict, but only the leaves that are meaningful. fake_tensordict will still return the full data and can be used to build your replay buffer.

@matteobettini (Author) commented Sep 1, 2023

OK, makes sense: we can separate agent masks from action masks to improve clarity and differentiate semantics. I'll do that.

Regarding the step outputting partial data, I would prefer not to do that, since a lot of components rely on the data not changing structure over time.

Essentially, we can do what I am doing now and build outputs from zeroed specs to make sure the structure is consistent; see the sketch below.
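A minimal sketch of that approach, assuming torchrl's spec.zero() (which builds zeros matching the spec's shapes and dtypes) and the env from the examples above:

# For a step where an agent produced no data, fill its slot with zeros
# drawn from the spec, so the rollout structure never changes.
placeholder = env.observation_spec.zero()
print(placeholder)  # a tensordict of zeros shaped like the spec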

@vmoens (Contributor) left a comment

LGTM. Let's remove the global imports.

@matteobettini (Author):
done

@vmoens vmoens merged commit a73428b into pytorch:main Sep 14, 2023
@matteobettini matteobettini deleted the petting-zoo branch September 14, 2023 14:19
albertbou92 pushed a commit to PyTorchRL/rl that referenced this pull request Sep 18, 2023
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023