-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Environment] Petting zoo #1471
Conversation
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # test/mocking_classes.py # test/test_env.py
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
This reverts commit e8e410e.
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff, some early comments while waiting for ActionMask to be integrated
torchrl/envs/libs/pettingzoo.py
Outdated
IMPORT_ERR = None | ||
try: | ||
import pettingzoo | ||
|
||
_has_pettingzoo = True | ||
|
||
except ImportError as err: | ||
_has_pettingzoo = False | ||
IMPORT_ERR = err | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use local imports (if possible) and do _has_pettingzoo = importlib.util.find_spec("pettingzoo") is not None
to reduce the time it takes to load the lib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean local imports?
i have updated to the spec finding code proposed
Signed-off-by: Matteo Bettini <[email protected]>
Thanks for reviewing, I have updated the changes requested. One thing that I wanted to discuss from this PR is the masking.
so essentially when you want to use a turn-based (AEC) env from pettingzoo for training you will have a key "action_mask" which could be all Falses for agents not acting or partially Falses for agent acting with limited actions. Now, my question is, how do we deal with this in training? cause if you feed it straight to a masked distribution it will cause problems for non acting agents. to note also is that grouping is still available in AEC envs so you can have env = PettingZooEnv(
task="tictactoe_v3",
parallel=False,
use_action_mask=True, # Must use it since one player plays at a time
)
env.rollout(10)
where by default player_0 and player_1 will be grouped toghether. here if you index along the dimension 2 of the action mask you will see that when it has some true for one agent, the other has all falses or you can have env = PettingZooEnv(
task="tictactoe_v3",
parallel=False,
use_action_mask=True, # Must use it since one player plays at a time
group_map=MarlGroupMapType.ONE_GROUP_PER_AGENT,
# This time let's split the players (even though since they are both named "player_i" the default would group them together)
)
env.rollout(10)
where now the action_mask in each group refers to a single agent the question is, what does a training script look like for these envs? since the mask can be all falses for some agents |
I'm not sure how to answer this without assuming anything about what the loss module does. For instance, with the last tensordict you displayed, what should we assume the model looks like? IIUC petting zoo can give you action masks that are all False, meaning no action should be taken, right? TensorDict(
fields={
next: TensorDict(...),
player_1: TensorDict(
fields={
action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
mask: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False), <-- Here, same shape as TD
action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
observation: TensorDict(
fields={
observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
batch_size=torch.Size([8]),
device=cpu,
is_shared=False)},
batch_size=torch.Size([8]),
device=cpu,
is_shared=False),
player_2: TensorDict(
fields={
action: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.int64, is_shared=False),
mask: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False), <-- Here, same shape as TD
action_mask: Tensor(shape=torch.Size([8, 9]), device=cpu, dtype=torch.bool, is_shared=False),
done: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.bool, is_shared=False),
observation: TensorDict(
fields={
observation: Tensor(shape=torch.Size([8, 3, 3, 2]), device=cpu, dtype=torch.int8, is_shared=False)},
batch_size=torch.Size([8]),
device=cpu,
is_shared=False)},
batch_size=torch.Size([8]),
device=cpu,
is_shared=False)},
batch_size=torch.Size([8]),
device=cpu,
is_shared=False) Another thing we could consider is that |
Ok makes sense, we can separate agent masks from action masks to improve clarity and differenctiate semantics, I'll do that. Regarding the step outputting partial data i would prefer to not do that since there are a lot of compnents relying on data not changing structure over time. Essentially we can do what I am doing now and build outputs from zeroed specs to make sure it is consistent |
# Conflicts: # test/test_specs.py # torchrl/data/tensor_specs.py # torchrl/envs/transforms/transforms.py # torchrl/envs/utils.py # torchrl/modules/distributions/discrete.py
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
…masked distribution Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # torchrl/envs/libs/vmas.py
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # .github/unittest/linux_libs/scripts_pettingzoo/environment.yml # .github/unittest/linux_libs/scripts_pettingzoo/install.sh # .github/unittest/linux_libs/scripts_pettingzoo/post_process.sh # .github/unittest/linux_libs/scripts_pettingzoo/run-clang-format.py # .github/unittest/linux_libs/scripts_pettingzoo/run_test.sh # .github/unittest/linux_libs/scripts_pettingzoo/setup_env.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Let's remove global imports
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
done |
Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>
Depends on #1421 and #1462
PettingZoo environments wrapper
This PR proposes a wrapper for PettingZoo environments.
In PettingZoo there are two types of environments:
AECEnv
where only one agent acts at each stepParallelEnv
where at each steps all the agents act at each stepIn this PR we try to wrap both cases under the same class.
Users can construct it using the task name and then can set the
parallel
argument according to the version of the environment they want to wrap.For example:
cc @MarkHaoxiang