[Feature] Allow multiple (nested) action, reward, done keys in `env`,`vec_env` and `collectors` #1462

matteobettini · 2023-08-15T08:00:34Z

Depends on #512

This PR is an important milestone in the journey to #1463.

It will also allow single-agent users to have more than one action, reward, and done key.

This is very important when your agent has multiple actions (e.g., some discrete, some continuous).

Main design problem: multiple dones

One major design choice here is that, if we allow multiple done keys, we need a "_reset" key for each of them (the same goes for "truncated", "is_init", and all the other done-based keys).

This is because we need to reset each done independently (e.g., in multi-agent settings we may want to reset only the agents that are done).

So the question is: what is the best way to do this? We can imagine having a done_spec like

done_spec = CompositeSpec{
    "nested_1": {
        "done": DiscreteSpec,
    },
    "nested_2": {
        "other_done": DiscreteSpec,
    },
    "done": DiscreteSpec,
}

With a spec of this type, we place a "_reset" key next to each done key that is present and has at least one True entry.
The same applies to "truncated" and "is_init".
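
To make the key layout concrete, here is a minimal sketch in plain Python. Nested dicts stand in for the composite spec, and `leaf_keys` and `reset_key_for` are hypothetical helpers, not TorchRL functions. It only illustrates that each nested done key gets a sibling "_reset" key at the same nesting level.

```python
# Hypothetical stand-in for the composite done spec: nested dicts whose
# leaves are placeholder strings instead of real spec objects.
done_spec = {
    "nested_1": {"done": "DiscreteSpec"},
    "nested_2": {"other_done": "DiscreteSpec"},
    "done": "DiscreteSpec",
}


def leaf_keys(spec, prefix=()):
    """Collect the path (as a tuple of strings) of every leaf in a nested dict."""
    keys = []
    for name, value in spec.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            keys.extend(leaf_keys(value, path))
        else:
            keys.append(path)
    return keys


def reset_key_for(done_key):
    """Swap the last element of a done key for "_reset", keeping the nesting."""
    return done_key[:-1] + ("_reset",)


done_keys = leaf_keys(done_spec)
reset_keys = [reset_key_for(key) for key in done_keys]

for done_key, reset_key in zip(done_keys, reset_keys):
    print(done_key, "->", reset_key)
```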

A "_reset" tensordict will then look like

reset_td = TensorDict{
    "nested_1": {
        "_reset": Tensor,
    },
    "nested_2": {
        "_reset": Tensor,
    },
    "_reset": Tensor,
}

This solution is already implemented in the PR.
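
As an illustration of the "any trues" rule, the sketch below builds such a reset structure from done values. It uses nested dicts and lists of booleans as stand-ins for a TensorDict and boolean tensors; `build_reset` is a hypothetical helper and not the code in this PR.

```python
def build_reset(done_values):
    """Return a nested dict with a "_reset" entry next to each done that has any True.

    `done_values` is a nested dict whose leaves are lists of booleans
    (a stand-in for boolean tensors with one entry per batch element).
    """
    reset = {}
    for name, value in done_values.items():
        if isinstance(value, dict):
            sub_reset = build_reset(value)
            if sub_reset:  # keep the branch only if something must be reset
                reset[name] = sub_reset
        elif any(value):
            # The reset flags mirror the done flags at the same nesting level.
            reset["_reset"] = list(value)
    return reset


done_values = {
    "nested_1": {"done": [True, False]},
    "nested_2": {"other_done": [False, False]},
    "done": [False, True],
}

reset_td = build_reset(done_values)
print(reset_td)
```

In this example "nested_2" gets no "_reset" entry because none of its done flags is true.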

cc @hyerra

Signed-off-by: Matteo Bettini <[email protected]>

matteobettini · 2023-08-15T16:25:10Z

Benchmark runs up to this point show that performance has been maintained.

hyerra · 2023-08-16T19:25:22Z

This looks great! I have a few questions though:

1. Is there any flexibility on when an agent that died gets reset? In single-agent situations, it's more straightforward since you reset once the only agent in the game dies. However, in multi-agent games, there might be cases when we don't want to reset the agent immediately. For instance, maybe we only reset after all agents of a behavior die, or we reset the game after all agents in the game have died. For the first situation, it might be beneficial to think of competitive games where the game keeps progressing until one team loses.
2. I might be a little confused, but just to make sure I understand the purpose of multiple done keys per agent, would this be used in situations where an agent can be partially done? So for example, let's say an agent has 2 actions and the done corresponding to the first action is True and the done corresponding to the second action is False. In this scenario would we want to reset the agent completely, or mask the first action and only allow the second action to be set, or something else maybe?

matteobettini · 2023-08-17T08:50:03Z

> 1. Is there any flexibility on when an agent that died gets reset? In single-agent situations, it's more straightforward since you reset once the only agent in the game dies. However, in multi-agent games, there might be cases when we don't want to reset the agent immediately. For instance, maybe we only reset after all agents of a behavior die, or we reset the game after all agents in the game have died. For the first situation, it might be beneficial to think of competitive games where the game keeps progressing until one team loses.

Basically, the idea is that at every step where some reset flag is true, _reset() in the env is called with the "_reset" flags as input. If your game should not actually reset when only some of the flags are true, you can definitely handle that in the _reset() logic of your env. The only problem I see is that TorchRL currently checks that no "done" is true after a reset, but that check should be removed anyway.

The other possibility is to put only the dones you actually need in your done spec. That is, if your agents can be done but you want to reset only when whole groups are done, then your done spec should have group granularity rather than agent granularity.
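
A rough sketch of the group-granularity idea, in plain Python. The agent names, the `groups` mapping, and `group_reset_flags` are all hypothetical and only illustrate the decision rule "reset a group once every agent in it is done"; in a real env this logic would live in your own _reset() or in how you define the done spec.

```python
def group_reset_flags(agent_done, groups):
    """Map each group name to True only when all of its agents are done."""
    return {
        group: all(agent_done[agent] for agent in members)
        for group, members in groups.items()
    }


# Hypothetical two-team game: per-agent done flags for the current step.
agent_done = {"red_0": True, "red_1": False, "blue_0": True, "blue_1": True}
groups = {"red": ["red_0", "red_1"], "blue": ["blue_0", "blue_1"]}

flags = group_reset_flags(agent_done, groups)
print(flags)
```

Here only the "blue" group would be reset, because "red_1" is still alive.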

> 2. I might be a little confused, but just to make sure I understand the purpose of multiple done keys per agent, would this be used in situations where an agent can be partially done? So for example, let's say an agent has 2 actions and the done corresponding to the first action is True and the done corresponding to the second action is False. In this scenario would we want to reset the agent completely, or mask the first action and only allow the second action to be set, or something else maybe?

The snippet I posted is just an example that pushes the flexibility of a composite done spec to the extreme. I do not see a use for that in multi-agent settings. For multi-agent, we suggest sticking to what is outlined in #1463.

matteobettini · 2023-08-17T16:23:03Z

@vmoens @hyerra I updated the description to illustrate how multiple "_reset" keys work.

matteobettini · 2023-08-18T10:44:22Z

Benchmarks up to this point:

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1723s	0.1693s	5.9062 Ops/s	5.8633 Ops/s	$\color{#35bf28}+0.73\%$
test_sync	0.1014s	88.3443ms	11.3194 Ops/s	10.0705 Ops/s	$\textbf{\color{#35bf28}+12.40\%}$
test_async	0.2583s	86.9081ms	11.5064 Ops/s	11.1518 Ops/s	$\color{#35bf28}+3.18\%$
test_simple	0.7965s	0.7169s	1.3948 Ops/s	1.3627 Ops/s	$\color{#35bf28}+2.36\%$
test_transformed	2.0009s	1.9152s	0.5221 Ops/s	0.5098 Ops/s	$\color{#35bf28}+2.43\%$
test_serial	2.3078s	2.2196s	0.4505 Ops/s	0.4374 Ops/s	$\color{#35bf28}+3.00\%$
test_parallel	1.9862s	1.8723s	0.5341 Ops/s	0.5114 Ops/s	$\color{#35bf28}+4.44\%$
test_step_mdp_speed[True-True-True-True-True]	0.1436ms	51.1794μs	19.5391 KOps/s	19.5441 KOps/s	$\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-True-True-True-False]	69.6040μs	29.0311μs	34.4458 KOps/s	34.8334 KOps/s	$\color{#d91a1a}-1.11\%$
test_step_mdp_speed[True-True-True-False-True]	73.8050μs	35.7194μs	27.9960 KOps/s	27.2185 KOps/s	$\color{#35bf28}+2.86\%$
test_step_mdp_speed[True-True-True-False-False]	0.1312ms	19.9777μs	50.0558 KOps/s	50.8298 KOps/s	$\color{#d91a1a}-1.52\%$
test_step_mdp_speed[True-True-False-True-True]	0.2524ms	53.2112μs	18.7930 KOps/s	18.4247 KOps/s	$\color{#35bf28}+2.00\%$
test_step_mdp_speed[True-True-False-True-False]	0.2505ms	31.5389μs	31.7068 KOps/s	31.8836 KOps/s	$\color{#d91a1a}-0.55\%$
test_step_mdp_speed[True-True-False-False-True]	74.8040μs	37.7298μs	26.5043 KOps/s	25.8554 KOps/s	$\color{#35bf28}+2.51\%$
test_step_mdp_speed[True-True-False-False-False]	0.1153ms	22.5023μs	44.4400 KOps/s	44.6484 KOps/s	$\color{#d91a1a}-0.47\%$
test_step_mdp_speed[True-False-True-True-True]	0.1495ms	54.1892μs	18.4539 KOps/s	18.2167 KOps/s	$\color{#35bf28}+1.30\%$
test_step_mdp_speed[True-False-True-True-False]	0.1885ms	32.9094μs	30.3864 KOps/s	30.0521 KOps/s	$\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-True-False-True]	0.1354ms	37.9034μs	26.3829 KOps/s	26.2531 KOps/s	$\color{#35bf28}+0.49\%$
test_step_mdp_speed[True-False-True-False-False]	99.6070μs	22.7218μs	44.0106 KOps/s	45.2454 KOps/s	$\color{#d91a1a}-2.73\%$
test_step_mdp_speed[True-False-False-True-True]	0.1357ms	56.9797μs	17.5501 KOps/s	17.1613 KOps/s	$\color{#35bf28}+2.27\%$
test_step_mdp_speed[True-False-False-True-False]	0.1368ms	35.1205μs	28.4734 KOps/s	28.6868 KOps/s	$\color{#d91a1a}-0.74\%$
test_step_mdp_speed[True-False-False-False-True]	0.2515ms	40.0787μs	24.9509 KOps/s	24.2065 KOps/s	$\color{#35bf28}+3.08\%$
test_step_mdp_speed[True-False-False-False-False]	85.7050μs	24.1722μs	41.3698 KOps/s	41.1810 KOps/s	$\color{#35bf28}+0.46\%$
test_step_mdp_speed[False-True-True-True-True]	0.1106ms	55.6137μs	17.9812 KOps/s	17.8849 KOps/s	$\color{#35bf28}+0.54\%$
test_step_mdp_speed[False-True-True-True-False]	0.1105ms	33.6482μs	29.7193 KOps/s	31.0923 KOps/s	$\color{#d91a1a}-4.42\%$
test_step_mdp_speed[False-True-True-False-True]	80.9060μs	44.7823μs	22.3303 KOps/s	22.2712 KOps/s	$\color{#35bf28}+0.27\%$
test_step_mdp_speed[False-True-True-False-False]	0.1071ms	24.9867μs	40.0214 KOps/s	40.2051 KOps/s	$\color{#d91a1a}-0.46\%$
test_step_mdp_speed[False-True-False-True-True]	0.2028ms	57.1715μs	17.4912 KOps/s	17.3723 KOps/s	$\color{#35bf28}+0.68\%$
test_step_mdp_speed[False-True-False-True-False]	83.6060μs	35.6899μs	28.0192 KOps/s	27.9212 KOps/s	$\color{#35bf28}+0.35\%$
test_step_mdp_speed[False-True-False-False-True]	0.1160ms	45.6753μs	21.8937 KOps/s	20.6355 KOps/s	$\textbf{\color{#35bf28}+6.10\%}$
test_step_mdp_speed[False-True-False-False-False]	94.5060μs	27.3382μs	36.5789 KOps/s	36.5997 KOps/s	$\color{#d91a1a}-0.06\%$
test_step_mdp_speed[False-False-True-True-True]	0.1710ms	60.2704μs	16.5919 KOps/s	16.6932 KOps/s	$\color{#d91a1a}-0.61\%$
test_step_mdp_speed[False-False-True-True-False]	0.1190ms	38.0676μs	26.2690 KOps/s	26.6995 KOps/s	$\color{#d91a1a}-1.61\%$
test_step_mdp_speed[False-False-True-False-True]	81.7050μs	46.6006μs	21.4590 KOps/s	20.9720 KOps/s	$\color{#35bf28}+2.32\%$
test_step_mdp_speed[False-False-True-False-False]	0.1064ms	26.9170μs	37.1513 KOps/s	37.1322 KOps/s	$\color{#35bf28}+0.05\%$
test_step_mdp_speed[False-False-False-True-True]	0.1142ms	60.7249μs	16.4677 KOps/s	16.0684 KOps/s	$\color{#35bf28}+2.49\%$
test_step_mdp_speed[False-False-False-True-False]	0.1489ms	39.9871μs	25.0081 KOps/s	24.8579 KOps/s	$\color{#35bf28}+0.60\%$
test_step_mdp_speed[False-False-False-False-True]	88.3060μs	48.2547μs	20.7234 KOps/s	21.1491 KOps/s	$\color{#d91a1a}-2.01\%$
test_step_mdp_speed[False-False-False-False-False]	0.1023ms	29.3799μs	34.0369 KOps/s	33.8363 KOps/s	$\color{#35bf28}+0.59\%$
test_values[generalized_advantage_estimate-True-True]	16.4701ms	15.3655ms	65.0811 Ops/s	64.6490 Ops/s	$\color{#35bf28}+0.67\%$
test_values[vec_generalized_advantage_estimate-True-True]	58.7579ms	47.9988ms	20.8338 Ops/s	20.0170 Ops/s	$\color{#35bf28}+4.08\%$
test_values[td0_return_estimate-False-False]	1.4525ms	0.2542ms	3.9346 KOps/s	4.3111 KOps/s	$\textbf{\color{#d91a1a}-8.73\%}$
test_values[td1_return_estimate-False-False]	21.2661ms	14.3787ms	69.5475 Ops/s	68.3216 Ops/s	$\color{#35bf28}+1.79\%$
test_values[vec_td1_return_estimate-False-False]	62.3502ms	48.7186ms	20.5260 Ops/s	20.7530 Ops/s	$\color{#d91a1a}-1.09\%$
test_values[td_lambda_return_estimate-True-False]	36.7649ms	35.8692ms	27.8790 Ops/s	27.9602 Ops/s	$\color{#d91a1a}-0.29\%$
test_values[vec_td_lambda_return_estimate-True-False]	53.6468ms	47.6071ms	21.0053 Ops/s	20.1969 Ops/s	$\color{#35bf28}+4.00\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	13.5513ms	13.3063ms	75.1522 Ops/s	75.4835 Ops/s	$\color{#d91a1a}-0.44\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	4.4637ms	3.9616ms	252.4235 Ops/s	229.9797 Ops/s	$\textbf{\color{#35bf28}+9.76\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	8.9242ms	0.5338ms	1.8733 KOps/s	1.8868 KOps/s	$\color{#d91a1a}-0.72\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	75.7226ms	65.8061ms	15.1961 Ops/s	16.6942 Ops/s	$\textbf{\color{#d91a1a}-8.97\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	10.1176ms	3.4563ms	289.3234 Ops/s	303.1526 Ops/s	$\color{#d91a1a}-4.56\%$
test_dqn_speed	5.3808ms	2.1930ms	455.9982 Ops/s	456.4696 Ops/s	$\color{#d91a1a}-0.10\%$
test_ddpg_speed	15.8147ms	3.4214ms	292.2822 Ops/s	303.5626 Ops/s	$\color{#d91a1a}-3.72\%$
test_sac_speed	16.6758ms	9.9643ms	100.3580 Ops/s	96.6589 Ops/s	$\color{#35bf28}+3.83\%$
test_redq_speed	27.6497ms	19.3821ms	51.5939 Ops/s	50.0998 Ops/s	$\color{#35bf28}+2.98\%$
test_redq_deprec_speed	25.6383ms	15.7424ms	63.5229 Ops/s	63.7062 Ops/s	$\color{#d91a1a}-0.29\%$
test_td3_speed	22.7042ms	12.9180ms	77.4114 Ops/s	81.6208 Ops/s	$\textbf{\color{#d91a1a}-5.16\%}$
test_cql_speed	46.6941ms	38.9581ms	25.6686 Ops/s	30.1520 Ops/s	$\textbf{\color{#d91a1a}-14.87\%}$
test_a2c_speed	12.3663ms	6.3100ms	158.4779 Ops/s	158.2560 Ops/s	$\color{#35bf28}+0.14\%$
test_ppo_speed	43.2018ms	7.4847ms	133.6060 Ops/s	146.5561 Ops/s	$\textbf{\color{#d91a1a}-8.84\%}$
test_reinforce_speed	5.6273ms	4.8698ms	205.3454 Ops/s	202.8195 Ops/s	$\color{#35bf28}+1.25\%$
test_iql_speed	32.7797ms	26.4046ms	37.8722 Ops/s	38.5589 Ops/s	$\color{#d91a1a}-1.78\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	10.8717ms	3.2410ms	308.5470 Ops/s	303.6460 Ops/s	$\color{#35bf28}+1.61\%$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	6.3166ms	3.3705ms	296.6959 Ops/s	295.4027 Ops/s	$\color{#35bf28}+0.44\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.1596s	4.0183ms	248.8623 Ops/s	289.9610 Ops/s	$\textbf{\color{#d91a1a}-14.17\%}$
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.3918ms	3.2291ms	309.6877 Ops/s	233.8293 Ops/s	$\textbf{\color{#35bf28}+32.44\%}$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	5.6604ms	3.4182ms	292.5519 Ops/s	286.1925 Ops/s	$\color{#35bf28}+2.22\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	6.2675ms	3.4804ms	287.3267 Ops/s	283.5725 Ops/s	$\color{#35bf28}+1.32\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	5.1356ms	3.2960ms	303.3945 Ops/s	297.0482 Ops/s	$\color{#35bf28}+2.14\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	6.1026ms	3.3708ms	296.6668 Ops/s	297.8016 Ops/s	$\color{#d91a1a}-0.38\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	7.7640ms	3.2978ms	303.2324 Ops/s	292.3812 Ops/s	$\color{#35bf28}+3.71\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.7251ms	3.2478ms	307.9018 Ops/s	304.2208 Ops/s	$\color{#35bf28}+1.21\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	7.0352ms	3.4282ms	291.7023 Ops/s	288.6523 Ops/s	$\color{#35bf28}+1.06\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	6.4660ms	3.3372ms	299.6482 Ops/s	290.2122 Ops/s	$\color{#35bf28}+3.25\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.0933ms	3.2353ms	309.0877 Ops/s	308.0684 Ops/s	$\color{#35bf28}+0.33\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	7.6054ms	3.3539ms	298.1590 Ops/s	291.1113 Ops/s	$\color{#35bf28}+2.42\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	6.9095ms	3.4381ms	290.8561 Ops/s	296.9770 Ops/s	$\color{#d91a1a}-2.06\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	4.3310ms	3.2730ms	305.5319 Ops/s	309.8827 Ops/s	$\color{#d91a1a}-1.40\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	5.9179ms	3.3631ms	297.3460 Ops/s	287.4260 Ops/s	$\color{#35bf28}+3.45\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.1684s	3.9773ms	251.4251 Ops/s	289.8637 Ops/s	$\textbf{\color{#d91a1a}-13.26\%}$
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.3398s	34.2021ms	29.2380 Ops/s	28.6756 Ops/s	$\color{#35bf28}+1.96\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	0.1784s	32.0856ms	31.1666 Ops/s	29.3976 Ops/s	$\textbf{\color{#35bf28}+6.02\%}$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	0.1634s	33.0770ms	30.2325 Ops/s	31.8222 Ops/s	$\color{#d91a1a}-5.00\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1715s	33.1941ms	30.1259 Ops/s	29.1887 Ops/s	$\color{#35bf28}+3.21\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	0.1688s	30.8474ms	32.4177 Ops/s	31.1972 Ops/s	$\color{#35bf28}+3.91\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	0.1664s	32.8674ms	30.4253 Ops/s	29.0259 Ops/s	$\color{#35bf28}+4.82\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1592s	27.3021ms	36.6273 Ops/s	35.5638 Ops/s	$\color{#35bf28}+2.99\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1715s	30.8670ms	32.3971 Ops/s	29.4496 Ops/s	$\textbf{\color{#35bf28}+10.01\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	0.1608s	30.1441ms	33.1740 Ops/s	32.3258 Ops/s	$\color{#35bf28}+2.62\%$
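
For reading the table: the "Change" column appears to be the relative difference between the "Ops" column and the "Ops on Repo HEAD" column. This is an assumption inferred from the numbers, not something stated by the benchmark tool. A small check on three rows (the displayed values are rounded, so other rows may differ in the last digit):

```python
def percent_change(ops, ops_head):
    """Relative change of `ops` with respect to `ops_head`, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# (name, Ops, Ops on Repo HEAD) copied from three rows of the table above.
rows = [
    ("test_single", 5.9062, 5.8633),
    ("test_sync", 11.3194, 10.0705),
    ("test_cql_speed", 25.6686, 30.1520),
]

changes = {name: round(percent_change(ops, head), 2) for name, ops, head in rows}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```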

matteobettini · 2023-08-18T14:16:26Z

@vmoens The PR is ready for review. Here are the latest benchmark results (I don't know how reliable they are, haha):

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1647s	0.1627s	6.1456 Ops/s	5.8554 Ops/s	$\color{#35bf28}+4.95\%$
test_sync	0.2485s	0.1124s	8.8968 Ops/s	10.2936 Ops/s	$\textbf{\color{#d91a1a}-13.57\%}$
test_async	0.1691s	92.9929ms	10.7535 Ops/s	11.3000 Ops/s	$\color{#d91a1a}-4.84\%$
test_simple	0.7580s	0.6880s	1.4535 Ops/s	1.3949 Ops/s	$\color{#35bf28}+4.20\%$
test_transformed	1.9819s	1.9045s	0.5251 Ops/s	0.5285 Ops/s	$\color{#d91a1a}-0.64\%$
test_serial	2.1587s	2.1303s	0.4694 Ops/s	0.4546 Ops/s	$\color{#35bf28}+3.26\%$
test_parallel	1.9122s	1.7912s	0.5583 Ops/s	0.5481 Ops/s	$\color{#35bf28}+1.87\%$
test_step_mdp_speed[True-True-True-True-True]	4.4434ms	55.7029μs	17.9524 KOps/s	17.9924 KOps/s	$\color{#d91a1a}-0.22\%$
test_step_mdp_speed[True-True-True-True-False]	5.6612ms	31.6486μs	31.5970 KOps/s	31.1724 KOps/s	$\color{#35bf28}+1.36\%$
test_step_mdp_speed[True-True-True-False-True]	5.8664ms	37.2757μs	26.8271 KOps/s	25.7469 KOps/s	$\color{#35bf28}+4.20\%$
test_step_mdp_speed[True-True-True-False-False]	2.7161ms	21.7901μs	45.8924 KOps/s	46.9270 KOps/s	$\color{#d91a1a}-2.20\%$
test_step_mdp_speed[True-True-False-True-True]	1.8516ms	56.7031μs	17.6357 KOps/s	17.0507 KOps/s	$\color{#35bf28}+3.43\%$
test_step_mdp_speed[True-True-False-True-False]	5.0931ms	33.8472μs	29.5445 KOps/s	29.2349 KOps/s	$\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-True-False-False-True]	0.4845ms	40.1233μs	24.9232 KOps/s	24.9629 KOps/s	$\color{#d91a1a}-0.16\%$
test_step_mdp_speed[True-True-False-False-False]	0.2525ms	22.4385μs	44.5663 KOps/s	42.8728 KOps/s	$\color{#35bf28}+3.95\%$
test_step_mdp_speed[True-False-True-True-True]	2.3308ms	53.6124μs	18.6524 KOps/s	17.1587 KOps/s	$\textbf{\color{#35bf28}+8.71\%}$
test_step_mdp_speed[True-False-True-True-False]	86.1010μs	32.8285μs	30.4614 KOps/s	29.0816 KOps/s	$\color{#35bf28}+4.74\%$
test_step_mdp_speed[True-False-True-False-True]	0.1276ms	36.7159μs	27.2361 KOps/s	26.0517 KOps/s	$\color{#35bf28}+4.55\%$
test_step_mdp_speed[True-False-True-False-False]	93.1020μs	22.1981μs	45.0488 KOps/s	43.4095 KOps/s	$\color{#35bf28}+3.78\%$
test_step_mdp_speed[True-False-False-True-True]	0.1140ms	57.8957μs	17.2724 KOps/s	15.7928 KOps/s	$\textbf{\color{#35bf28}+9.37\%}$
test_step_mdp_speed[True-False-False-True-False]	79.8010μs	35.5476μs	28.1313 KOps/s	26.1017 KOps/s	$\textbf{\color{#35bf28}+7.78\%}$
test_step_mdp_speed[True-False-False-False-True]	86.7010μs	39.4724μs	25.3342 KOps/s	23.6531 KOps/s	$\textbf{\color{#35bf28}+7.11\%}$
test_step_mdp_speed[True-False-False-False-False]	55.0000μs	24.3875μs	41.0046 KOps/s	37.4748 KOps/s	$\textbf{\color{#35bf28}+9.42\%}$
test_step_mdp_speed[False-True-True-True-True]	0.1330ms	55.7093μs	17.9503 KOps/s	16.2877 KOps/s	$\textbf{\color{#35bf28}+10.21\%}$
test_step_mdp_speed[False-True-True-True-False]	0.1021ms	33.7844μs	29.5995 KOps/s	27.2411 KOps/s	$\textbf{\color{#35bf28}+8.66\%}$
test_step_mdp_speed[False-True-True-False-True]	0.1075ms	45.6167μs	21.9218 KOps/s	20.5252 KOps/s	$\textbf{\color{#35bf28}+6.80\%}$
test_step_mdp_speed[False-True-True-False-False]	57.5010μs	25.4963μs	39.2214 KOps/s	37.3505 KOps/s	$\textbf{\color{#35bf28}+5.01\%}$
test_step_mdp_speed[False-True-False-True-True]	0.1156ms	57.6891μs	17.3343 KOps/s	15.9812 KOps/s	$\textbf{\color{#35bf28}+8.47\%}$
test_step_mdp_speed[False-True-False-True-False]	72.2020μs	35.8726μs	27.8765 KOps/s	25.5467 KOps/s	$\textbf{\color{#35bf28}+9.12\%}$
test_step_mdp_speed[False-True-False-False-True]	99.5010μs	47.1273μs	21.2191 KOps/s	19.6296 KOps/s	$\textbf{\color{#35bf28}+8.10\%}$
test_step_mdp_speed[False-True-False-False-False]	0.4194ms	27.5198μs	36.3375 KOps/s	33.9728 KOps/s	$\textbf{\color{#35bf28}+6.96\%}$
test_step_mdp_speed[False-False-True-True-True]	0.1738ms	60.2809μs	16.5890 KOps/s	15.2697 KOps/s	$\textbf{\color{#35bf28}+8.64\%}$
test_step_mdp_speed[False-False-True-True-False]	88.8020μs	38.2615μs	26.1360 KOps/s	24.4306 KOps/s	$\textbf{\color{#35bf28}+6.98\%}$
test_step_mdp_speed[False-False-True-False-True]	0.1072ms	46.7716μs	21.3805 KOps/s	20.0850 KOps/s	$\textbf{\color{#35bf28}+6.45\%}$
test_step_mdp_speed[False-False-True-False-False]	0.1034ms	27.3023μs	36.6270 KOps/s	34.8872 KOps/s	$\color{#35bf28}+4.99\%$
test_step_mdp_speed[False-False-False-True-True]	0.4811ms	61.3224μs	16.3073 KOps/s	15.0569 KOps/s	$\textbf{\color{#35bf28}+8.30\%}$
test_step_mdp_speed[False-False-False-True-False]	0.1207ms	40.2852μs	24.8230 KOps/s	23.1117 KOps/s	$\textbf{\color{#35bf28}+7.40\%}$
test_step_mdp_speed[False-False-False-False-True]	87.8020μs	47.0619μs	21.2486 KOps/s	19.4897 KOps/s	$\textbf{\color{#35bf28}+9.02\%}$
test_step_mdp_speed[False-False-False-False-False]	69.0010μs	28.6427μs	34.9129 KOps/s	31.6615 KOps/s	$\textbf{\color{#35bf28}+10.27\%}$
test_values[generalized_advantage_estimate-True-True]	17.1591ms	15.6134ms	64.0477 Ops/s	65.4707 Ops/s	$\color{#d91a1a}-2.17\%$
test_values[vec_generalized_advantage_estimate-True-True]	54.1021ms	48.0776ms	20.7997 Ops/s	19.9874 Ops/s	$\color{#35bf28}+4.06\%$
test_values[td0_return_estimate-False-False]	0.5348ms	0.2683ms	3.7273 KOps/s	3.9547 KOps/s	$\textbf{\color{#d91a1a}-5.75\%}$
test_values[td1_return_estimate-False-False]	28.2293ms	17.9419ms	55.7353 Ops/s	66.5124 Ops/s	$\textbf{\color{#d91a1a}-16.20\%}$
test_values[vec_td1_return_estimate-False-False]	55.9740ms	49.9254ms	20.0299 Ops/s	20.1732 Ops/s	$\color{#d91a1a}-0.71\%$
test_values[td_lambda_return_estimate-True-False]	57.1347ms	39.2181ms	25.4984 Ops/s	27.4650 Ops/s	$\textbf{\color{#d91a1a}-7.16\%}$
test_values[vec_td_lambda_return_estimate-True-False]	55.4242ms	47.5653ms	21.0237 Ops/s	21.1204 Ops/s	$\color{#d91a1a}-0.46\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	13.8923ms	13.4951ms	74.1011 Ops/s	75.2222 Ops/s	$\color{#d91a1a}-1.49\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	11.8691ms	4.1182ms	242.8246 Ops/s	247.5457 Ops/s	$\color{#d91a1a}-1.91\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	1.0107ms	0.5319ms	1.8799 KOps/s	1.8635 KOps/s	$\color{#35bf28}+0.88\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	67.2663ms	59.5128ms	16.8031 Ops/s	15.6373 Ops/s	$\textbf{\color{#35bf28}+7.46\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	10.2972ms	3.2775ms	305.1109 Ops/s	302.4077 Ops/s	$\color{#35bf28}+0.89\%$
test_dqn_speed	7.7693ms	2.1017ms	475.8026 Ops/s	459.3154 Ops/s	$\color{#35bf28}+3.59\%$
test_ddpg_speed	10.8176ms	3.1370ms	318.7800 Ops/s	303.8929 Ops/s	$\color{#35bf28}+4.90\%$
test_sac_speed	17.6591ms	9.4098ms	106.2716 Ops/s	102.4623 Ops/s	$\color{#35bf28}+3.72\%$
test_redq_speed	24.7943ms	17.6182ms	56.7594 Ops/s	52.9670 Ops/s	$\textbf{\color{#35bf28}+7.16\%}$
test_redq_deprec_speed	25.7918ms	14.8359ms	67.4041 Ops/s	67.4533 Ops/s	$\color{#d91a1a}-0.07\%$
test_td3_speed	14.8726ms	11.5413ms	86.6457 Ops/s	82.4292 Ops/s	$\textbf{\color{#35bf28}+5.12\%}$
test_cql_speed	37.3957ms	30.2593ms	33.0477 Ops/s	27.9962 Ops/s	$\textbf{\color{#35bf28}+18.04\%}$
test_a2c_speed	12.9621ms	5.8443ms	171.1055 Ops/s	157.3071 Ops/s	$\textbf{\color{#35bf28}+8.77\%}$
test_ppo_speed	52.1499ms	6.6604ms	150.1405 Ops/s	138.9223 Ops/s	$\textbf{\color{#35bf28}+8.08\%}$
test_reinforce_speed	10.1891ms	4.5586ms	219.3678 Ops/s	193.6780 Ops/s	$\textbf{\color{#35bf28}+13.26\%}$
test_iql_speed	30.4570ms	23.8415ms	41.9437 Ops/s	37.6562 Ops/s	$\textbf{\color{#35bf28}+11.39\%}$
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.1427ms	3.0059ms	332.6745 Ops/s	327.3033 Ops/s	$\color{#35bf28}+1.64\%$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	5.3442ms	3.1919ms	313.2922 Ops/s	312.0670 Ops/s	$\color{#35bf28}+0.39\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	5.8876ms	3.2056ms	311.9553 Ops/s	307.0196 Ops/s	$\color{#35bf28}+1.61\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.6546ms	3.0113ms	332.0802 Ops/s	266.8115 Ops/s	$\textbf{\color{#35bf28}+24.46\%}$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	5.3157ms	3.2145ms	311.0923 Ops/s	308.0937 Ops/s	$\color{#35bf28}+0.97\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.1873s	3.7193ms	268.8669 Ops/s	306.2822 Ops/s	$\textbf{\color{#d91a1a}-12.22\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.9944ms	3.0221ms	330.8998 Ops/s	341.3490 Ops/s	$\color{#d91a1a}-3.06\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	5.7867ms	3.1228ms	320.2229 Ops/s	317.8732 Ops/s	$\color{#35bf28}+0.74\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	6.5091ms	3.2183ms	310.7208 Ops/s	309.6122 Ops/s	$\color{#35bf28}+0.36\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.3055ms	3.1312ms	319.3685 Ops/s	322.5116 Ops/s	$\color{#d91a1a}-0.97\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	7.1730ms	3.3437ms	299.0730 Ops/s	307.7729 Ops/s	$\color{#d91a1a}-2.83\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	6.1064ms	3.3799ms	295.8667 Ops/s	306.8941 Ops/s	$\color{#d91a1a}-3.59\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.3425ms	3.1871ms	313.7695 Ops/s	332.2224 Ops/s	$\textbf{\color{#d91a1a}-5.55\%}$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	5.9926ms	3.4568ms	289.2834 Ops/s	301.0128 Ops/s	$\color{#d91a1a}-3.90\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	6.9477ms	3.5537ms	281.3948 Ops/s	300.5863 Ops/s	$\textbf{\color{#d91a1a}-6.38\%}$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	4.1597ms	3.1344ms	319.0360 Ops/s	332.9123 Ops/s	$\color{#d91a1a}-4.17\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	6.2296ms	3.2633ms	306.4409 Ops/s	308.2488 Ops/s	$\color{#d91a1a}-0.59\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	7.0389ms	3.3730ms	296.4743 Ops/s	297.6329 Ops/s	$\color{#d91a1a}-0.39\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.3154s	33.4142ms	29.9274 Ops/s	30.6981 Ops/s	$\color{#d91a1a}-2.51\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	0.1602s	32.6436ms	30.6339 Ops/s	29.9558 Ops/s	$\color{#35bf28}+2.26\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	0.1648s	30.5007ms	32.7861 Ops/s	32.6256 Ops/s	$\color{#35bf28}+0.49\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1579s	32.9126ms	30.3835 Ops/s	30.3961 Ops/s	$\color{#d91a1a}-0.04\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	0.1621s	30.3391ms	32.9607 Ops/s	30.1277 Ops/s	$\textbf{\color{#35bf28}+9.40\%}$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	0.1632s	33.6573ms	29.7113 Ops/s	32.7989 Ops/s	$\textbf{\color{#d91a1a}-9.41\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1602s	30.3708ms	32.9263 Ops/s	30.3517 Ops/s	$\textbf{\color{#35bf28}+8.48\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1616s	32.6958ms	30.5850 Ops/s	33.1721 Ops/s	$\textbf{\color{#d91a1a}-7.80\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	0.1566s	29.5711ms	33.8168 Ops/s	33.6890 Ops/s	$\color{#35bf28}+0.38\%$

vmoens

As part of this PR, I think we should rename "_action_spec" to "full_action_spec", and likewise for the other specs.
For the rest I just have some minor comments. Nice work!

torchrl/envs/common.py

torchrl/envs/vec_env.py

torchrl/envs/utils.py

torchrl/envs/common.py

vmoens

LGTM thanks for this!

…`vec_env` and `collectors` (pytorch#1462) Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>

matteobettini added 4 commits August 4, 2023 15:24

temp

4f597bf

Signed-off-by: Matteo Bettini <[email protected]>

action

49bc8e5

Signed-off-by: Matteo Bettini <[email protected]>

amend

36e1afb

Signed-off-by: Matteo Bettini <[email protected]>

Merge branch 'main' into allow-all-specs-compsite

eee5045

# Conflicts: # test/mocking_classes.py # test/test_env.py

facebook-github-bot added the CLA Signed label Aug 15, 2023

matteobettini added 8 commits August 15, 2023 10:11

reward spec

92f62a9

Signed-off-by: Matteo Bettini <[email protected]>

reward spec

5a77edd

Signed-off-by: Matteo Bettini <[email protected]>

done spec

1c334b1

Signed-off-by: Matteo Bettini <[email protected]>

done spec

7dc7548

Signed-off-by: Matteo Bettini <[email protected]>

fix

ba13680

Signed-off-by: Matteo Bettini <[email protected]>

rollout and step_mdp

2f548ea

Signed-off-by: Matteo Bettini <[email protected]>

fix

4054f61

Signed-off-by: Matteo Bettini <[email protected]>

amend

a772289

Signed-off-by: Matteo Bettini <[email protected]>

matteobettini changed the title ~~[Feature] Allow multiple action, reward, done keys~~ [Feature] Allow multiple (nested) action, reward, done keys in env module Aug 15, 2023

added todos for _reset

5baa353

Signed-off-by: Matteo Bettini <[email protected]>

matteobettini mentioned this pull request Aug 15, 2023

[Discussion] TorchRL MARL API #1463

Closed

matteobettini added 4 commits August 16, 2023 09:44

docs

b6c1047

Signed-off-by: Matteo Bettini <[email protected]>

fix transforms

5f294d6

Signed-off-by: Matteo Bettini <[email protected]>

vec_env

e20298e

Signed-off-by: Matteo Bettini <[email protected]>

collector

873dbbf

Signed-off-by: Matteo Bettini <[email protected]>

matteobettini changed the title ~~[Feature] Allow multiple (nested) action, reward, done keys in env module~~ [Feature] Allow multiple (nested) action, reward, done keys in env,vec_env and collectors Aug 16, 2023

treat done

4332984

Signed-off-by: Matteo Bettini <[email protected]>

matteobettini added 2 commits August 18, 2023 09:15

amend

162e40f

Signed-off-by: Matteo Bettini <[email protected]>

amend

d9c0dbb

Signed-off-by: Matteo Bettini <[email protected]>

collectors and vec_env

451e9a9

Signed-off-by: Matteo Bettini <[email protected]>

TEMP

e8e410e

Signed-off-by: Matteo Bettini <[email protected]>

Revert "TEMP"

d3cbd5d

This reverts commit e8e410e.

matteobettini marked this pull request as ready for review August 18, 2023 14:17

amend

ea1fe3f

Signed-off-by: Matteo Bettini <[email protected]>

matteobettini mentioned this pull request Aug 22, 2023

[Environment] Petting zoo #1471

Merged

vmoens added the enhancement label Aug 30, 2023

Merge branch 'main' into allow-all-specs-compsite

8d5abef

# Conflicts: # torchrl/data/utils.py

vmoens reviewed Aug 30, 2023

matteobettini and others added 12 commits August 30, 2023 13:21

fix review

334aa8d

Signed-off-by: Matteo Bettini <[email protected]>

Update torchrl/envs/vec_env.py

4830358

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/vec_env.py

78be054

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/vec_env.py

836d085

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

95dd02c

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

d755d89

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

9187b28

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

19875bd

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

01fc27a

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

790ff36

Co-authored-by: Vincent Moens <[email protected]>

Update torchrl/envs/common.py

793b738

Co-authored-by: Vincent Moens <[email protected]>

preappend full_ before specs

6f1debe

Signed-off-by: Matteo Bettini <[email protected]>

vmoens approved these changes Aug 30, 2023

vmoens merged commit f8777a6 into pytorch:main Aug 30, 2023

matteobettini deleted the allow-all-specs-compsite branch August 30, 2023 14:32
