[Feature] Allow multiple (nested) action, reward, done keys in env,vec_env and collectors #1462

Merged
merged 37 commits into from
Aug 30, 2023

Conversation

matteobettini
Contributor

@matteobettini matteobettini commented Aug 15, 2023

Depends on #512

This PR is an important milestone in the journey to #1463.

It will also allow single-agent users to have more than one action, reward, and done key.

This is very important when your agent has multiple actions (e.g., some discrete, some continuous).

Main design problem: multiple dones

One major design consequence is that if we allow multiple done keys, we need a "_reset" key for each of them (the same goes for "truncated", "is_init", and all the other done-based keys).

This is because we need to reset each done independently (e.g., in multi-agent settings we want to reset only the agents that are done).

So the question is: what is the best way to do this? We can imagine having a done_spec like

done_spec = CompositeSpec{
    "nested_1": {
        "done": DiscreteSpec,
    },
    "nested_2": {
        "other_done": DiscreteSpec,
    },
    "done": DiscreteSpec,
}
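
To make the nesting concrete, here is a minimal pure-Python sketch (not torchrl code) that walks a spec shaped like the one above and lists every done key as a tuple path. The plain-dict layout, the string placeholder leaves, and the helper name `find_done_keys` are assumptions made for illustration only.

```python
from typing import Iterator, Tuple

# Hypothetical stand-in for the composite spec above: nested dicts whose
# leaves are placeholders for the actual spec objects.
done_spec = {
    "nested_1": {"done": "DiscreteSpec"},
    "nested_2": {"other_done": "DiscreteSpec"},
    "done": "DiscreteSpec",
}


def find_done_keys(
    spec: dict, prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, ...]]:
    """Yield the nested path of every leaf in a dict-based done spec."""
    for name, value in spec.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            # Recurse into nested sub-specs.
            yield from find_done_keys(value, path)
        else:
            yield path


done_keys = list(find_done_keys(done_spec))
print(done_keys)
```

Each tuple is one independent done entry, which is why each one would need its own reset flag.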

With a spec of this type, we place a "_reset" key next to each done key that is present and has at least one True entry.
The same applies to "truncated" and "is_init".

A _reset td will look like

reset_td = TensorDict{
    "nested_1": {
        "_reset": Tensor,
    },
    "nested_2": {
        "_reset": Tensor,
    },
    "_reset": Tensor,
}
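
A rough sketch of the rule described above, in plain Python with lists of booleans standing in for tensors: for every done key that has at least one True entry, a "_reset" entry is written at the same nesting level. The helper name `build_reset` and the dict/list data layout are assumptions, not the PR's implementation.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def build_reset(dones: Dict[Path, List[bool]]) -> dict:
    """Build a nested dict holding a "_reset" entry next to each done
    key that contains at least one True value."""
    reset: dict = {}
    for path, flags in dones.items():
        if not any(flags):
            continue  # nothing to reset for this done key
        node = reset
        for name in path[:-1]:
            node = node.setdefault(name, {})
        # "_reset" is a sibling of the done key and copies its flags.
        node["_reset"] = list(flags)
    return reset


dones = {
    ("nested_1", "done"): [True, False],
    ("nested_2", "other_done"): [False, False],
    ("done",): [False, True],
}
reset_td = build_reset(dones)
print(reset_td)
```

In this toy input, "nested_2" gets no "_reset" entry because none of its flags are true.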

This solution is already implemented in the PR

cc @hyerra

Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts:
#	test/mocking_classes.py
#	test/test_env.py
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 15, 2023
@matteobettini matteobettini changed the title [Feature] Allow multiple action, reward, done keys [Feature] Allow multiple (nested) action, reward, done keys in env module Aug 15, 2023
Signed-off-by: Matteo Bettini <[email protected]>
@matteobettini
Contributor Author

Up to this point, benchmark runs show that performance has been maintained.

@matteobettini matteobettini changed the title [Feature] Allow multiple (nested) action, reward, done keys in env module [Feature] Allow multiple (nested) action, reward, done keys in env,vec_env and collectors Aug 16, 2023
@hyerra
Contributor

hyerra commented Aug 16, 2023

This looks great! I have a few questions though:

  1. Is there any flexibility on when an agent that died gets reset? In single-agent situations, it's more straightforward since you reset once the only agent in the game dies. However, in multi-agent games, there might be cases when we don't want to reset the agent immediately. Like for instance maybe we only reset after all agents of a behavior die, or we reset the game after all agents in the game have died. For the first situation, it might be beneficial to think of competitive games where the game keeps progressing until one team loses.
  2. I might be a little confused, but just to make sure I understand the purpose of multiple done keys per agent, would this be used in situations where an agent can be partially done? So for example, let's say an agent has 2 actions and the done corresponding to the first action is True and the done corresponding to the second action is False. In this scenario would we want to reset the agent completely, or mask the first action and only allow the second action to be set, or something else maybe?

@matteobettini
Contributor Author

> 1. Is there any flexibility on when an agent that died gets reset? In single-agent situations, it's more straightforward since you reset once the only agent in the game dies. However, in multi-agent games, there might be cases when we don't want to reset the agent immediately. Like for instance maybe we only reset after all agents of a behavior die, or we reset the game after all agents in the game have died. For the first situation, it might be beneficial to think of competitive games where the game keeps progressing until one team loses.

The idea is that at every step where some reset flag is true, the env's _reset() is called with the "_reset" flags as input. If your game should not actually reset when only some of the flags are true, you can definitely handle that in the _reset() logic of your env. The only problem I see is that torchrl currently checks that no "done" is true after a reset, but that check should be removed anyway.

The other possibility is that you only put the dones you actually need in your done spec, i.e., if your agents can be done but you want to reset only when whole groups are done, then your done spec should have group granularity and not agent granularity.
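
The two options above can be sketched in plain Python. This is an illustration only: `ToyTeamEnv`, its `_reset` signature, and the `group_done` helper are hypothetical and do not mirror torchrl's actual base-class API.

```python
from typing import Dict, List


def group_done(agent_dones: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Option 2: expose one done per group, true only when every agent
    in that group is done."""
    return {group: all(flags) for group, flags in agent_dones.items()}


class ToyTeamEnv:
    """Option 1: receive per-agent reset flags but defer the actual reset
    until a whole team has requested it."""

    def __init__(self, teams: Dict[str, int]) -> None:
        # Steps taken since the last reset, one counter per agent.
        self.steps = {team: [0] * size for team, size in teams.items()}

    def step(self) -> None:
        for counters in self.steps.values():
            for i in range(len(counters)):
                counters[i] += 1

    def _reset(self, reset_flags: Dict[str, List[bool]]) -> None:
        for team, flags in reset_flags.items():
            if all(flags):  # ignore partial reset requests
                self.steps[team] = [0] * len(flags)


env = ToyTeamEnv({"red": 2, "blue": 2})
env.step()
env._reset({"red": [True, False], "blue": [True, True]})
print(env.steps)
print(group_done({"red": [True, False], "blue": [True, True]}))
```

With group-level dones, the partial "red" request never reaches the env at all, which keeps the reset logic simpler at the cost of per-agent detail.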

> 2. I might be a little confused, but just to make sure I understand the purpose of multiple done keys per agent, would this be used in situations where an agent can be partially done? So for example, let's say an agent has 2 actions and the done corresponding to the first action is True and the done corresponding to the second action is False. In this scenario would we want to reset the agent completely, or mask the first action and only allow the second action to be set, or something else maybe?

The snippet I posted is just an example that pushes the flexibility of a composite done spec to the extreme. I do not see a use for that in multi-agent settings. For multi-agent, we suggest sticking to what is outlined in #1463.

Signed-off-by: Matteo Bettini <[email protected]>
@matteobettini
Contributor Author

@vmoens @hyerra I updated the description to illustrate how multiple _resets work.

@matteobettini
Contributor Author

Benchmarks up to here

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1723s | 0.1693s | 5.9062 Ops/s | 5.8633 Ops/s | +0.73% |
| test_sync | 0.1014s | 88.3443ms | 11.3194 Ops/s | 10.0705 Ops/s | **+12.40%** |
| test_async | 0.2583s | 86.9081ms | 11.5064 Ops/s | 11.1518 Ops/s | +3.18% |
| test_simple | 0.7965s | 0.7169s | 1.3948 Ops/s | 1.3627 Ops/s | +2.36% |
| test_transformed | 2.0009s | 1.9152s | 0.5221 Ops/s | 0.5098 Ops/s | +2.43% |
| test_serial | 2.3078s | 2.2196s | 0.4505 Ops/s | 0.4374 Ops/s | +3.00% |
| test_parallel | 1.9862s | 1.8723s | 0.5341 Ops/s | 0.5114 Ops/s | +4.44% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1436ms | 51.1794μs | 19.5391 KOps/s | 19.5441 KOps/s | -0.03% |
| test_step_mdp_speed[True-True-True-True-False] | 69.6040μs | 29.0311μs | 34.4458 KOps/s | 34.8334 KOps/s | -1.11% |
| test_step_mdp_speed[True-True-True-False-True] | 73.8050μs | 35.7194μs | 27.9960 KOps/s | 27.2185 KOps/s | +2.86% |
| test_step_mdp_speed[True-True-True-False-False] | 0.1312ms | 19.9777μs | 50.0558 KOps/s | 50.8298 KOps/s | -1.52% |
| test_step_mdp_speed[True-True-False-True-True] | 0.2524ms | 53.2112μs | 18.7930 KOps/s | 18.4247 KOps/s | +2.00% |
| test_step_mdp_speed[True-True-False-True-False] | 0.2505ms | 31.5389μs | 31.7068 KOps/s | 31.8836 KOps/s | -0.55% |
| test_step_mdp_speed[True-True-False-False-True] | 74.8040μs | 37.7298μs | 26.5043 KOps/s | 25.8554 KOps/s | +2.51% |
| test_step_mdp_speed[True-True-False-False-False] | 0.1153ms | 22.5023μs | 44.4400 KOps/s | 44.6484 KOps/s | -0.47% |
| test_step_mdp_speed[True-False-True-True-True] | 0.1495ms | 54.1892μs | 18.4539 KOps/s | 18.2167 KOps/s | +1.30% |
| test_step_mdp_speed[True-False-True-True-False] | 0.1885ms | 32.9094μs | 30.3864 KOps/s | 30.0521 KOps/s | +1.11% |
| test_step_mdp_speed[True-False-True-False-True] | 0.1354ms | 37.9034μs | 26.3829 KOps/s | 26.2531 KOps/s | +0.49% |
| test_step_mdp_speed[True-False-True-False-False] | 99.6070μs | 22.7218μs | 44.0106 KOps/s | 45.2454 KOps/s | -2.73% |
| test_step_mdp_speed[True-False-False-True-True] | 0.1357ms | 56.9797μs | 17.5501 KOps/s | 17.1613 KOps/s | +2.27% |
| test_step_mdp_speed[True-False-False-True-False] | 0.1368ms | 35.1205μs | 28.4734 KOps/s | 28.6868 KOps/s | -0.74% |
| test_step_mdp_speed[True-False-False-False-True] | 0.2515ms | 40.0787μs | 24.9509 KOps/s | 24.2065 KOps/s | +3.08% |
| test_step_mdp_speed[True-False-False-False-False] | 85.7050μs | 24.1722μs | 41.3698 KOps/s | 41.1810 KOps/s | +0.46% |
| test_step_mdp_speed[False-True-True-True-True] | 0.1106ms | 55.6137μs | 17.9812 KOps/s | 17.8849 KOps/s | +0.54% |
| test_step_mdp_speed[False-True-True-True-False] | 0.1105ms | 33.6482μs | 29.7193 KOps/s | 31.0923 KOps/s | -4.42% |
| test_step_mdp_speed[False-True-True-False-True] | 80.9060μs | 44.7823μs | 22.3303 KOps/s | 22.2712 KOps/s | +0.27% |
| test_step_mdp_speed[False-True-True-False-False] | 0.1071ms | 24.9867μs | 40.0214 KOps/s | 40.2051 KOps/s | -0.46% |
| test_step_mdp_speed[False-True-False-True-True] | 0.2028ms | 57.1715μs | 17.4912 KOps/s | 17.3723 KOps/s | +0.68% |
| test_step_mdp_speed[False-True-False-True-False] | 83.6060μs | 35.6899μs | 28.0192 KOps/s | 27.9212 KOps/s | +0.35% |
| test_step_mdp_speed[False-True-False-False-True] | 0.1160ms | 45.6753μs | 21.8937 KOps/s | 20.6355 KOps/s | **+6.10%** |
| test_step_mdp_speed[False-True-False-False-False] | 94.5060μs | 27.3382μs | 36.5789 KOps/s | 36.5997 KOps/s | -0.06% |
| test_step_mdp_speed[False-False-True-True-True] | 0.1710ms | 60.2704μs | 16.5919 KOps/s | 16.6932 KOps/s | -0.61% |
| test_step_mdp_speed[False-False-True-True-False] | 0.1190ms | 38.0676μs | 26.2690 KOps/s | 26.6995 KOps/s | -1.61% |
| test_step_mdp_speed[False-False-True-False-True] | 81.7050μs | 46.6006μs | 21.4590 KOps/s | 20.9720 KOps/s | +2.32% |
| test_step_mdp_speed[False-False-True-False-False] | 0.1064ms | 26.9170μs | 37.1513 KOps/s | 37.1322 KOps/s | +0.05% |
| test_step_mdp_speed[False-False-False-True-True] | 0.1142ms | 60.7249μs | 16.4677 KOps/s | 16.0684 KOps/s | +2.49% |
| test_step_mdp_speed[False-False-False-True-False] | 0.1489ms | 39.9871μs | 25.0081 KOps/s | 24.8579 KOps/s | +0.60% |
| test_step_mdp_speed[False-False-False-False-True] | 88.3060μs | 48.2547μs | 20.7234 KOps/s | 21.1491 KOps/s | -2.01% |
| test_step_mdp_speed[False-False-False-False-False] | 0.1023ms | 29.3799μs | 34.0369 KOps/s | 33.8363 KOps/s | +0.59% |
| test_values[generalized_advantage_estimate-True-True] | 16.4701ms | 15.3655ms | 65.0811 Ops/s | 64.6490 Ops/s | +0.67% |
| test_values[vec_generalized_advantage_estimate-True-True] | 58.7579ms | 47.9988ms | 20.8338 Ops/s | 20.0170 Ops/s | +4.08% |
| test_values[td0_return_estimate-False-False] | 1.4525ms | 0.2542ms | 3.9346 KOps/s | 4.3111 KOps/s | **-8.73%** |
| test_values[td1_return_estimate-False-False] | 21.2661ms | 14.3787ms | 69.5475 Ops/s | 68.3216 Ops/s | +1.79% |
| test_values[vec_td1_return_estimate-False-False] | 62.3502ms | 48.7186ms | 20.5260 Ops/s | 20.7530 Ops/s | -1.09% |
| test_values[td_lambda_return_estimate-True-False] | 36.7649ms | 35.8692ms | 27.8790 Ops/s | 27.9602 Ops/s | -0.29% |
| test_values[vec_td_lambda_return_estimate-True-False] | 53.6468ms | 47.6071ms | 21.0053 Ops/s | 20.1969 Ops/s | +4.00% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 13.5513ms | 13.3063ms | 75.1522 Ops/s | 75.4835 Ops/s | -0.44% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 4.4637ms | 3.9616ms | 252.4235 Ops/s | 229.9797 Ops/s | **+9.76%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 8.9242ms | 0.5338ms | 1.8733 KOps/s | 1.8868 KOps/s | -0.72% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 75.7226ms | 65.8061ms | 15.1961 Ops/s | 16.6942 Ops/s | **-8.97%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 10.1176ms | 3.4563ms | 289.3234 Ops/s | 303.1526 Ops/s | -4.56% |
| test_dqn_speed | 5.3808ms | 2.1930ms | 455.9982 Ops/s | 456.4696 Ops/s | -0.10% |
| test_ddpg_speed | 15.8147ms | 3.4214ms | 292.2822 Ops/s | 303.5626 Ops/s | -3.72% |
| test_sac_speed | 16.6758ms | 9.9643ms | 100.3580 Ops/s | 96.6589 Ops/s | +3.83% |
| test_redq_speed | 27.6497ms | 19.3821ms | 51.5939 Ops/s | 50.0998 Ops/s | +2.98% |
| test_redq_deprec_speed | 25.6383ms | 15.7424ms | 63.5229 Ops/s | 63.7062 Ops/s | -0.29% |
| test_td3_speed | 22.7042ms | 12.9180ms | 77.4114 Ops/s | 81.6208 Ops/s | **-5.16%** |
| test_cql_speed | 46.6941ms | 38.9581ms | 25.6686 Ops/s | 30.1520 Ops/s | **-14.87%** |
| test_a2c_speed | 12.3663ms | 6.3100ms | 158.4779 Ops/s | 158.2560 Ops/s | +0.14% |
| test_ppo_speed | 43.2018ms | 7.4847ms | 133.6060 Ops/s | 146.5561 Ops/s | **-8.84%** |
| test_reinforce_speed | 5.6273ms | 4.8698ms | 205.3454 Ops/s | 202.8195 Ops/s | +1.25% |
| test_iql_speed | 32.7797ms | 26.4046ms | 37.8722 Ops/s | 38.5589 Ops/s | -1.78% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 10.8717ms | 3.2410ms | 308.5470 Ops/s | 303.6460 Ops/s | +1.61% |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 6.3166ms | 3.3705ms | 296.6959 Ops/s | 295.4027 Ops/s | +0.44% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1596s | 4.0183ms | 248.8623 Ops/s | 289.9610 Ops/s | **-14.17%** |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.3918ms | 3.2291ms | 309.6877 Ops/s | 233.8293 Ops/s | **+32.44%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 5.6604ms | 3.4182ms | 292.5519 Ops/s | 286.1925 Ops/s | +2.22% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 6.2675ms | 3.4804ms | 287.3267 Ops/s | 283.5725 Ops/s | +1.32% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.1356ms | 3.2960ms | 303.3945 Ops/s | 297.0482 Ops/s | +2.14% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 6.1026ms | 3.3708ms | 296.6668 Ops/s | 297.8016 Ops/s | -0.38% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 7.7640ms | 3.2978ms | 303.2324 Ops/s | 292.3812 Ops/s | +3.71% |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.7251ms | 3.2478ms | 307.9018 Ops/s | 304.2208 Ops/s | +1.21% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 7.0352ms | 3.4282ms | 291.7023 Ops/s | 288.6523 Ops/s | +1.06% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 6.4660ms | 3.3372ms | 299.6482 Ops/s | 290.2122 Ops/s | +3.25% |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.0933ms | 3.2353ms | 309.0877 Ops/s | 308.0684 Ops/s | +0.33% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 7.6054ms | 3.3539ms | 298.1590 Ops/s | 291.1113 Ops/s | +2.42% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 6.9095ms | 3.4381ms | 290.8561 Ops/s | 296.9770 Ops/s | -2.06% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.3310ms | 3.2730ms | 305.5319 Ops/s | 309.8827 Ops/s | -1.40% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 5.9179ms | 3.3631ms | 297.3460 Ops/s | 287.4260 Ops/s | +3.45% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.1684s | 3.9773ms | 251.4251 Ops/s | 289.8637 Ops/s | **-13.26%** |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.3398s | 34.2021ms | 29.2380 Ops/s | 28.6756 Ops/s | +1.96% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1784s | 32.0856ms | 31.1666 Ops/s | 29.3976 Ops/s | **+6.02%** |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1634s | 33.0770ms | 30.2325 Ops/s | 31.8222 Ops/s | -5.00% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1715s | 33.1941ms | 30.1259 Ops/s | 29.1887 Ops/s | +3.21% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1688s | 30.8474ms | 32.4177 Ops/s | 31.1972 Ops/s | +3.91% |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1664s | 32.8674ms | 30.4253 Ops/s | 29.0259 Ops/s | +4.82% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1592s | 27.3021ms | 36.6273 Ops/s | 35.5638 Ops/s | +2.99% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1715s | 30.8670ms | 32.3971 Ops/s | 29.4496 Ops/s | **+10.01%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1608s | 30.1441ms | 33.1740 Ops/s | 32.3258 Ops/s | +2.62% |
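
For reference, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. This reading is inferred from the numbers above rather than stated in the thread. A small sketch of that arithmetic, with a hypothetical function name:

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change of `ops` relative to the `ops_head` baseline."""
    return (ops - ops_head) / ops_head * 100.0


# test_sync row from the results above: 11.3194 Ops/s vs 10.0705 Ops/s on HEAD.
change = relative_change(11.3194, 10.0705)
print(f"{change:+.2f}%")
```

A positive value means the PR branch completed more operations per second than the baseline in that run.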

@matteobettini
Contributor Author

@vmoens the PR is ready for review. Here are the latest benchmark results (I don't know how reliable they are, haha):

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1647s | 0.1627s | 6.1456 Ops/s | 5.8554 Ops/s | +4.95% |
| test_sync | 0.2485s | 0.1124s | 8.8968 Ops/s | 10.2936 Ops/s | **-13.57%** |
| test_async | 0.1691s | 92.9929ms | 10.7535 Ops/s | 11.3000 Ops/s | -4.84% |
| test_simple | 0.7580s | 0.6880s | 1.4535 Ops/s | 1.3949 Ops/s | +4.20% |
| test_transformed | 1.9819s | 1.9045s | 0.5251 Ops/s | 0.5285 Ops/s | -0.64% |
| test_serial | 2.1587s | 2.1303s | 0.4694 Ops/s | 0.4546 Ops/s | +3.26% |
| test_parallel | 1.9122s | 1.7912s | 0.5583 Ops/s | 0.5481 Ops/s | +1.87% |
| test_step_mdp_speed[True-True-True-True-True] | 4.4434ms | 55.7029μs | 17.9524 KOps/s | 17.9924 KOps/s | -0.22% |
| test_step_mdp_speed[True-True-True-True-False] | 5.6612ms | 31.6486μs | 31.5970 KOps/s | 31.1724 KOps/s | +1.36% |
| test_step_mdp_speed[True-True-True-False-True] | 5.8664ms | 37.2757μs | 26.8271 KOps/s | 25.7469 KOps/s | +4.20% |
| test_step_mdp_speed[True-True-True-False-False] | 2.7161ms | 21.7901μs | 45.8924 KOps/s | 46.9270 KOps/s | -2.20% |
| test_step_mdp_speed[True-True-False-True-True] | 1.8516ms | 56.7031μs | 17.6357 KOps/s | 17.0507 KOps/s | +3.43% |
| test_step_mdp_speed[True-True-False-True-False] | 5.0931ms | 33.8472μs | 29.5445 KOps/s | 29.2349 KOps/s | +1.06% |
| test_step_mdp_speed[True-True-False-False-True] | 0.4845ms | 40.1233μs | 24.9232 KOps/s | 24.9629 KOps/s | -0.16% |
| test_step_mdp_speed[True-True-False-False-False] | 0.2525ms | 22.4385μs | 44.5663 KOps/s | 42.8728 KOps/s | +3.95% |
| test_step_mdp_speed[True-False-True-True-True] | 2.3308ms | 53.6124μs | 18.6524 KOps/s | 17.1587 KOps/s | **+8.71%** |
| test_step_mdp_speed[True-False-True-True-False] | 86.1010μs | 32.8285μs | 30.4614 KOps/s | 29.0816 KOps/s | +4.74% |
| test_step_mdp_speed[True-False-True-False-True] | 0.1276ms | 36.7159μs | 27.2361 KOps/s | 26.0517 KOps/s | +4.55% |
| test_step_mdp_speed[True-False-True-False-False] | 93.1020μs | 22.1981μs | 45.0488 KOps/s | 43.4095 KOps/s | +3.78% |
| test_step_mdp_speed[True-False-False-True-True] | 0.1140ms | 57.8957μs | 17.2724 KOps/s | 15.7928 KOps/s | **+9.37%** |
| test_step_mdp_speed[True-False-False-True-False] | 79.8010μs | 35.5476μs | 28.1313 KOps/s | 26.1017 KOps/s | **+7.78%** |
| test_step_mdp_speed[True-False-False-False-True] | 86.7010μs | 39.4724μs | 25.3342 KOps/s | 23.6531 KOps/s | **+7.11%** |
| test_step_mdp_speed[True-False-False-False-False] | 55.0000μs | 24.3875μs | 41.0046 KOps/s | 37.4748 KOps/s | **+9.42%** |
| test_step_mdp_speed[False-True-True-True-True] | 0.1330ms | 55.7093μs | 17.9503 KOps/s | 16.2877 KOps/s | **+10.21%** |
| test_step_mdp_speed[False-True-True-True-False] | 0.1021ms | 33.7844μs | 29.5995 KOps/s | 27.2411 KOps/s | **+8.66%** |
| test_step_mdp_speed[False-True-True-False-True] | 0.1075ms | 45.6167μs | 21.9218 KOps/s | 20.5252 KOps/s | **+6.80%** |
| test_step_mdp_speed[False-True-True-False-False] | 57.5010μs | 25.4963μs | 39.2214 KOps/s | 37.3505 KOps/s | **+5.01%** |
| test_step_mdp_speed[False-True-False-True-True] | 0.1156ms | 57.6891μs | 17.3343 KOps/s | 15.9812 KOps/s | **+8.47%** |
| test_step_mdp_speed[False-True-False-True-False] | 72.2020μs | 35.8726μs | 27.8765 KOps/s | 25.5467 KOps/s | **+9.12%** |
| test_step_mdp_speed[False-True-False-False-True] | 99.5010μs | 47.1273μs | 21.2191 KOps/s | 19.6296 KOps/s | **+8.10%** |
| test_step_mdp_speed[False-True-False-False-False] | 0.4194ms | 27.5198μs | 36.3375 KOps/s | 33.9728 KOps/s | **+6.96%** |
| test_step_mdp_speed[False-False-True-True-True] | 0.1738ms | 60.2809μs | 16.5890 KOps/s | 15.2697 KOps/s | **+8.64%** |
| test_step_mdp_speed[False-False-True-True-False] | 88.8020μs | 38.2615μs | 26.1360 KOps/s | 24.4306 KOps/s | **+6.98%** |
| test_step_mdp_speed[False-False-True-False-True] | 0.1072ms | 46.7716μs | 21.3805 KOps/s | 20.0850 KOps/s | **+6.45%** |
| test_step_mdp_speed[False-False-True-False-False] | 0.1034ms | 27.3023μs | 36.6270 KOps/s | 34.8872 KOps/s | +4.99% |
| test_step_mdp_speed[False-False-False-True-True] | 0.4811ms | 61.3224μs | 16.3073 KOps/s | 15.0569 KOps/s | **+8.30%** |
| test_step_mdp_speed[False-False-False-True-False] | 0.1207ms | 40.2852μs | 24.8230 KOps/s | 23.1117 KOps/s | **+7.40%** |
| test_step_mdp_speed[False-False-False-False-True] | 87.8020μs | 47.0619μs | 21.2486 KOps/s | 19.4897 KOps/s | **+9.02%** |
| test_step_mdp_speed[False-False-False-False-False] | 69.0010μs | 28.6427μs | 34.9129 KOps/s | 31.6615 KOps/s | **+10.27%** |
| test_values[generalized_advantage_estimate-True-True] | 17.1591ms | 15.6134ms | 64.0477 Ops/s | 65.4707 Ops/s | -2.17% |
| test_values[vec_generalized_advantage_estimate-True-True] | 54.1021ms | 48.0776ms | 20.7997 Ops/s | 19.9874 Ops/s | +4.06% |
| test_values[td0_return_estimate-False-False] | 0.5348ms | 0.2683ms | 3.7273 KOps/s | 3.9547 KOps/s | **-5.75%** |
| test_values[td1_return_estimate-False-False] | 28.2293ms | 17.9419ms | 55.7353 Ops/s | 66.5124 Ops/s | **-16.20%** |
| test_values[vec_td1_return_estimate-False-False] | 55.9740ms | 49.9254ms | 20.0299 Ops/s | 20.1732 Ops/s | -0.71% |
| test_values[td_lambda_return_estimate-True-False] | 57.1347ms | 39.2181ms | 25.4984 Ops/s | 27.4650 Ops/s | **-7.16%** |
| test_values[vec_td_lambda_return_estimate-True-False] | 55.4242ms | 47.5653ms | 21.0237 Ops/s | 21.1204 Ops/s | -0.46% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 13.8923ms | 13.4951ms | 74.1011 Ops/s | 75.2222 Ops/s | -1.49% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 11.8691ms | 4.1182ms | 242.8246 Ops/s | 247.5457 Ops/s | -1.91% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 1.0107ms | 0.5319ms | 1.8799 KOps/s | 1.8635 KOps/s | +0.88% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 67.2663ms | 59.5128ms | 16.8031 Ops/s | 15.6373 Ops/s | **+7.46%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 10.2972ms | 3.2775ms | 305.1109 Ops/s | 302.4077 Ops/s | +0.89% |
| test_dqn_speed | 7.7693ms | 2.1017ms | 475.8026 Ops/s | 459.3154 Ops/s | +3.59% |
| test_ddpg_speed | 10.8176ms | 3.1370ms | 318.7800 Ops/s | 303.8929 Ops/s | +4.90% |
| test_sac_speed | 17.6591ms | 9.4098ms | 106.2716 Ops/s | 102.4623 Ops/s | +3.72% |
| test_redq_speed | 24.7943ms | 17.6182ms | 56.7594 Ops/s | 52.9670 Ops/s | **+7.16%** |
| test_redq_deprec_speed | 25.7918ms | 14.8359ms | 67.4041 Ops/s | 67.4533 Ops/s | -0.07% |
| test_td3_speed | 14.8726ms | 11.5413ms | 86.6457 Ops/s | 82.4292 Ops/s | **+5.12%** |
| test_cql_speed | 37.3957ms | 30.2593ms | 33.0477 Ops/s | 27.9962 Ops/s | **+18.04%** |
| test_a2c_speed | 12.9621ms | 5.8443ms | 171.1055 Ops/s | 157.3071 Ops/s | **+8.77%** |
| test_ppo_speed | 52.1499ms | 6.6604ms | 150.1405 Ops/s | 138.9223 Ops/s | **+8.08%** |
| test_reinforce_speed | 10.1891ms | 4.5586ms | 219.3678 Ops/s | 193.6780 Ops/s | **+13.26%** |
| test_iql_speed | 30.4570ms | 23.8415ms | 41.9437 Ops/s | 37.6562 Ops/s | **+11.39%** |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.1427ms | 3.0059ms | 332.6745 Ops/s | 327.3033 Ops/s | +1.64% |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 5.3442ms | 3.1919ms | 313.2922 Ops/s | 312.0670 Ops/s | +0.39% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 5.8876ms | 3.2056ms | 311.9553 Ops/s | 307.0196 Ops/s | +1.61% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.6546ms | 3.0113ms | 332.0802 Ops/s | 266.8115 Ops/s | **+24.46%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 5.3157ms | 3.2145ms | 311.0923 Ops/s | 308.0937 Ops/s | +0.97% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.1873s | 3.7193ms | 268.8669 Ops/s | 306.2822 Ops/s | **-12.22%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.9944ms | 3.0221ms | 330.8998 Ops/s | 341.3490 Ops/s | -3.06% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 5.7867ms | 3.1228ms | 320.2229 Ops/s | 317.8732 Ops/s | +0.74% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 6.5091ms | 3.2183ms | 310.7208 Ops/s | 309.6122 Ops/s | +0.36% |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.3055ms | 3.1312ms | 319.3685 Ops/s | 322.5116 Ops/s | -0.97% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 7.1730ms | 3.3437ms | 299.0730 Ops/s | 307.7729 Ops/s | -2.83% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 6.1064ms | 3.3799ms | 295.8667 Ops/s | 306.8941 Ops/s | -3.59% |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.3425ms | 3.1871ms | 313.7695 Ops/s | 332.2224 Ops/s | **-5.55%** |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 5.9926ms | 3.4568ms | 289.2834 Ops/s | 301.0128 Ops/s | -3.90% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 6.9477ms | 3.5537ms | 281.3948 Ops/s | 300.5863 Ops/s | **-6.38%** |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.1597ms | 3.1344ms | 319.0360 Ops/s | 332.9123 Ops/s | -4.17% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 6.2296ms | 3.2633ms | 306.4409 Ops/s | 308.2488 Ops/s | -0.59% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 7.0389ms | 3.3730ms | 296.4743 Ops/s | 297.6329 Ops/s | -0.39% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.3154s | 33.4142ms | 29.9274 Ops/s | 30.6981 Ops/s | -2.51% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1602s | 32.6436ms | 30.6339 Ops/s | 29.9558 Ops/s | +2.26% |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1648s | 30.5007ms | 32.7861 Ops/s | 32.6256 Ops/s | +0.49% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1579s | 32.9126ms | 30.3835 Ops/s | 30.3961 Ops/s | -0.04% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1621s | 30.3391ms | 32.9607 Ops/s | 30.1277 Ops/s | **+9.40%** |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1632s | 33.6573ms | 29.7113 Ops/s | 32.7989 Ops/s | **-9.41%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1602s | 30.3708ms | 32.9263 Ops/s | 30.3517 Ops/s | **+8.48%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1616s | 32.6958ms | 30.5850 Ops/s | 33.1721 Ops/s | **-7.80%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1566s | 29.5711ms | 33.8168 Ops/s | 33.6890 Ops/s | +0.38% |

This reverts commit e8e410e.
@matteobettini matteobettini marked this pull request as ready for review August 18, 2023 14:17
Signed-off-by: Matteo Bettini <[email protected]>
@vmoens vmoens added the enhancement New feature or request label Aug 30, 2023
# Conflicts:
#	torchrl/data/utils.py
Contributor

@vmoens vmoens left a comment

As part of this PR, I think we should rename "_action_spec" to "full_action_spec", etc.
For the rest I just have some minor comments. Nice work!

Review comments (all outdated and resolved): torchrl/envs/common.py (6), torchrl/envs/vec_env.py (1), torchrl/envs/utils.py (3)
matteobettini and others added 12 commits August 30, 2023 13:21
Signed-off-by: Matteo Bettini <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Contributor

@vmoens vmoens left a comment

LGTM, thanks for this!

@vmoens vmoens merged commit f8777a6 into pytorch:main Aug 30, 2023
@matteobettini matteobettini deleted the allow-all-specs-compsite branch August 30, 2023 14:32
osalpekar pushed a commit to osalpekar/rl that referenced this pull request Aug 30, 2023
…`vec_env` and `collectors` (pytorch#1462)

Signed-off-by: Matteo Bettini <[email protected]>
Co-authored-by: vmoens <[email protected]>
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023
…`vec_env` and `collectors` (pytorch#1462)

Signed-off-by: Matteo Bettini <[email protected]>
Co-authored-by: vmoens <[email protected]>
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request

4 participants