[BugFix] Adaptable non-blocking for mps and non cuda device in batched-envs #1900

Merged
vmoens merged 4 commits into main from non-blocking-mps on Feb 12, 2024

@vmoens vmoens commented Feb 12, 2024

Solves #1864

pytorch-bot bot commented Feb 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1900

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 1559a4a with merge base 2cfd9b6:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Feb 12, 2024
@vmoens added the bug (Something isn't working) and Suitable for minor (suitable to be integrated in a minor release, no new feature) labels on Feb 12, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}3$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1267s 65.8861ms 15.1777 Ops/s 15.9692 Ops/s $\color{#d91a1a}-4.96\%$
test_sync 43.1026ms 36.8718ms 27.1210 Ops/s 28.9777 Ops/s $\textbf{\color{#d91a1a}-6.41\%}$
test_async 0.1258s 32.8466ms 30.4446 Ops/s 30.8152 Ops/s $\color{#d91a1a}-1.20\%$
test_simple 0.4909s 0.4342s 2.3029 Ops/s 2.2624 Ops/s $\color{#35bf28}+1.79\%$
test_transformed 0.6516s 0.6001s 1.6664 Ops/s 1.6514 Ops/s $\color{#35bf28}+0.91\%$
test_serial 1.4770s 1.4192s 0.7046 Ops/s 0.6873 Ops/s $\color{#35bf28}+2.52\%$
test_parallel 1.4347s 1.3941s 0.7173 Ops/s 0.7114 Ops/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[True-True-True-True-True] 0.1500ms 21.1815μs 47.2111 KOps/s 47.0843 KOps/s $\color{#35bf28}+0.27\%$
test_step_mdp_speed[True-True-True-True-False] 49.8040μs 12.9673μs 77.1171 KOps/s 78.1265 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-True-True-False-True] 43.8730μs 12.6008μs 79.3601 KOps/s 81.2702 KOps/s $\color{#d91a1a}-2.35\%$
test_step_mdp_speed[True-True-True-False-False] 25.7990μs 7.6820μs 130.1736 KOps/s 134.0338 KOps/s $\color{#d91a1a}-2.88\%$
test_step_mdp_speed[True-True-False-True-True] 56.8870μs 22.9519μs 43.5694 KOps/s 44.2030 KOps/s $\color{#d91a1a}-1.43\%$
test_step_mdp_speed[True-True-False-True-False] 41.1370μs 14.4825μs 69.0490 KOps/s 70.3588 KOps/s $\color{#d91a1a}-1.86\%$
test_step_mdp_speed[True-True-False-False-True] 48.8820μs 13.8717μs 72.0891 KOps/s 73.3088 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[True-True-False-False-False] 29.8660μs 8.9735μs 111.4392 KOps/s 114.5137 KOps/s $\color{#d91a1a}-2.68\%$
test_step_mdp_speed[True-False-True-True-True] 58.9610μs 24.2140μs 41.2984 KOps/s 41.6402 KOps/s $\color{#d91a1a}-0.82\%$
test_step_mdp_speed[True-False-True-True-False] 38.7630μs 15.7077μs 63.6629 KOps/s 64.6249 KOps/s $\color{#d91a1a}-1.49\%$
test_step_mdp_speed[True-False-True-False-True] 48.8930μs 13.8524μs 72.1895 KOps/s 72.6906 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[True-False-True-False-False] 37.4800μs 8.9312μs 111.9676 KOps/s 115.2192 KOps/s $\color{#d91a1a}-2.82\%$
test_step_mdp_speed[True-False-False-True-True] 80.6620μs 25.3879μs 39.3888 KOps/s 40.0143 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[True-False-False-True-False] 49.7030μs 16.8761μs 59.2554 KOps/s 59.9379 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[True-False-False-False-True] 38.8430μs 15.0769μs 66.3265 KOps/s 68.1455 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[True-False-False-False-False] 43.4620μs 10.1902μs 98.1339 KOps/s 101.0862 KOps/s $\color{#d91a1a}-2.92\%$
test_step_mdp_speed[False-True-True-True-True] 61.9060μs 24.3199μs 41.1186 KOps/s 41.5783 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[False-True-True-True-False] 79.2200μs 15.5941μs 64.1269 KOps/s 64.1825 KOps/s $\color{#d91a1a}-0.09\%$
test_step_mdp_speed[False-True-True-False-True] 62.1560μs 16.0687μs 62.2329 KOps/s 63.0236 KOps/s $\color{#d91a1a}-1.25\%$
test_step_mdp_speed[False-True-True-False-False] 37.6010μs 10.0704μs 99.3010 KOps/s 101.6154 KOps/s $\color{#d91a1a}-2.28\%$
test_step_mdp_speed[False-True-False-True-True] 36.9790μs 25.5903μs 39.0773 KOps/s 39.1713 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-True-False-True-False] 45.9460μs 16.9476μs 59.0054 KOps/s 60.2963 KOps/s $\color{#d91a1a}-2.14\%$
test_step_mdp_speed[False-True-False-False-True] 48.2010μs 17.1562μs 58.2881 KOps/s 58.6237 KOps/s $\color{#d91a1a}-0.57\%$
test_step_mdp_speed[False-True-False-False-False] 40.8370μs 11.4388μs 87.4214 KOps/s 89.6844 KOps/s $\color{#d91a1a}-2.52\%$
test_step_mdp_speed[False-False-True-True-True] 74.7610μs 26.6773μs 37.4851 KOps/s 37.8770 KOps/s $\color{#d91a1a}-1.03\%$
test_step_mdp_speed[False-False-True-True-False] 51.4170μs 18.2120μs 54.9089 KOps/s 55.2499 KOps/s $\color{#d91a1a}-0.62\%$
test_step_mdp_speed[False-False-True-False-True] 50.6350μs 17.2073μs 58.1149 KOps/s 58.7798 KOps/s $\color{#d91a1a}-1.13\%$
test_step_mdp_speed[False-False-True-False-False] 34.9760μs 11.3928μs 87.7748 KOps/s 89.6148 KOps/s $\color{#d91a1a}-2.05\%$
test_step_mdp_speed[False-False-False-True-True] 60.4640μs 27.7853μs 35.9902 KOps/s 36.5609 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[False-False-False-True-False] 66.9160μs 19.1541μs 52.2081 KOps/s 52.6001 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[False-False-False-False-True] 54.6130μs 18.2882μs 54.6800 KOps/s 55.1378 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[False-False-False-False-False] 45.6460μs 12.4471μs 80.3401 KOps/s 81.4798 KOps/s $\color{#d91a1a}-1.40\%$
test_values[generalized_advantage_estimate-True-True] 9.6519ms 9.3085ms 107.4292 Ops/s 107.9476 Ops/s $\color{#d91a1a}-0.48\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.5375ms 35.5368ms 28.1398 Ops/s 28.7249 Ops/s $\color{#d91a1a}-2.04\%$
test_values[td0_return_estimate-False-False] 0.2247ms 0.1671ms 5.9831 KOps/s 6.0756 KOps/s $\color{#d91a1a}-1.52\%$
test_values[td1_return_estimate-False-False] 25.2179ms 23.7402ms 42.1227 Ops/s 43.2055 Ops/s $\color{#d91a1a}-2.51\%$
test_values[vec_td1_return_estimate-False-False] 43.0884ms 35.5863ms 28.1007 Ops/s 28.5759 Ops/s $\color{#d91a1a}-1.66\%$
test_values[td_lambda_return_estimate-True-False] 36.7193ms 34.2264ms 29.2172 Ops/s 30.1231 Ops/s $\color{#d91a1a}-3.01\%$
test_values[vec_td_lambda_return_estimate-True-False] 36.2584ms 35.3059ms 28.3239 Ops/s 28.5494 Ops/s $\color{#d91a1a}-0.79\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.7283ms 8.2640ms 121.0064 Ops/s 124.0367 Ops/s $\color{#d91a1a}-2.44\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.4808ms 1.8627ms 536.8467 Ops/s 515.0457 Ops/s $\color{#35bf28}+4.23\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5465ms 0.3556ms 2.8125 KOps/s 2.9018 KOps/s $\color{#d91a1a}-3.08\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.1663ms 44.7073ms 22.3677 Ops/s 22.2031 Ops/s $\color{#35bf28}+0.74\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.5502ms 3.0134ms 331.8541 Ops/s 321.3685 Ops/s $\color{#35bf28}+3.26\%$
test_dqn_speed 1.7372ms 1.3480ms 741.8569 Ops/s 718.8269 Ops/s $\color{#35bf28}+3.20\%$
test_ddpg_speed 3.1382ms 2.7085ms 369.2146 Ops/s 341.3832 Ops/s $\textbf{\color{#35bf28}+8.15\%}$
test_sac_speed 9.6372ms 8.4809ms 117.9118 Ops/s 116.6978 Ops/s $\color{#35bf28}+1.04\%$
test_redq_speed 13.9240ms 13.1994ms 75.7608 Ops/s 75.1186 Ops/s $\color{#35bf28}+0.85\%$
test_redq_deprec_speed 13.9564ms 13.2685ms 75.3667 Ops/s 73.9167 Ops/s $\color{#35bf28}+1.96\%$
test_td3_speed 8.9596ms 8.5433ms 117.0505 Ops/s 114.3512 Ops/s $\color{#35bf28}+2.36\%$
test_cql_speed 46.3926ms 38.1628ms 26.2035 Ops/s 26.9203 Ops/s $\color{#d91a1a}-2.66\%$
test_a2c_speed 8.6010ms 7.3056ms 136.8816 Ops/s 134.1041 Ops/s $\color{#35bf28}+2.07\%$
test_ppo_speed 8.2262ms 7.4846ms 133.6075 Ops/s 128.5914 Ops/s $\color{#35bf28}+3.90\%$
test_reinforce_speed 7.3372ms 6.5273ms 153.2023 Ops/s 152.9189 Ops/s $\color{#35bf28}+0.19\%$
test_iql_speed 33.6705ms 32.5347ms 30.7365 Ops/s 28.0062 Ops/s $\textbf{\color{#35bf28}+9.75\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.2234ms 2.6621ms 375.6383 Ops/s 372.0736 Ops/s $\color{#35bf28}+0.96\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8615ms 0.5230ms 1.9119 KOps/s 1.9496 KOps/s $\color{#d91a1a}-1.93\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6571ms 0.4917ms 2.0338 KOps/s 2.0702 KOps/s $\color{#d91a1a}-1.76\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.1372ms 2.9872ms 334.7652 Ops/s 381.9513 Ops/s $\textbf{\color{#d91a1a}-12.35\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7425ms 0.5155ms 1.9400 KOps/s 1.9894 KOps/s $\color{#d91a1a}-2.48\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6920ms 0.4862ms 2.0567 KOps/s 2.0856 KOps/s $\color{#d91a1a}-1.39\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.1013ms 2.6984ms 370.5904 Ops/s 360.6615 Ops/s $\color{#35bf28}+2.75\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7923ms 0.6244ms 1.6016 KOps/s 1.5886 KOps/s $\color{#35bf28}+0.82\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8378ms 0.5935ms 1.6848 KOps/s 1.6712 KOps/s $\color{#35bf28}+0.81\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.8033ms 2.5547ms 391.4325 Ops/s 388.0350 Ops/s $\color{#35bf28}+0.88\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5782ms 0.5062ms 1.9755 KOps/s 1.9531 KOps/s $\color{#35bf28}+1.15\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7156ms 0.4848ms 2.0628 KOps/s 2.0532 KOps/s $\color{#35bf28}+0.46\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.0301ms 2.5946ms 385.4140 Ops/s 376.6786 Ops/s $\color{#35bf28}+2.32\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6298ms 0.5026ms 1.9898 KOps/s 1.9179 KOps/s $\color{#35bf28}+3.75\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6798ms 0.4799ms 2.0836 KOps/s 2.0565 KOps/s $\color{#35bf28}+1.32\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.8980ms 2.6734ms 374.0577 Ops/s 365.9223 Ops/s $\color{#35bf28}+2.22\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7491ms 0.6234ms 1.6041 KOps/s 1.5889 KOps/s $\color{#35bf28}+0.96\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7949ms 0.5967ms 1.6760 KOps/s 1.6721 KOps/s $\color{#35bf28}+0.23\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1066s 8.0493ms 124.2339 Ops/s 120.0736 Ops/s $\color{#35bf28}+3.46\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 16.2375ms 13.8180ms 72.3691 Ops/s 73.8259 Ops/s $\color{#d91a1a}-1.97\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.9901ms 2.5047ms 399.2565 Ops/s 387.0952 Ops/s $\color{#35bf28}+3.14\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 95.6934ms 9.5064ms 105.1918 Ops/s 104.1537 Ops/s $\color{#35bf28}+1.00\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 15.6146ms 13.7291ms 72.8379 Ops/s 73.4095 Ops/s $\color{#d91a1a}-0.78\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.0938ms 2.5207ms 396.7129 Ops/s 392.1264 Ops/s $\color{#35bf28}+1.17\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 95.5445ms 7.9769ms 125.3614 Ops/s 123.0962 Ops/s $\color{#35bf28}+1.84\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1054s 15.9291ms 62.7782 Ops/s 73.2862 Ops/s $\textbf{\color{#d91a1a}-14.34\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.9360ms 2.7769ms 360.1174 Ops/s 355.2600 Ops/s $\color{#35bf28}+1.37\%$
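
For readers checking the table, the columns appear to be related as follows: Ops is the reciprocal of Mean, and Change is the relative difference between Ops and Ops on Repo HEAD. This is an inference from the reported numbers, not something the benchmark bot states, and the helper names below are hypothetical. A minimal sketch using the `test_single` row:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change_pct(ops_pr: float, ops_head: float) -> float:
    """Relative change of the PR throughput against repo HEAD, in percent
    (assumed definition of the Change column)."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the test_single row of the CPU table.
mean_s = 65.8861e-3   # Mean: 65.8861ms
ops_pr = 15.1777      # Ops
ops_head = 15.9692    # Ops on Repo HEAD

print(f"{ops_per_second(mean_s):.4f} Ops/s")
print(f"{relative_change_pct(ops_pr, ops_head):+.2f}%")  # -4.96%
```

Under the same assumption, the `test_sync` row gives -6.41%, matching the table. Only changes of roughly 5% or more appear in bold and seem to be counted in the Improved/Worsened totals.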

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}1$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1200s 0.1180s 8.4729 Ops/s 8.5900 Ops/s $\color{#d91a1a}-1.36\%$
test_sync 0.1716s 0.1027s 9.7349 Ops/s 9.6668 Ops/s $\color{#35bf28}+0.70\%$
test_async 0.2538s 92.1036ms 10.8573 Ops/s 10.8800 Ops/s $\color{#d91a1a}-0.21\%$
test_single_pixels 0.1292s 0.1286s 7.7770 Ops/s 7.7570 Ops/s $\color{#35bf28}+0.26\%$
test_sync_pixels 83.1942ms 82.1334ms 12.1753 Ops/s 12.4544 Ops/s $\color{#d91a1a}-2.24\%$
test_async_pixels 0.2097s 74.6682ms 13.3926 Ops/s 13.1212 Ops/s $\color{#35bf28}+2.07\%$
test_simple 0.9009s 0.8316s 1.2025 Ops/s 1.1983 Ops/s $\color{#35bf28}+0.35\%$
test_transformed 1.1499s 1.0853s 0.9214 Ops/s 0.9200 Ops/s $\color{#35bf28}+0.15\%$
test_serial 2.5165s 2.4506s 0.4081 Ops/s 0.4162 Ops/s $\color{#d91a1a}-1.94\%$
test_parallel 2.2208s 2.1092s 0.4741 Ops/s 0.4767 Ops/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[True-True-True-True-True] 94.1310μs 33.8393μs 29.5514 KOps/s 29.1981 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-True-True-True-False] 41.0910μs 19.4027μs 51.5392 KOps/s 50.8090 KOps/s $\color{#35bf28}+1.44\%$
test_step_mdp_speed[True-True-True-False-True] 41.0010μs 18.3157μs 54.5981 KOps/s 51.3586 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_step_mdp_speed[True-True-True-False-False] 37.0800μs 11.3746μs 87.9149 KOps/s 89.4027 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[True-True-False-True-True] 72.8220μs 34.3773μs 29.0890 KOps/s 27.5003 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_step_mdp_speed[True-True-False-True-False] 44.5410μs 21.0425μs 47.5228 KOps/s 46.3610 KOps/s $\color{#35bf28}+2.51\%$
test_step_mdp_speed[True-True-False-False-True] 42.3510μs 20.5252μs 48.7207 KOps/s 46.4721 KOps/s $\color{#35bf28}+4.84\%$
test_step_mdp_speed[True-True-False-False-False] 34.1310μs 12.8733μs 77.6802 KOps/s 76.1172 KOps/s $\color{#35bf28}+2.05\%$
test_step_mdp_speed[True-False-True-True-True] 60.9110μs 36.3034μs 27.5457 KOps/s 26.2091 KOps/s $\textbf{\color{#35bf28}+5.10\%}$
test_step_mdp_speed[True-False-True-True-False] 44.6210μs 23.2812μs 42.9532 KOps/s 42.2991 KOps/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[True-False-True-False-True] 45.3800μs 20.2750μs 49.3218 KOps/s 46.8076 KOps/s $\textbf{\color{#35bf28}+5.37\%}$
test_step_mdp_speed[True-False-True-False-False] 28.1000μs 12.8295μs 77.9452 KOps/s 76.4997 KOps/s $\color{#35bf28}+1.89\%$
test_step_mdp_speed[True-False-False-True-True] 65.9510μs 38.4843μs 25.9846 KOps/s 25.2494 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[True-False-False-True-False] 51.0300μs 25.4634μs 39.2721 KOps/s 39.6692 KOps/s $\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-False-False-False-True] 47.6110μs 21.9741μs 45.5081 KOps/s 43.4484 KOps/s $\color{#35bf28}+4.74\%$
test_step_mdp_speed[True-False-False-False-False] 31.9800μs 14.7140μs 67.9623 KOps/s 67.4553 KOps/s $\color{#35bf28}+0.75\%$
test_step_mdp_speed[False-True-True-True-True] 63.2910μs 36.8090μs 27.1673 KOps/s 26.1607 KOps/s $\color{#35bf28}+3.85\%$
test_step_mdp_speed[False-True-True-True-False] 46.8310μs 23.2245μs 43.0580 KOps/s 42.5932 KOps/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[False-True-True-False-True] 47.6210μs 24.1617μs 41.3878 KOps/s 39.3106 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_step_mdp_speed[False-True-True-False-False] 44.7010μs 15.0433μs 66.4746 KOps/s 66.8526 KOps/s $\color{#d91a1a}-0.57\%$
test_step_mdp_speed[False-True-False-True-True] 64.2710μs 39.0304μs 25.6210 KOps/s 25.1593 KOps/s $\color{#35bf28}+1.84\%$
test_step_mdp_speed[False-True-False-True-False] 49.1500μs 25.2186μs 39.6533 KOps/s 39.0369 KOps/s $\color{#35bf28}+1.58\%$
test_step_mdp_speed[False-True-False-False-True] 48.2100μs 26.2317μs 38.1218 KOps/s 37.1055 KOps/s $\color{#35bf28}+2.74\%$
test_step_mdp_speed[False-True-False-False-False] 33.2510μs 16.4987μs 60.6110 KOps/s 59.4861 KOps/s $\color{#35bf28}+1.89\%$
test_step_mdp_speed[False-False-True-True-True] 70.7820μs 39.2094μs 25.5041 KOps/s 23.8771 KOps/s $\textbf{\color{#35bf28}+6.81\%}$
test_step_mdp_speed[False-False-True-True-False] 52.6210μs 26.9621μs 37.0891 KOps/s 36.1747 KOps/s $\color{#35bf28}+2.53\%$
test_step_mdp_speed[False-False-True-False-True] 50.7410μs 26.5624μs 37.6472 KOps/s 37.1018 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[False-False-True-False-False] 34.2300μs 16.4488μs 60.7947 KOps/s 59.2652 KOps/s $\color{#35bf28}+2.58\%$
test_step_mdp_speed[False-False-False-True-True] 71.1120μs 41.1418μs 24.3062 KOps/s 23.3541 KOps/s $\color{#35bf28}+4.08\%$
test_step_mdp_speed[False-False-False-True-False] 52.5110μs 28.2272μs 35.4268 KOps/s 34.2224 KOps/s $\color{#35bf28}+3.52\%$
test_step_mdp_speed[False-False-False-False-True] 52.2810μs 27.4779μs 36.3929 KOps/s 34.7301 KOps/s $\color{#35bf28}+4.79\%$
test_step_mdp_speed[False-False-False-False-False] 41.9600μs 18.4034μs 54.3377 KOps/s 54.5957 KOps/s $\color{#d91a1a}-0.47\%$
test_values[generalized_advantage_estimate-True-True] 24.4717ms 24.0743ms 41.5380 Ops/s 41.3323 Ops/s $\color{#35bf28}+0.50\%$
test_values[vec_generalized_advantage_estimate-True-True] 89.3270ms 3.3437ms 299.0705 Ops/s 308.4948 Ops/s $\color{#d91a1a}-3.05\%$
test_values[td0_return_estimate-False-False] 96.6320μs 61.8930μs 16.1569 KOps/s 16.2497 KOps/s $\color{#d91a1a}-0.57\%$
test_values[td1_return_estimate-False-False] 52.3478ms 51.9696ms 19.2420 Ops/s 19.2380 Ops/s $\color{#35bf28}+0.02\%$
test_values[vec_td1_return_estimate-False-False] 2.0499ms 1.7571ms 569.1163 Ops/s 567.1574 Ops/s $\color{#35bf28}+0.35\%$
test_values[td_lambda_return_estimate-True-False] 89.7979ms 84.9197ms 11.7758 Ops/s 12.0246 Ops/s $\color{#d91a1a}-2.07\%$
test_values[vec_td_lambda_return_estimate-True-False] 3.9999ms 1.7875ms 559.4559 Ops/s 557.2813 Ops/s $\color{#35bf28}+0.39\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.8198ms 22.5223ms 44.4004 Ops/s 42.8526 Ops/s $\color{#35bf28}+3.61\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8727ms 0.7003ms 1.4280 KOps/s 1.4221 KOps/s $\color{#35bf28}+0.42\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8252ms 0.6784ms 1.4741 KOps/s 1.5399 KOps/s $\color{#d91a1a}-4.28\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5105ms 1.4554ms 687.1107 Ops/s 688.8733 Ops/s $\color{#d91a1a}-0.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9969ms 0.6708ms 1.4908 KOps/s 1.5009 KOps/s $\color{#d91a1a}-0.68\%$
test_dqn_speed 9.1347ms 1.4848ms 673.4849 Ops/s 698.7177 Ops/s $\color{#d91a1a}-3.61\%$
test_ddpg_speed 3.0208ms 2.8082ms 356.0972 Ops/s 361.2150 Ops/s $\color{#d91a1a}-1.42\%$
test_sac_speed 9.1352ms 8.7311ms 114.5325 Ops/s 116.9757 Ops/s $\color{#d91a1a}-2.09\%$
test_redq_speed 11.8078ms 10.7496ms 93.0269 Ops/s 92.7844 Ops/s $\color{#35bf28}+0.26\%$
test_redq_deprec_speed 12.8133ms 11.8393ms 84.4647 Ops/s 84.7595 Ops/s $\color{#d91a1a}-0.35\%$
test_td3_speed 18.0304ms 9.0410ms 110.6069 Ops/s 115.2551 Ops/s $\color{#d91a1a}-4.03\%$
test_cql_speed 27.8388ms 26.4752ms 37.7713 Ops/s 38.6896 Ops/s $\color{#d91a1a}-2.37\%$
test_a2c_speed 6.1839ms 5.5756ms 179.3535 Ops/s 182.7693 Ops/s $\color{#d91a1a}-1.87\%$
test_ppo_speed 6.7574ms 5.8226ms 171.7451 Ops/s 174.9850 Ops/s $\color{#d91a1a}-1.85\%$
test_reinforce_speed 7.1700ms 4.5969ms 217.5360 Ops/s 223.4755 Ops/s $\color{#d91a1a}-2.66\%$
test_iql_speed 20.8756ms 20.0809ms 49.7985 Ops/s 50.0542 Ops/s $\color{#d91a1a}-0.51\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.6643ms 3.5532ms 281.4327 Ops/s 279.8515 Ops/s $\color{#35bf28}+0.57\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8004ms 0.5568ms 1.7961 KOps/s 1.7948 KOps/s $\color{#35bf28}+0.07\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7283ms 0.5310ms 1.8832 KOps/s 1.8924 KOps/s $\color{#d91a1a}-0.48\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.7961ms 3.5808ms 279.2666 Ops/s 279.4604 Ops/s $\color{#d91a1a}-0.07\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6996ms 0.5550ms 1.8018 KOps/s 1.8195 KOps/s $\color{#d91a1a}-0.97\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8196ms 0.5279ms 1.8944 KOps/s 1.9084 KOps/s $\color{#d91a1a}-0.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.8000ms 3.6944ms 270.6813 Ops/s 268.4456 Ops/s $\color{#35bf28}+0.83\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9228ms 0.6887ms 1.4520 KOps/s 1.4476 KOps/s $\color{#35bf28}+0.30\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8459ms 0.6571ms 1.5219 KOps/s 1.5143 KOps/s $\color{#35bf28}+0.50\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.6299ms 3.5582ms 281.0391 Ops/s 280.0720 Ops/s $\color{#35bf28}+0.35\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8221ms 0.5672ms 1.7631 KOps/s 1.7657 KOps/s $\color{#d91a1a}-0.15\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6792ms 0.5387ms 1.8562 KOps/s 1.8490 KOps/s $\color{#35bf28}+0.39\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.8989ms 3.6000ms 277.7763 Ops/s 278.2963 Ops/s $\color{#d91a1a}-0.19\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7174ms 0.5570ms 1.7953 KOps/s 1.7906 KOps/s $\color{#35bf28}+0.26\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7508ms 0.5308ms 1.8838 KOps/s 1.8861 KOps/s $\color{#d91a1a}-0.12\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.9206ms 3.7097ms 269.5642 Ops/s 269.4782 Ops/s $\color{#35bf28}+0.03\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9996ms 0.7028ms 1.4229 KOps/s 1.4412 KOps/s $\color{#d91a1a}-1.27\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7836ms 0.6688ms 1.4951 KOps/s 1.5029 KOps/s $\color{#d91a1a}-0.52\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1234s 10.1280ms 98.7361 Ops/s 96.7850 Ops/s $\color{#35bf28}+2.02\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 18.8737ms 16.2674ms 61.4728 Ops/s 61.5774 Ops/s $\color{#d91a1a}-0.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.0257ms 3.0856ms 324.0894 Ops/s 322.4014 Ops/s $\color{#35bf28}+0.52\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1203s 12.2747ms 81.4685 Ops/s 99.8230 Ops/s $\textbf{\color{#d91a1a}-18.39\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 18.9078ms 16.3713ms 61.0826 Ops/s 54.5184 Ops/s $\textbf{\color{#35bf28}+12.04\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.2189ms 3.1021ms 322.3580 Ops/s 322.2299 Ops/s $\color{#35bf28}+0.04\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1210s 10.2758ms 97.3164 Ops/s 96.7301 Ops/s $\color{#35bf28}+0.61\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 19.0963ms 16.6434ms 60.0837 Ops/s 60.3942 Ops/s $\color{#d91a1a}-0.51\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.1956ms 3.3929ms 294.7353 Ops/s 293.3871 Ops/s $\color{#35bf28}+0.46\%$

@vmoens vmoens merged commit 6f6c896 into main Feb 12, 2024
64 of 67 checks passed
@vmoens vmoens deleted the non-blocking-mps branch February 12, 2024 09:43