@vmoens (Collaborator) commented Apr 28, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 6c687b8
Pull-Request-resolved: #1298
@facebook-github-bot added the CLA Signed label Apr 28, 2025
@vmoens added the CI label Apr 28, 2025
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 803dfa1
Pull-Request-resolved: #1298
@vmoens closed this in 89b29d3 Apr 28, 2025
@vmoens merged commit 5108297 into gh/vmoens/52/base Apr 28, 2025
33 of 36 checks passed
@vmoens deleted the gh/vmoens/52/head branch April 28, 2025 12:38
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 233. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}22$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 25.5010μs 11.4302μs 87.4874 KOps/s 88.7367 KOps/s $\color{#d91a1a}-1.41\%$
test_plain_set_stack_nested 67.9710μs 11.3366μs 88.2100 KOps/s 87.8831 KOps/s $\color{#35bf28}+0.37\%$
test_plain_set_nested_inplace 0.1809ms 12.4913μs 80.0558 KOps/s 80.6094 KOps/s $\color{#d91a1a}-0.69\%$
test_plain_set_stack_nested_inplace 36.7000μs 12.4772μs 80.1464 KOps/s 80.6735 KOps/s $\color{#d91a1a}-0.65\%$
test_items 26.3600μs 2.8956μs 345.3504 KOps/s 345.2775 KOps/s $\color{#35bf28}+0.02\%$
test_items_nested 0.4098ms 0.3656ms 2.7350 KOps/s 2.7556 KOps/s $\color{#d91a1a}-0.75\%$
test_items_nested_locked 0.3945ms 0.3671ms 2.7242 KOps/s 2.7456 KOps/s $\color{#d91a1a}-0.78\%$
test_items_nested_leaf 0.1650ms 60.4106μs 16.5534 KOps/s 16.6412 KOps/s $\color{#d91a1a}-0.53\%$
test_items_stack_nested 0.4254ms 0.3670ms 2.7248 KOps/s 2.7727 KOps/s $\color{#d91a1a}-1.73\%$
test_items_stack_nested_leaf 0.2415ms 60.4753μs 16.5357 KOps/s 16.5407 KOps/s $\color{#d91a1a}-0.03\%$
test_items_stack_nested_locked 0.5705ms 0.3695ms 2.7063 KOps/s 2.7508 KOps/s $\color{#d91a1a}-1.62\%$
test_keys 34.2800μs 3.4292μs 291.6132 KOps/s 287.5858 KOps/s $\color{#35bf28}+1.40\%$
test_keys_nested 0.2249ms 90.1966μs 11.0869 KOps/s 11.3413 KOps/s $\color{#d91a1a}-2.24\%$
test_keys_nested_locked 2.2965ms 95.8628μs 10.4316 KOps/s 10.5419 KOps/s $\color{#d91a1a}-1.05\%$
test_keys_nested_leaf 0.1069ms 81.0668μs 12.3355 KOps/s 12.6168 KOps/s $\color{#d91a1a}-2.23\%$
test_keys_stack_nested 0.1173ms 89.1005μs 11.2233 KOps/s 11.4258 KOps/s $\color{#d91a1a}-1.77\%$
test_keys_stack_nested_leaf 0.1273ms 79.8670μs 12.5208 KOps/s 12.6824 KOps/s $\color{#d91a1a}-1.27\%$
test_keys_stack_nested_locked 0.1534ms 95.2860μs 10.4947 KOps/s 10.6713 KOps/s $\color{#d91a1a}-1.66\%$
test_values 8.9567μs 0.8491μs 1.1777 MOps/s 1.1762 MOps/s $\color{#35bf28}+0.13\%$
test_values_nested 70.6510μs 38.1050μs 26.2432 KOps/s 26.6448 KOps/s $\color{#d91a1a}-1.51\%$
test_values_nested_locked 71.5600μs 39.7750μs 25.1414 KOps/s 25.4199 KOps/s $\color{#d91a1a}-1.10\%$
test_values_nested_leaf 78.9700μs 43.2745μs 23.1083 KOps/s 23.6115 KOps/s $\color{#d91a1a}-2.13\%$
test_values_stack_nested 64.3810μs 37.8684μs 26.4073 KOps/s 26.5407 KOps/s $\color{#d91a1a}-0.50\%$
test_values_stack_nested_leaf 0.1078ms 43.7454μs 22.8595 KOps/s 23.4413 KOps/s $\color{#d91a1a}-2.48\%$
test_values_stack_nested_locked 63.2210μs 40.0399μs 24.9751 KOps/s 25.1740 KOps/s $\color{#d91a1a}-0.79\%$
test_membership 1.5730μs 0.5022μs 1.9911 MOps/s 1.9901 MOps/s $\color{#35bf28}+0.05\%$
test_membership_nested 93.1760μs 1.9774μs 505.7134 KOps/s 494.2586 KOps/s $\color{#35bf28}+2.32\%$
test_membership_nested_leaf 21.3005μs 2.0248μs 493.8684 KOps/s 494.3534 KOps/s $\color{#d91a1a}-0.10\%$
test_membership_stacked_nested 29.8900μs 2.0915μs 478.1342 KOps/s 481.0996 KOps/s $\color{#d91a1a}-0.62\%$
test_membership_stacked_nested_leaf 0.2159ms 2.0779μs 481.2605 KOps/s 484.6193 KOps/s $\color{#d91a1a}-0.69\%$
test_membership_nested_last 34.2700μs 3.0210μs 331.0188 KOps/s 330.9097 KOps/s $\color{#35bf28}+0.03\%$
test_membership_nested_leaf_last 30.8600μs 3.0138μs 331.8099 KOps/s 327.5834 KOps/s $\color{#35bf28}+1.29\%$
test_membership_stacked_nested_last 30.2800μs 3.0657μs 326.1864 KOps/s 326.3533 KOps/s $\color{#d91a1a}-0.05\%$
test_membership_stacked_nested_leaf_last 25.9600μs 3.0345μs 329.5437 KOps/s 331.3074 KOps/s $\color{#d91a1a}-0.53\%$
test_nested_getleaf 0.1595ms 13.1254μs 76.1882 KOps/s 76.6570 KOps/s $\color{#d91a1a}-0.61\%$
test_nested_get 42.2510μs 12.3453μs 81.0027 KOps/s 80.2763 KOps/s $\color{#35bf28}+0.90\%$
test_stacked_getleaf 42.4410μs 13.1101μs 76.2773 KOps/s 76.7350 KOps/s $\color{#d91a1a}-0.60\%$
test_stacked_get 0.1409ms 12.4599μs 80.2575 KOps/s 81.0288 KOps/s $\color{#d91a1a}-0.95\%$
test_nested_getitemleaf 0.1447ms 13.4672μs 74.2545 KOps/s 74.7600 KOps/s $\color{#d91a1a}-0.68\%$
test_nested_getitem 48.8200μs 12.7623μs 78.3558 KOps/s 78.6901 KOps/s $\color{#d91a1a}-0.42\%$
test_stacked_getitemleaf 0.1271ms 13.5788μs 73.6440 KOps/s 74.6821 KOps/s $\color{#d91a1a}-1.39\%$
test_stacked_getitem 43.9500μs 12.7884μs 78.1960 KOps/s 78.8559 KOps/s $\color{#d91a1a}-0.84\%$
test_lock_nested 1.8241ms 0.3583ms 2.7908 KOps/s 2.8260 KOps/s $\color{#d91a1a}-1.25\%$
test_lock_stack_nested 0.4081ms 0.3461ms 2.8895 KOps/s 2.9127 KOps/s $\color{#d91a1a}-0.80\%$
test_unlock_nested 0.5783ms 0.3026ms 3.3048 KOps/s 3.4470 KOps/s $\color{#d91a1a}-4.13\%$
test_unlock_stack_nested 0.3782ms 0.2871ms 3.4832 KOps/s 3.5631 KOps/s $\color{#d91a1a}-2.24\%$
test_flatten_speed 0.1163ms 78.5765μs 12.7265 KOps/s 13.1195 KOps/s $\color{#d91a1a}-3.00\%$
test_unflatten_speed 0.4465ms 0.3970ms 2.5192 KOps/s 2.5276 KOps/s $\color{#d91a1a}-0.33\%$
test_common_ops 0.8732ms 0.6435ms 1.5540 KOps/s 1.5607 KOps/s $\color{#d91a1a}-0.43\%$
test_creation 86.4810μs 1.7645μs 566.7475 KOps/s 566.7638 KOps/s $-0.00\%$
test_creation_empty 0.6388ms 7.1855μs 139.1689 KOps/s 139.6507 KOps/s $\color{#d91a1a}-0.34\%$
test_creation_nested_1 0.1581ms 10.1048μs 98.9628 KOps/s 99.4264 KOps/s $\color{#d91a1a}-0.47\%$
test_creation_nested_2 0.1042ms 12.8893μs 77.5840 KOps/s 77.7521 KOps/s $\color{#d91a1a}-0.22\%$
test_clone 64.9510μs 11.1496μs 89.6897 KOps/s 92.8133 KOps/s $\color{#d91a1a}-3.37\%$
test_getitem[int] 0.1656ms 10.7476μs 93.0440 KOps/s 64.4586 KOps/s $\textbf{\color{#35bf28}+44.35\%}$
test_getitem[slice_int] 0.1441ms 21.9559μs 45.5458 KOps/s 48.3653 KOps/s $\textbf{\color{#d91a1a}-5.83\%}$
test_getitem[range] 0.1457ms 39.7448μs 25.1605 KOps/s 25.6409 KOps/s $\color{#d91a1a}-1.87\%$
test_getitem[tuple] 0.1060ms 18.5605μs 53.8779 KOps/s 56.1830 KOps/s $\color{#d91a1a}-4.10\%$
test_getitem[list] 0.1641ms 34.8040μs 28.7323 KOps/s 29.3532 KOps/s $\color{#d91a1a}-2.12\%$
test_setitem_dim[int] 38.1110μs 20.5036μs 48.7720 KOps/s 50.3410 KOps/s $\color{#d91a1a}-3.12\%$
test_setitem_dim[slice_int] 82.4110μs 42.9760μs 23.2688 KOps/s 25.8833 KOps/s $\textbf{\color{#d91a1a}-10.10\%}$
test_setitem_dim[range] 87.5910μs 57.6660μs 17.3413 KOps/s 18.3130 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_setitem_dim[tuple] 59.6210μs 36.2936μs 27.5530 KOps/s 30.3642 KOps/s $\textbf{\color{#d91a1a}-9.26\%}$
test_setitem 0.3487ms 15.9732μs 62.6049 KOps/s 56.7020 KOps/s $\textbf{\color{#35bf28}+10.41\%}$
test_set 0.2122ms 15.0810μs 66.3084 KOps/s 64.6123 KOps/s $\color{#35bf28}+2.63\%$
test_set_shared 0.5282ms 0.1613ms 6.1987 KOps/s 6.2784 KOps/s $\color{#d91a1a}-1.27\%$
test_update 0.4793ms 18.7586μs 53.3089 KOps/s 54.0909 KOps/s $\color{#d91a1a}-1.45\%$
test_update_nested 0.1683ms 29.5300μs 33.8639 KOps/s 33.6345 KOps/s $\color{#35bf28}+0.68\%$
test_update__nested 85.2810μs 26.1997μs 38.1684 KOps/s 33.6778 KOps/s $\textbf{\color{#35bf28}+13.33\%}$
test_set_nested 0.2057ms 17.2539μs 57.9579 KOps/s 53.0323 KOps/s $\textbf{\color{#35bf28}+9.29\%}$
test_set_nested_new 0.1255ms 21.0944μs 47.4059 KOps/s 50.2837 KOps/s $\textbf{\color{#d91a1a}-5.72\%}$
test_select 0.1953ms 31.0123μs 32.2452 KOps/s 32.0765 KOps/s $\color{#35bf28}+0.53\%$
test_select_nested 78.6410μs 43.4831μs 22.9975 KOps/s 22.8250 KOps/s $\color{#35bf28}+0.76\%$
test_exclude_nested 93.2810μs 61.9540μs 16.1410 KOps/s 15.8535 KOps/s $\color{#35bf28}+1.81\%$
test_empty[True] 0.4750ms 0.2894ms 3.4549 KOps/s 3.4134 KOps/s $\color{#35bf28}+1.22\%$
test_empty[False] 19.6392μs 0.8232μs 1.2147 MOps/s 1.2162 MOps/s $\color{#d91a1a}-0.12\%$
test_to 88.3210μs 57.9467μs 17.2572 KOps/s 16.5160 KOps/s $\color{#35bf28}+4.49\%$
test_to_nonblocking 0.1978ms 50.7261μs 19.7137 KOps/s 20.1987 KOps/s $\color{#d91a1a}-2.40\%$
test_unbind_speed 0.2935ms 0.2485ms 4.0249 KOps/s 4.1546 KOps/s $\color{#d91a1a}-3.12\%$
test_unbind_speed_stack0 0.4216ms 0.2440ms 4.0985 KOps/s 4.1515 KOps/s $\color{#d91a1a}-1.28\%$
test_unbind_speed_stack1 95.9903ms 0.7520ms 1.3297 KOps/s 1.4609 KOps/s $\textbf{\color{#d91a1a}-8.98\%}$
test_split 96.6101ms 1.6455ms 607.7316 Ops/s 630.0710 Ops/s $\color{#d91a1a}-3.55\%$
test_chunk 98.7773ms 1.6391ms 610.1004 Ops/s 626.7351 Ops/s $\color{#d91a1a}-2.65\%$
test_consolidate[False-None] 96.5682ms 3.1286ms 319.6322 Ops/s 322.9764 Ops/s $\color{#d91a1a}-1.04\%$
test_consolidate[default-None] 1.8566ms 1.7620ms 567.5313 Ops/s 586.8993 Ops/s $\color{#d91a1a}-3.30\%$
test_consolidate[reduce-overhead-None] 1.9720ms 1.7948ms 557.1725 Ops/s 575.4361 Ops/s $\color{#d91a1a}-3.17\%$
test_consolidate_njt[False-None] 7.0278ms 6.5437ms 152.8193 Ops/s 151.1377 Ops/s $\color{#35bf28}+1.11\%$
test_to[False-False-None] 2.0105ms 1.7950ms 557.1067 Ops/s 561.2088 Ops/s $\color{#d91a1a}-0.73\%$
test_to[True-False-None] 2.0246ms 1.4623ms 683.8490 Ops/s 696.8114 Ops/s $\color{#d91a1a}-1.86\%$
test_to[within-False-None] 4.5728ms 4.3906ms 227.7615 Ops/s 227.5894 Ops/s $\color{#35bf28}+0.08\%$
test_to[True-default-None] 5.6276ms 5.4198ms 184.5077 Ops/s 188.8464 Ops/s $\color{#d91a1a}-2.30\%$
test_to_njt[False-False-None] 7.9840ms 7.0162ms 142.5267 Ops/s 142.9416 Ops/s $\color{#d91a1a}-0.29\%$
test_to_njt[True-False-None] 6.1037ms 5.6955ms 175.5773 Ops/s 181.6074 Ops/s $\color{#d91a1a}-3.32\%$
test_to_njt[within-False-None] 0.3274s 16.2764ms 61.4388 Ops/s 80.5356 Ops/s $\textbf{\color{#d91a1a}-23.71\%}$
test_creation[device0] 0.4516ms 79.6779μs 12.5505 KOps/s 12.1059 KOps/s $\color{#35bf28}+3.67\%$
test_creation_from_tensor 0.5319ms 82.7355μs 12.0867 KOps/s 11.7030 KOps/s $\color{#35bf28}+3.28\%$
test_add_one[memmap_tensor0] 0.1129ms 7.0840μs 141.1625 KOps/s 138.3790 KOps/s $\color{#35bf28}+2.01\%$
test_contiguous[memmap_tensor0] 1.8745μs 0.4443μs 2.2508 MOps/s 2.3433 MOps/s $\color{#d91a1a}-3.95\%$
test_stack[memmap_tensor0] 33.9300μs 4.7966μs 208.4809 KOps/s 236.6951 KOps/s $\textbf{\color{#d91a1a}-11.92\%}$
test_memmaptd_index 1.4265ms 0.2474ms 4.0416 KOps/s 4.0547 KOps/s $\color{#d91a1a}-0.32\%$
test_memmaptd_index_astensor 0.4355ms 0.3086ms 3.2408 KOps/s 3.2636 KOps/s $\color{#d91a1a}-0.70\%$
test_memmaptd_index_op 0.9644ms 0.5701ms 1.7539 KOps/s 1.7741 KOps/s $\color{#d91a1a}-1.14\%$
test_serialize_model 0.1335s 0.1330s 7.5204 Ops/s 5.4437 Ops/s $\textbf{\color{#35bf28}+38.15\%}$
test_serialize_model_pickle 1.3658s 1.2159s 0.8224 Ops/s 0.8209 Ops/s $\color{#35bf28}+0.19\%$
test_serialize_weights 0.1327s 0.1316s 7.5967 Ops/s 7.5386 Ops/s $\color{#35bf28}+0.77\%$
test_serialize_weights_returnearly 0.3268s 53.7557ms 18.6027 Ops/s 23.4008 Ops/s $\textbf{\color{#d91a1a}-20.50\%}$
test_serialize_weights_pickle 1.3790s 1.2213s 0.8188 Ops/s 0.8186 Ops/s $\color{#35bf28}+0.03\%$
test_reshape_pytree 0.1181ms 22.1975μs 45.0501 KOps/s 44.9321 KOps/s $\color{#35bf28}+0.26\%$
test_reshape_td 0.1109ms 26.6986μs 37.4551 KOps/s 37.4338 KOps/s $\color{#35bf28}+0.06\%$
test_view_pytree 0.1923ms 22.0964μs 45.2563 KOps/s 46.0493 KOps/s $\color{#d91a1a}-1.72\%$
test_view_td 0.1723ms 31.8510μs 31.3962 KOps/s 33.0567 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_unbind_pytree 0.1411ms 28.6206μs 34.9398 KOps/s 36.1268 KOps/s $\color{#d91a1a}-3.29\%$
test_unbind_td 0.6228ms 38.2103μs 26.1709 KOps/s 27.1863 KOps/s $\color{#d91a1a}-3.73\%$
test_split_pytree 0.2063ms 30.1250μs 33.1950 KOps/s 34.1219 KOps/s $\color{#d91a1a}-2.72\%$
test_split_td 0.7766ms 39.9099μs 25.0564 KOps/s 25.5137 KOps/s $\color{#d91a1a}-1.79\%$
test_add_pytree 0.2096ms 35.9953μs 27.7814 KOps/s 28.3802 KOps/s $\color{#d91a1a}-2.11\%$
test_add_td 0.2907ms 54.1557μs 18.4653 KOps/s 19.8761 KOps/s $\textbf{\color{#d91a1a}-7.10\%}$
test_compile_add_one_nested[tensordict-compile] 0.2771ms 0.1258ms 7.9465 KOps/s 7.8646 KOps/s $\color{#35bf28}+1.04\%$
test_compile_add_one_nested[tensordict-eager] 0.3297ms 0.1467ms 6.8155 KOps/s 6.9776 KOps/s $\color{#d91a1a}-2.32\%$
test_compile_add_one_nested[pytree-compile] 0.2451ms 97.2797μs 10.2796 KOps/s 9.7509 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_compile_add_one_nested[pytree-eager] 1.4617ms 0.1536ms 6.5105 KOps/s 6.5403 KOps/s $\color{#d91a1a}-0.45\%$
test_compile_copy_nested[tensordict-compile] 0.1557ms 24.2395μs 41.2550 KOps/s 41.3765 KOps/s $\color{#d91a1a}-0.29\%$
test_compile_copy_nested[tensordict-eager] 0.1403ms 34.9787μs 28.5888 KOps/s 28.1649 KOps/s $\color{#35bf28}+1.50\%$
test_compile_copy_nested[pytree-compile] 0.4627ms 63.1122μs 15.8448 KOps/s 15.5761 KOps/s $\color{#35bf28}+1.73\%$
test_compile_copy_nested[pytree-eager] 0.1922ms 48.8713μs 20.4619 KOps/s 20.4176 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_one_flat[tensordict-compile] 0.3027ms 0.1503ms 6.6523 KOps/s 6.9150 KOps/s $\color{#d91a1a}-3.80\%$
test_compile_add_one_flat[tensordict-eager] 0.3637ms 0.2224ms 4.4967 KOps/s 4.5030 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_add_one_flat[tensorclass-compile] 0.2472ms 98.3418μs 10.1686 KOps/s 10.2190 KOps/s $\color{#d91a1a}-0.49\%$
test_compile_add_one_flat[tensorclass-eager] 0.2809ms 61.1687μs 16.3482 KOps/s 16.8166 KOps/s $\color{#d91a1a}-2.78\%$
test_compile_add_one_flat[pytree-compile] 0.2876ms 0.1383ms 7.2308 KOps/s 7.1986 KOps/s $\color{#35bf28}+0.45\%$
test_compile_add_one_flat[pytree-eager] 0.7038ms 0.5062ms 1.9756 KOps/s 1.9902 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_add_self_flat[tensordict-eager] 0.4663ms 0.2673ms 3.7416 KOps/s 3.7554 KOps/s $\color{#d91a1a}-0.37\%$
test_compile_add_self_flat[tensordict-compile] 0.2877ms 0.1450ms 6.8965 KOps/s 6.9643 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_add_self_flat[tensorclass-eager] 0.2212ms 70.7835μs 14.1276 KOps/s 14.0108 KOps/s $\color{#35bf28}+0.83\%$
test_compile_add_self_flat[tensorclass-compile] 0.2449ms 0.1035ms 9.6610 KOps/s 10.1861 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_compile_add_self_flat[pytree-eager] 0.5832ms 0.4185ms 2.3898 KOps/s 2.4180 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_add_self_flat[pytree-compile] 0.2763ms 0.1368ms 7.3102 KOps/s 7.4703 KOps/s $\color{#d91a1a}-2.14\%$
test_compile_copy_flat[tensordict-compile] 0.1536ms 19.6160μs 50.9789 KOps/s 52.4837 KOps/s $\color{#d91a1a}-2.87\%$
test_compile_copy_flat[tensordict-eager] 68.8710μs 33.0324μs 30.2733 KOps/s 30.4742 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_copy_flat[pytree-compile] 0.1276ms 70.6230μs 14.1597 KOps/s 14.2935 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_copy_flat[pytree-eager] 0.1008ms 52.5740μs 19.0208 KOps/s 18.9895 KOps/s $\color{#35bf28}+0.16\%$
test_compile_assign_and_add[tensordict-compile] 1.6833ms 0.4030ms 2.4814 KOps/s 2.2253 KOps/s $\textbf{\color{#35bf28}+11.51\%}$
test_compile_assign_and_add[tensordict-eager] 2.9806ms 2.7646ms 361.7215 Ops/s 367.5903 Ops/s $\color{#d91a1a}-1.60\%$
test_compile_assign_and_add[pytree-compile] 1.6336ms 0.4409ms 2.2683 KOps/s 2.2760 KOps/s $\color{#d91a1a}-0.34\%$
test_compile_assign_and_add[pytree-eager] 2.9756ms 2.7193ms 367.7359 Ops/s 382.4617 Ops/s $\color{#d91a1a}-3.85\%$
test_compile_indexing[tensor-tensordict-compile] 0.5404ms 0.1121ms 8.9172 KOps/s 8.7726 KOps/s $\color{#35bf28}+1.65\%$
test_compile_indexing[tensor-tensordict-eager] 0.5491ms 84.4860μs 11.8363 KOps/s 11.8624 KOps/s $\color{#d91a1a}-0.22\%$
test_compile_indexing[tensor-tensorclass-compile] 0.6146ms 0.1105ms 9.0532 KOps/s 9.2628 KOps/s $\color{#d91a1a}-2.26\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2406ms 71.6950μs 13.9480 KOps/s 14.4439 KOps/s $\color{#d91a1a}-3.43\%$
test_compile_indexing[tensor-pytree-compile] 0.2972ms 0.1155ms 8.6594 KOps/s 8.6956 KOps/s $\color{#d91a1a}-0.42\%$
test_compile_indexing[tensor-pytree-eager] 0.2259ms 73.2720μs 13.6478 KOps/s 13.3282 KOps/s $\color{#35bf28}+2.40\%$
test_compile_indexing[slice-tensordict-compile] 0.2498ms 0.1014ms 9.8668 KOps/s 9.5884 KOps/s $\color{#35bf28}+2.90\%$
test_compile_indexing[slice-tensordict-eager] 0.1885ms 19.7453μs 50.6450 KOps/s 51.6525 KOps/s $\color{#d91a1a}-1.95\%$
test_compile_indexing[slice-tensorclass-compile] 0.2921ms 0.1024ms 9.7696 KOps/s 10.3616 KOps/s $\textbf{\color{#d91a1a}-5.71\%}$
test_compile_indexing[slice-tensorclass-eager] 0.1398ms 16.2706μs 61.4606 KOps/s 64.8256 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_compile_indexing[slice-pytree-compile] 0.2924ms 98.4282μs 10.1597 KOps/s 9.6200 KOps/s $\textbf{\color{#35bf28}+5.61\%}$
test_compile_indexing[slice-pytree-eager] 0.1997ms 16.2660μs 61.4779 KOps/s 64.8231 KOps/s $\textbf{\color{#d91a1a}-5.16\%}$
test_compile_indexing[int-tensordict-compile] 0.2558ms 0.1089ms 9.1812 KOps/s 10.0205 KOps/s $\textbf{\color{#d91a1a}-8.38\%}$
test_compile_indexing[int-tensordict-eager] 0.5932ms 19.6898μs 50.7878 KOps/s 53.4219 KOps/s $\color{#d91a1a}-4.93\%$
test_compile_indexing[int-tensorclass-compile] 0.2723ms 97.3711μs 10.2700 KOps/s 10.2692 KOps/s $+0.01\%$
test_compile_indexing[int-tensorclass-eager] 0.1373ms 16.0513μs 62.3004 KOps/s 64.8437 KOps/s $\color{#d91a1a}-3.92\%$
test_compile_indexing[int-pytree-compile] 0.2834ms 0.1030ms 9.7133 KOps/s 9.6610 KOps/s $\color{#35bf28}+0.54\%$
test_compile_indexing[int-pytree-eager] 0.1315ms 16.1351μs 61.9765 KOps/s 63.4239 KOps/s $\color{#d91a1a}-2.28\%$
test_mod_add[eager] 0.1899ms 41.4317μs 24.1361 KOps/s 25.7828 KOps/s $\textbf{\color{#d91a1a}-6.39\%}$
test_mod_add[compile] 0.3691ms 83.4823μs 11.9786 KOps/s 11.7884 KOps/s $\color{#35bf28}+1.61\%$
test_mod_add[compile-overhead] 0.3270ms 0.1686ms 5.9310 KOps/s 5.6402 KOps/s $\textbf{\color{#35bf28}+5.15\%}$
test_mod_wrap[eager] 0.4346ms 0.2554ms 3.9155 KOps/s 3.9025 KOps/s $\color{#35bf28}+0.33\%$
test_mod_wrap[compile] 0.7867ms 0.2905ms 3.4418 KOps/s 3.4045 KOps/s $\color{#35bf28}+1.10\%$
test_mod_wrap[compile-overhead] 7.4475ms 3.8071ms 262.6702 Ops/s 264.0832 Ops/s $\color{#d91a1a}-0.54\%$
test_mod_wrap_and_backward[eager] 1.8245ms 1.3780ms 725.7089 Ops/s 680.8905 Ops/s $\textbf{\color{#35bf28}+6.58\%}$
test_mod_wrap_and_backward[compile] 1.4759ms 1.2837ms 779.0203 Ops/s 715.3521 Ops/s $\textbf{\color{#35bf28}+8.90\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4014ms 0.9376ms 1.0665 KOps/s 957.8628 Ops/s $\textbf{\color{#35bf28}+11.34\%}$
test_seq_add[eager] 0.3174ms 0.1295ms 7.7239 KOps/s 7.7334 KOps/s $\color{#d91a1a}-0.12\%$
test_seq_add[compile] 0.2361ms 90.9667μs 10.9930 KOps/s 10.7934 KOps/s $\color{#35bf28}+1.85\%$
test_seq_add[compile-overhead] 0.2888ms 0.1309ms 7.6375 KOps/s 7.5401 KOps/s $\color{#35bf28}+1.29\%$
test_seq_wrap[eager] 1.0175ms 0.4411ms 2.2673 KOps/s 2.2695 KOps/s $\color{#d91a1a}-0.10\%$
test_seq_wrap[compile] 1.1318ms 0.3120ms 3.2056 KOps/s 3.2043 KOps/s $\color{#35bf28}+0.04\%$
test_seq_wrap[compile-overhead] 0.6277ms 0.2325ms 4.3009 KOps/s 4.3066 KOps/s $\color{#d91a1a}-0.13\%$
test_func_call_runtime[False-eager] 1.1701ms 0.7579ms 1.3194 KOps/s 1.3303 KOps/s $\color{#d91a1a}-0.82\%$
test_func_call_runtime[False-compile] 0.9484ms 0.7569ms 1.3211 KOps/s 1.3163 KOps/s $\color{#35bf28}+0.37\%$
test_func_call_runtime[False-compile-overhead] 0.4880ms 0.3720ms 2.6882 KOps/s 2.7007 KOps/s $\color{#d91a1a}-0.47\%$
test_func_call_runtime[True-eager] 1.5182ms 0.9383ms 1.0657 KOps/s 1.0830 KOps/s $\color{#d91a1a}-1.60\%$
test_func_call_runtime[True-compile] 1.1989ms 0.7813ms 1.2799 KOps/s 1.2844 KOps/s $\color{#d91a1a}-0.35\%$
test_func_call_runtime[True-compile-overhead] 0.5397ms 0.3917ms 2.5529 KOps/s 2.5541 KOps/s $\color{#d91a1a}-0.05\%$
test_func_call_cm_runtime[False-eager] 1.1683ms 0.7553ms 1.3241 KOps/s 1.3339 KOps/s $\color{#d91a1a}-0.74\%$
test_func_call_cm_runtime[False-compile] 1.1732ms 0.7730ms 1.2936 KOps/s 1.3075 KOps/s $\color{#d91a1a}-1.06\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5293ms 0.3740ms 2.6738 KOps/s 2.6959 KOps/s $\color{#d91a1a}-0.82\%$
test_func_call_cm_runtime[True-eager] 1.4565ms 1.0231ms 977.3895 Ops/s 974.2029 Ops/s $\color{#35bf28}+0.33\%$
test_func_call_cm_runtime[True-compile] 1.4598ms 1.0203ms 980.0838 Ops/s 991.0166 Ops/s $\color{#d91a1a}-1.10\%$
test_func_call_cm_runtime[True-compile-overhead] 1.4354ms 1.0186ms 981.7698 Ops/s 945.7932 Ops/s $\color{#35bf28}+3.80\%$
test_vmap_func_call_cm_runtime[eager] 2.5711ms 2.1338ms 468.6508 Ops/s 470.1115 Ops/s $\color{#d91a1a}-0.31\%$
test_vmap_func_call_cm_runtime[compile] 1.0235ms 0.8316ms 1.2025 KOps/s 1.1966 KOps/s $\color{#35bf28}+0.49\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5714ms 0.4199ms 2.3817 KOps/s 2.3555 KOps/s $\color{#35bf28}+1.11\%$
test_distributed 0.5993ms 0.1238ms 8.0766 KOps/s 8.6218 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_tdmodule 41.2210μs 20.4612μs 48.8729 KOps/s 48.2330 KOps/s $\color{#35bf28}+1.33\%$
test_tdmodule_dispatch 57.8810μs 38.3629μs 26.0668 KOps/s 26.1336 KOps/s $\color{#d91a1a}-0.26\%$
test_tdseq 0.1596ms 20.5623μs 48.6327 KOps/s 49.4170 KOps/s $\color{#d91a1a}-1.59\%$
test_tdseq_dispatch 62.0200μs 40.3792μs 24.7652 KOps/s 25.4030 KOps/s $\color{#d91a1a}-2.51\%$
test_instantiation_functorch 1.9663ms 1.5691ms 637.3115 Ops/s 634.3889 Ops/s $\color{#35bf28}+0.46\%$
test_exec_functorch 0.2821ms 0.1463ms 6.8368 KOps/s 6.8611 KOps/s $\color{#d91a1a}-0.35\%$
test_exec_functional_call 0.5620ms 0.1431ms 6.9902 KOps/s 7.2588 KOps/s $\color{#d91a1a}-3.70\%$
test_exec_td_decorator 0.5928ms 0.1937ms 5.1626 KOps/s 5.3025 KOps/s $\color{#d91a1a}-2.64\%$
test_vmap_mlp_speed_decorator[True-True] 1.0861ms 0.6962ms 1.4364 KOps/s 1.4409 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_mlp_speed_decorator[True-False] 1.1060ms 0.6919ms 1.4454 KOps/s 1.4446 KOps/s $\color{#35bf28}+0.05\%$
test_vmap_mlp_speed_decorator[False-True] 1.0089ms 0.6032ms 1.6577 KOps/s 1.6666 KOps/s $\color{#d91a1a}-0.54\%$
test_vmap_mlp_speed_decorator[False-False] 1.0156ms 0.6032ms 1.6579 KOps/s 1.6631 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_transformer_speed_decorator[True-True] 19.7805ms 19.4949ms 51.2954 Ops/s 51.6867 Ops/s $\color{#d91a1a}-0.76\%$
test_vmap_transformer_speed_decorator[True-False] 19.9169ms 19.4967ms 51.2907 Ops/s 51.7096 Ops/s $\color{#d91a1a}-0.81\%$
test_vmap_transformer_speed_decorator[False-True] 19.8132ms 19.4007ms 51.5445 Ops/s 52.0631 Ops/s $\color{#d91a1a}-1.00\%$
test_vmap_transformer_speed_decorator[False-False] 19.7102ms 19.3753ms 51.6120 Ops/s 51.9257 Ops/s $\color{#d91a1a}-0.60\%$
test_to_module_speed[True] 1.4068ms 0.9773ms 1.0233 KOps/s 1.0353 KOps/s $\color{#d91a1a}-1.16\%$
test_to_module_speed[False] 1.4554ms 0.9540ms 1.0482 KOps/s 1.0351 KOps/s $\color{#35bf28}+1.27\%$
test_tc_init 0.4362ms 35.4464μs 28.2116 KOps/s 26.8493 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_tc_init_tensor_only 0.1050ms 10.7241μs 93.2480 KOps/s 94.1173 KOps/s $\color{#d91a1a}-0.92\%$
test_tc_init_nested 0.4668ms 68.0415μs 14.6969 KOps/s 13.8051 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_tc_first_layer_tensor 64.8340μs 0.8127μs 1.2305 MOps/s 1.1004 MOps/s $\textbf{\color{#35bf28}+11.82\%}$
test_tc_first_layer_tensor_only 19.6192μs 0.4208μs 2.3765 MOps/s 2.3580 MOps/s $\color{#35bf28}+0.78\%$
test_tc_first_layer_tensor_set 19.0300μs 2.9045μs 344.2971 KOps/s 341.8997 KOps/s $\color{#35bf28}+0.70\%$
test_tc_first_layer_tensor_only_set 0.1355ms 1.7795μs 561.9476 KOps/s 567.6566 KOps/s $\color{#d91a1a}-1.01\%$
test_tc_first_layer_nontensor 23.5410μs 2.3556μs 424.5219 KOps/s 426.4916 KOps/s $\color{#d91a1a}-0.46\%$
test_tc_second_layer_tensor 21.9210μs 1.7455μs 572.9032 KOps/s 571.0618 KOps/s $\color{#35bf28}+0.32\%$
test_tc_second_layer_nontensor 0.4023ms 3.1801μs 314.4543 KOps/s 309.1404 KOps/s $\color{#35bf28}+1.72\%$
test_unbind 0.2428s 11.0531ms 90.4720 Ops/s 144.5431 Ops/s $\textbf{\color{#d91a1a}-37.41\%}$
test_full_like 6.6516ms 4.4846ms 222.9843 Ops/s 111.2904 Ops/s $\textbf{\color{#35bf28}+100.36\%}$
test_zeros_like 5.3540ms 4.4084ms 226.8419 Ops/s 227.5609 Ops/s $\color{#d91a1a}-0.32\%$
test_ones_like 5.0673ms 4.4027ms 227.1317 Ops/s 225.0744 Ops/s $\color{#35bf28}+0.91\%$
test_clone 7.6909ms 7.0276ms 142.2957 Ops/s 149.1309 Ops/s $\color{#d91a1a}-4.58\%$
test_squeeze 0.1264ms 10.1664μs 98.3632 KOps/s 101.4917 KOps/s $\color{#d91a1a}-3.08\%$
test_unsqueeze 0.4931ms 78.2618μs 12.7776 KOps/s 13.6937 KOps/s $\textbf{\color{#d91a1a}-6.69\%}$
test_split 0.2953ms 0.1652ms 6.0525 KOps/s 6.0458 KOps/s $\color{#35bf28}+0.11\%$
test_permute 0.5897ms 0.1834ms 5.4532 KOps/s 5.3816 KOps/s $\color{#35bf28}+1.33\%$
test_stack 52.2046ms 51.4179ms 19.4485 Ops/s 52.8766 Ops/s $\textbf{\color{#d91a1a}-63.22\%}$
test_cat 52.1687ms 51.3724ms 19.4657 Ops/s 36.6331 Ops/s $\textbf{\color{#d91a1a}-46.86\%}$
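
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and the rows shown in bold appear to be exactly those whose change exceeds roughly 5% in either direction (16 bold improvements and 22 bold regressions, matching the summary counts). The sketch below reproduces that arithmetic. The helper names `parse_ops`, `relative_change` and `classify`, and the 5% threshold, are assumptions inferred from the numbers in this comment, not taken from the benchmark workflow itself.

```python
import re

# Multipliers for the throughput units that appear in the table.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}

# Assumed cut-off, inferred from which rows are bold in the table;
# the real workflow may use a different rule.
THRESHOLD_PCT = 5.0


def parse_ops(text: str) -> float:
    """Convert a cell such as '1.0665 KOps/s' into operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised throughput: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops: str, ops_on_head: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    return (parse_ops(ops) / parse_ops(ops_on_head) - 1.0) * 100.0


def classify(change_pct: float) -> str:
    """Label a row using the assumed +/-5% threshold."""
    if change_pct >= THRESHOLD_PCT:
        return "improved"
    if change_pct <= -THRESHOLD_PCT:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    rows = [
        ("test_getitem[int]", "93.0440 KOps/s", "64.4586 KOps/s"),
        ("test_mod_wrap_and_backward[compile-overhead]", "1.0665 KOps/s", "957.8628 Ops/s"),
        ("test_stack", "19.4485 Ops/s", "52.8766 Ops/s"),
    ]
    for name, ops, head in rows:
        change = relative_change(ops, head)
        print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Converting both cells to plain operations per second first matters for rows where the two columns use different units, such as `test_mod_wrap_and_backward[compile-overhead]` (KOps/s against Ops/s).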
