[DRAFT]: Experimental: Try to mitigate 64-bit-packed tile state performance degradation #3

Status: Open (3 commits to merge into base: main)

elstehle (Owner) commented Aug 5, 2024

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

elstehle (Owner, Author) commented Aug 5, 2024

Perf comparison: main (Ref) versus 30e24e2 (Cmp)

T{ct} OffsetT{ct} IsInPlace{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 false 2^24 1 51.896 us 1.34% 51.802 us 1.33% -0.094 us -0.18% PASS
I8 I32 false 2^28 1 698.938 us 0.50% 697.534 us 0.50% -1.404 us -0.20% PASS
I8 I32 false 2^24 0.544 48.953 us 1.10% 48.613 us 1.11% -0.340 us -0.69% PASS
I8 I32 false 2^28 0.544 642.796 us 0.50% 642.662 us 0.48% -0.134 us -0.02% PASS
I8 I32 false 2^24 0 42.687 us 1.23% 42.947 us 1.14% 0.260 us 0.61% PASS
I8 I32 false 2^28 0 528.164 us 0.50% 528.383 us 0.50% 0.219 us 0.04% PASS
I8 I64 false 2^24 1 71.300 us 0.51% 63.672 us 0.71% -7.628 us -10.70% FAIL
I8 I64 false 2^28 1 1.037 ms 0.26% 894.047 us 0.30% -143.057 us -13.79% FAIL
I8 I64 false 2^24 0.544 68.637 us 0.52% 60.406 us 0.77% -8.231 us -11.99% FAIL
I8 I64 false 2^28 0.544 986.232 us 0.31% 832.442 us 0.33% -153.790 us -15.59% FAIL
I8 I64 false 2^24 0 63.950 us 0.49% 54.130 us 0.77% -9.819 us -15.35% FAIL
I8 I64 false 2^28 0 885.056 us 0.50% 714.081 us 0.50% -170.974 us -19.32% FAIL
I16 I32 false 2^24 1 63.479 us 3.28% 63.719 us 3.27% 0.240 us 0.38% PASS
I16 I32 false 2^28 1 874.745 us 1.43% 875.389 us 1.44% 0.644 us 0.07% PASS
I16 I32 false 2^24 0.544 56.245 us 2.86% 56.263 us 3.05% 0.017 us 0.03% PASS
I16 I32 false 2^28 0.544 700.144 us 1.60% 700.063 us 1.59% -0.082 us -0.01% PASS
I16 I32 false 2^24 0 47.617 us 2.61% 47.973 us 2.60% 0.355 us 0.75% PASS
I16 I32 false 2^28 0 505.644 us 1.06% 505.672 us 1.09% 0.028 us 0.01% PASS
I16 I64 false 2^24 1 76.787 us 0.72% 67.936 us 2.74% -8.851 us -11.53% FAIL
I16 I64 false 2^28 1 1.100 ms 0.50% 928.325 us 1.13% -171.597 us -15.60% FAIL
I16 I64 false 2^24 0.544 74.412 us 0.71% 61.174 us 2.29% -13.238 us -17.79% FAIL
I16 I64 false 2^28 0.544 1.064 ms 0.41% 784.941 us 0.83% -278.641 us -26.20% FAIL
I16 I64 false 2^24 0 67.481 us 0.72% 54.044 us 1.97% -13.438 us -19.91% FAIL
I16 I64 false 2^28 0 924.492 us 0.50% 635.277 us 0.64% -289.215 us -31.28% FAIL
I32 I32 false 2^24 1 90.582 us 2.75% 90.957 us 2.71% 0.375 us 0.41% PASS
I32 I32 false 2^28 1 1.345 ms 0.96% 1.343 ms 0.97% -1.420 us -0.11% PASS
I32 I32 false 2^24 0.544 78.464 us 3.16% 78.526 us 3.11% 0.062 us 0.08% PASS
I32 I32 false 2^28 0.544 1.077 ms 1.36% 1.073 ms 1.10% -4.435 us -0.41% PASS
I32 I32 false 2^24 0 62.607 us 2.20% 62.745 us 2.15% 0.138 us 0.22% PASS
I32 I32 false 2^28 0 652.051 us 2.38% 652.093 us 1.78% 0.042 us 0.01% PASS
I32 I64 false 2^24 1 100.878 us 1.57% 96.820 us 2.45% -4.058 us -4.02% FAIL
I32 I64 false 2^28 1 1.517 ms 1.03% 1.407 ms 0.82% -110.209 us -7.26% FAIL
I32 I64 false 2^24 0.544 90.822 us 1.79% 83.880 us 2.16% -6.942 us -7.64% FAIL
I32 I64 false 2^28 0.544 1.337 ms 1.26% 1.157 ms 1.09% -179.515 us -13.43% FAIL
I32 I64 false 2^24 0 77.814 us 1.97% 69.889 us 1.42% -7.925 us -10.18% FAIL
I32 I64 false 2^28 0 1.038 ms 2.42% 802.473 us 0.84% -235.572 us -22.69% FAIL
I64 I32 false 2^24 1 167.386 us 2.17% 166.741 us 2.17% -0.646 us -0.39% PASS
I64 I32 false 2^28 1 2.573 ms 0.50% 2.563 ms 0.56% -10.214 us -0.40% PASS
I64 I32 false 2^24 0.544 138.900 us 3.33% 138.524 us 3.48% -0.376 us -0.27% PASS
I64 I32 false 2^28 0.544 2.061 ms 0.80% 2.044 ms 0.83% -16.246 us -0.79% PASS
I64 I32 false 2^24 0 99.196 us 3.03% 98.935 us 2.93% -0.261 us -0.26% PASS
I64 I32 false 2^28 0 1.245 ms 0.88% 1.239 ms 0.95% -6.592 us -0.53% PASS
I64 I64 false 2^24 1 188.720 us 1.86% 174.109 us 2.37% -14.611 us -7.74% FAIL
I64 I64 false 2^28 1 2.926 ms 0.50% 2.693 ms 0.56% -233.485 us -7.98% FAIL
I64 I64 false 2^24 0.544 162.007 us 2.22% 143.649 us 2.80% -18.359 us -11.33% FAIL
I64 I64 false 2^28 0.544 2.465 ms 1.04% 2.178 ms 0.55% -286.342 us -11.62% FAIL
I64 I64 false 2^24 0 131.508 us 2.80% 106.763 us 2.21% -24.744 us -18.82% FAIL
I64 I64 false 2^28 0 1.899 ms 0.88% 1.444 ms 1.21% -454.708 us -23.95% FAIL
I128 I32 false 2^24 1 348.036 us 1.99% 346.659 us 1.87% -1.377 us -0.40% PASS
I128 I32 false 2^28 1 5.395 ms 0.32% 5.390 ms 0.40% -5.473 us -0.10% PASS
I128 I32 false 2^24 0.544 284.099 us 1.68% 283.047 us 1.78% -1.052 us -0.37% PASS
I128 I32 false 2^28 0.544 4.345 ms 0.49% 4.313 ms 0.41% -31.826 us -0.73% FAIL
I128 I32 false 2^24 0 187.273 us 2.62% 185.523 us 2.71% -1.750 us -0.93% PASS
I128 I32 false 2^28 0 2.841 ms 4.73% 2.635 ms 1.03% -206.428 us -7.27% FAIL
I128 I64 false 2^24 1 396.549 us 3.78% 354.858 us 1.62% -41.691 us -10.51% FAIL
I128 I64 false 2^28 1 5.916 ms 1.09% 5.524 ms 0.35% -391.806 us -6.62% FAIL
I128 I64 false 2^24 0.544 324.322 us 1.08% 290.003 us 1.64% -34.319 us -10.58% FAIL
I128 I64 false 2^28 0.544 5.174 ms 1.23% 4.455 ms 0.49% -719.040 us -13.90% FAIL
I128 I64 false 2^24 0 276.285 us 1.97% 195.827 us 1.98% -80.458 us -29.12% FAIL
I128 I64 false 2^28 0 4.221 ms 1.31% 2.840 ms 0.93% -1381.254 us -32.72% FAIL
F32 I32 false 2^24 1 93.009 us 3.00% 92.905 us 2.98% -0.104 us -0.11% PASS
F32 I32 false 2^28 1 1.490 ms 4.63% 1.422 ms 1.50% -67.238 us -4.51% FAIL
F32 I32 false 2^24 0.544 68.219 us 3.61% 65.946 us 2.82% -2.274 us -3.33% FAIL
F32 I32 false 2^28 0.544 838.273 us 1.41% 797.503 us 1.26% -40.770 us -4.86% FAIL
F32 I32 false 2^24 0 64.815 us 3.21% 63.736 us 2.65% -1.079 us -1.66% PASS
F32 I32 false 2^28 0 802.259 us 9.26% 700.190 us 1.86% -102.069 us -12.72% FAIL
F32 I64 false 2^24 1 117.309 us 4.92% 100.467 us 2.57% -16.842 us -14.36% FAIL
F32 I64 false 2^28 1 1.680 ms 1.70% 1.501 ms 0.79% -179.458 us -10.68% FAIL
F32 I64 false 2^24 0.544 85.709 us 3.36% 75.167 us 2.12% -10.542 us -12.30% FAIL
F32 I64 false 2^28 0.544 1.261 ms 4.98% 976.214 us 1.10% -284.355 us -22.56% FAIL
F32 I64 false 2^24 0 81.453 us 2.89% 71.517 us 2.31% -9.936 us -12.20% FAIL
F32 I64 false 2^28 0 1.166 ms 4.07% 898.807 us 2.55% -267.358 us -22.93% FAIL
F64 I32 false 2^24 1 171.655 us 2.37% 170.125 us 2.32% -1.531 us -0.89% PASS
F64 I32 false 2^28 1 3.008 ms 5.66% 2.802 ms 4.69% -205.519 us -6.83% FAIL
F64 I32 false 2^24 0.544 122.800 us 7.60% 115.659 us 4.54% -7.142 us -5.82% FAIL
F64 I32 false 2^28 0.544 1.817 ms 8.81% 1.629 ms 6.61% -188.519 us -10.37% FAIL
F64 I32 false 2^24 0 104.585 us 5.51% 104.090 us 3.91% -0.496 us -0.47% PASS
F64 I32 false 2^28 0 1.531 ms 13.70% 1.389 ms 2.93% -141.598 us -9.25% FAIL
F64 I64 false 2^24 1 234.254 us 5.72% 185.441 us 3.37% -48.813 us -20.84% FAIL
F64 I64 false 2^28 1 3.158 ms 4.74% 2.811 ms 0.62% -346.823 us -10.98% FAIL
F64 I64 false 2^24 0.544 155.313 us 5.53% 119.435 us 2.66% -35.877 us -23.10% FAIL
F64 I64 false 2^28 0.544 2.226 ms 1.90% 1.740 ms 5.73% -485.819 us -21.82% FAIL
F64 I64 false 2^24 0 135.071 us 2.58% 112.850 us 3.11% -22.221 us -16.45% FAIL
F64 I64 false 2^28 0 2.000 ms 2.95% 1.515 ms 1.74% -484.435 us -24.22% FAIL
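
The Diff, %Diff, and Status columns above can be recomputed from the Ref and Cmp columns. The sketch below assumes that Diff is Cmp minus Ref, that %Diff is relative to Ref, and that a row is marked FAIL when the absolute %Diff exceeds the smaller of the two noise values. That rule is inferred from the rows in this table (for example, I128/I32/2^28/0.544 fails at -0.73% with noise of 0.49% and 0.41%); it is not confirmed against the comparison tool itself. The name `compare_row` is hypothetical. Under this reading, FAIL only means "outside the noise band": the negative diffs in the I64 rows are speedups, not regressions.

```python
def compare_row(ref_us, ref_noise_pct, cmp_us, cmp_noise_pct):
    """Return (diff_us, pct_diff, status) for one benchmark row.

    Assumed rule (inferred from the table, not from the tool's source):
    FAIL when |%Diff| is larger than the smaller of the two noise values.
    """
    diff_us = cmp_us - ref_us                  # negative means Cmp is faster
    pct_diff = 100.0 * diff_us / ref_us        # relative to the reference run
    threshold = min(ref_noise_pct, cmp_noise_pct)
    status = "PASS" if abs(pct_diff) <= threshold else "FAIL"
    return diff_us, pct_diff, status


# Two rows copied from the table above (times in microseconds).
rows = [
    ("I8", "I32", 51.896, 1.34, 51.802, 1.33),
    ("I8", "I64", 71.300, 0.51, 63.672, 0.71),
]
for t, offset_t, ref, ref_noise, cmp_, cmp_noise in rows:
    diff, pct, status = compare_row(ref, ref_noise, cmp_, cmp_noise)
    print(f"{t} {offset_t}: {diff:+.3f} us ({pct:+.2f}%) {status}")
```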

elstehle (Owner, Author) commented Aug 6, 2024

After e1b150c, the worst-case runtime increase for i64 over i32 offset types is down to 1.35x in the select.if benchmarks.

T{ct} IsInPlace{ct} Elements{io} Entropy I32 time (us) I64 time (us) I64/I32 time
I8 FALSE 2^24 1 51.802 63.672 122.91%
I8 FALSE 2^28 1 697.534 894.047 128.17%
I8 FALSE 2^24 0.544 48.613 60.406 124.26%
I8 FALSE 2^28 0.544 642.662 832.442 129.53%
I8 FALSE 2^24 0 42.947 54.13 126.04%
I8 FALSE 2^28 0 528.383 714.081 135.14%
I16 FALSE 2^24 1 63.719 67.936 106.62%
I16 FALSE 2^28 1 875.389 928.325 106.05%
I16 FALSE 2^24 0.544 56.263 61.174 108.73%
I16 FALSE 2^28 0.544 700.063 784.941 112.12%
I16 FALSE 2^24 0 47.973 54.044 112.66%
I16 FALSE 2^28 0 505.672 635.277 125.63%
I32 FALSE 2^24 1 90.957 96.82 106.45%
I32 FALSE 2^28 1 1343 1407 104.77%
I32 FALSE 2^24 0.544 78.526 83.88 106.82%
I32 FALSE 2^28 0.544 1073 1157 107.83%
I32 FALSE 2^24 0 62.745 69.889 111.39%
I32 FALSE 2^28 0 652.093 802.473 123.06%
I64 FALSE 2^24 1 166.741 174.109 104.42%
I64 FALSE 2^28 1 2563 2693 105.07%
I64 FALSE 2^24 0.544 138.524 143.649 103.70%
I64 FALSE 2^28 0.544 2044 2178 106.56%
I64 FALSE 2^24 0 98.935 106.763 107.91%
I64 FALSE 2^28 0 1239 1444 116.55%
I128 FALSE 2^24 1 346.659 354.858 102.37%
I128 FALSE 2^28 1 5390 5524 102.49%
I128 FALSE 2^24 0.544 283.047 290.003 102.46%
I128 FALSE 2^28 0.544 4313 4455 103.29%
I128 FALSE 2^24 0 185.523 195.827 105.55%
I128 FALSE 2^28 0 2635 2840 107.78%
F32 FALSE 2^24 1 92.905 100.467 108.14%
F32 FALSE 2^28 1 1422 1501 105.56%
F32 FALSE 2^24 0.544 65.946 75.167 113.98%
F32 FALSE 2^28 0.544 797.503 976.214 122.41%
F32 FALSE 2^24 0 63.736 71.517 112.21%
F32 FALSE 2^28 0 700.19 898.807 128.37%
F64 FALSE 2^24 1 170.125 185.441 109.00%
F64 FALSE 2^28 1 2802 2811 100.32%
F64 FALSE 2^24 0.544 115.659 119.435 103.26%
F64 FALSE 2^28 0.544 1629 1740 106.81%
F64 FALSE 2^24 0 104.09 112.85 108.42%
F64 FALSE 2^28 0 1389 1515 109.07%
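
The last column of the table above is the i64-offset time divided by the i32-offset time for the same configuration. The following sketch recomputes it for a few rows copied from the table and picks the worst case. It assumes all times are in microseconds (values such as 1343 match the 1.343 ms entries in the earlier comparison); the helper name `slowdown` is hypothetical.

```python
# (T, Elements, Entropy, i32 time, i64 time); times assumed to be microseconds.
rows = [
    ("I8", "2^24", 1, 51.802, 63.672),
    ("I8", "2^28", 0, 528.383, 714.081),
    ("I16", "2^28", 0, 505.672, 635.277),
    ("F64", "2^28", 1, 2802.0, 2811.0),
]


def slowdown(i32_us, i64_us):
    """Ratio of i64-offset runtime to i32-offset runtime (1.0 means no overhead)."""
    return i64_us / i32_us


# Row with the largest i64/i32 ratio among the sampled rows.
worst = max(rows, key=lambda r: slowdown(r[3], r[4]))

for t, n, entropy, i32_us, i64_us in rows:
    print(f"{t} {n} entropy={entropy}: {100 * slowdown(i32_us, i64_us):.2f}%")
print("worst case:", worst[0], worst[1], f"{slowdown(worst[3], worst[4]):.2f}x")
```

Among the sampled rows, the I8 / 2^28 / entropy 0 configuration has the largest ratio, and it is also the largest entry in the full table (135.14%).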

Perf comparison: 30e24e2 (Ref) versus e1b150c (Cmp)

T{ct} OffsetT{ct} IsInPlace{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 false 2^24 1 51.715 us 1.40% 52.253 us 1.34% 0.538 us 1.04% PASS
I8 I32 false 2^28 1 698.210 us 0.50% 700.855 us 0.50% 2.645 us 0.38% PASS
I8 I32 false 2^24 0.544 48.884 us 1.21% 48.975 us 1.17% 0.092 us 0.19% PASS
I8 I32 false 2^28 0.544 642.616 us 0.50% 643.749 us 0.50% 1.133 us 0.18% PASS
I8 I32 false 2^24 0 42.670 us 1.25% 43.185 us 1.15% 0.514 us 1.21% FAIL
I8 I32 false 2^28 0 528.725 us 0.52% 533.239 us 0.50% 4.514 us 0.85% FAIL
I8 I64 false 2^24 1 63.280 us 0.70% 63.236 us 0.68% -0.044 us -0.07% PASS
I8 I64 false 2^28 1 892.796 us 0.31% 881.331 us 0.35% -11.465 us -1.28% FAIL
I8 I64 false 2^24 0.544 60.089 us 0.68% 60.048 us 0.69% -0.041 us -0.07% PASS
I8 I64 false 2^28 0.544 831.474 us 0.35% 822.872 us 0.35% -8.602 us -1.03% FAIL
I8 I64 false 2^24 0 53.846 us 0.72% 56.934 us 0.60% 3.087 us 5.73% FAIL
I8 I64 false 2^28 0 713.249 us 0.50% 756.683 us 0.46% 43.434 us 6.09% FAIL
I16 I32 false 2^24 1 63.677 us 3.30% 63.992 us 3.42% 0.316 us 0.50% PASS
I16 I32 false 2^28 1 881.749 us 1.44% 882.635 us 1.40% 0.885 us 0.10% PASS
I16 I32 false 2^24 0.544 56.262 us 3.04% 56.519 us 3.06% 0.257 us 0.46% PASS
I16 I32 false 2^28 0.544 706.058 us 1.51% 711.285 us 1.54% 5.227 us 0.74% PASS
I16 I32 false 2^24 0 48.364 us 2.71% 48.382 us 2.70% 0.018 us 0.04% PASS
I16 I32 false 2^28 0 509.986 us 1.08% 512.730 us 1.05% 2.744 us 0.54% PASS
I16 I64 false 2^24 1 68.125 us 2.73% 63.531 us 3.09% -4.593 us -6.74% FAIL
I16 I64 false 2^28 1 935.388 us 1.14% 878.986 us 1.40% -56.402 us -6.03% FAIL
I16 I64 false 2^24 0.544 60.867 us 2.40% 56.820 us 2.93% -4.047 us -6.65% FAIL
I16 I64 false 2^28 0.544 790.346 us 0.86% 711.710 us 1.41% -78.636 us -9.95% FAIL
I16 I64 false 2^24 0 53.830 us 2.02% 50.361 us 2.31% -3.469 us -6.44% FAIL
I16 I64 false 2^28 0 639.579 us 0.65% 557.284 us 0.70% -82.295 us -12.87% FAIL
I32 I32 false 2^24 1 90.591 us 2.72% 90.830 us 2.73% 0.239 us 0.26% PASS
I32 I32 false 2^28 1 1.349 ms 0.97% 1.349 ms 0.99% 0.190 us 0.01% PASS
I32 I32 false 2^24 0.544 78.574 us 3.03% 78.791 us 3.01% 0.217 us 0.28% PASS
I32 I32 false 2^28 0.544 1.076 ms 1.11% 1.077 ms 1.09% 0.287 us 0.03% PASS
I32 I32 false 2^24 0 63.033 us 2.34% 63.264 us 2.33% 0.231 us 0.37% PASS
I32 I32 false 2^28 0 650.170 us 1.49% 649.780 us 1.48% -0.391 us -0.06% PASS
I32 I64 false 2^24 1 96.959 us 2.59% 96.000 us 2.62% -0.958 us -0.99% PASS
I32 I64 false 2^28 1 1.414 ms 0.82% 1.404 ms 0.85% -9.988 us -0.71% PASS
I32 I64 false 2^24 0.544 84.064 us 2.18% 83.355 us 2.44% -0.709 us -0.84% PASS
I32 I64 false 2^28 0.544 1.157 ms 0.74% 1.150 ms 0.78% -7.525 us -0.65% PASS
I32 I64 false 2^24 0 70.226 us 1.42% 69.709 us 1.46% -0.517 us -0.74% PASS
I32 I64 false 2^28 0 809.672 us 0.83% 798.024 us 0.83% -11.647 us -1.44% FAIL
I64 I32 false 2^24 1 166.578 us 2.15% 166.658 us 2.13% 0.080 us 0.05% PASS
I64 I32 false 2^28 1 2.568 ms 0.53% 2.564 ms 0.53% -3.462 us -0.13% PASS
I64 I32 false 2^24 0.544 138.569 us 3.36% 138.804 us 3.37% 0.235 us 0.17% PASS
I64 I32 false 2^28 0.544 2.034 ms 0.81% 2.037 ms 0.81% 3.152 us 0.15% PASS
I64 I32 false 2^24 0 97.963 us 2.86% 98.190 us 2.81% 0.227 us 0.23% PASS
I64 I32 false 2^28 0 1.234 ms 0.92% 1.234 ms 0.91% 0.278 us 0.02% PASS
I64 I64 false 2^24 1 173.789 us 2.38% 166.013 us 2.40% -7.776 us -4.47% FAIL
I64 I64 false 2^28 1 2.669 ms 0.53% 2.563 ms 0.50% -106.712 us -4.00% FAIL
I64 I64 false 2^24 0.544 143.140 us 2.84% 138.786 us 3.37% -4.354 us -3.04% FAIL
I64 I64 false 2^28 0.544 2.134 ms 0.54% 2.037 ms 0.81% -96.773 us -4.54% FAIL
I64 I64 false 2^24 0 105.736 us 1.96% 98.794 us 2.80% -6.942 us -6.57% FAIL
I64 I64 false 2^28 0 1.355 ms 0.71% 1.235 ms 0.91% -120.047 us -8.86% FAIL
I128 I32 false 2^24 1 345.373 us 1.80% 346.099 us 1.83% 0.726 us 0.21% PASS
I128 I32 false 2^28 1 5.380 ms 0.36% 5.390 ms 0.39% 9.730 us 0.18% PASS
I128 I32 false 2^24 0.544 283.231 us 1.74% 283.905 us 1.74% 0.674 us 0.24% PASS
I128 I32 false 2^28 0.544 4.300 ms 0.39% 4.301 ms 0.41% 1.485 us 0.03% PASS
I128 I32 false 2^24 0 186.207 us 2.79% 186.601 us 2.75% 0.394 us 0.21% PASS
I128 I32 false 2^28 0 2.532 ms 0.27% 2.534 ms 0.27% 1.774 us 0.07% PASS
I128 I64 false 2^24 1 351.132 us 1.36% 348.200 us 1.53% -2.933 us -0.84% PASS
I128 I64 false 2^28 1 5.457 ms 0.27% 5.406 ms 0.38% -50.251 us -0.92% FAIL
I128 I64 false 2^24 0.544 288.054 us 1.67% 285.283 us 1.81% -2.771 us -0.96% PASS
I128 I64 false 2^28 0.544 4.385 ms 0.34% 4.330 ms 0.34% -54.156 us -1.24% FAIL
I128 I64 false 2^24 0 191.855 us 2.49% 188.373 us 2.80% -3.482 us -1.81% PASS
I128 I64 false 2^28 0 2.624 ms 0.23% 2.566 ms 0.28% -57.676 us -2.20% FAIL
F32 I32 false 2^24 1 91.553 us 2.81% 91.013 us 2.85% -0.540 us -0.59% PASS
F32 I32 false 2^28 1 1.375 ms 0.93% 1.373 ms 1.01% -1.424 us -0.10% PASS
F32 I32 false 2^24 0.544 65.137 us 2.44% 64.686 us 2.50% -0.451 us -0.69% PASS
F32 I32 false 2^28 0.544 775.675 us 1.02% 774.611 us 1.02% -1.064 us -0.14% PASS
F32 I32 false 2^24 0 63.194 us 2.20% 62.928 us 2.24% -0.267 us -0.42% PASS
F32 I32 false 2^28 0 651.877 us 1.47% 651.348 us 1.46% -0.529 us -0.08% PASS
F32 I64 false 2^24 1 97.444 us 2.56% 96.018 us 2.61% -1.427 us -1.46% PASS
F32 I64 false 2^28 1 1.444 ms 0.83% 1.435 ms 0.85% -8.501 us -0.59% PASS
F32 I64 false 2^24 0.544 72.372 us 1.49% 70.918 us 1.59% -1.454 us -2.01% FAIL
F32 I64 false 2^28 0.544 883.511 us 0.67% 861.978 us 0.73% -21.533 us -2.44% FAIL
F32 I64 false 2^24 0 70.209 us 1.29% 69.442 us 1.28% -0.767 us -1.09% PASS
F32 I64 false 2^28 0 808.316 us 0.83% 803.031 us 0.83% -5.285 us -0.65% PASS
F64 I32 false 2^24 1 166.529 us 2.13% 166.014 us 2.10% -0.516 us -0.31% PASS
F64 I32 false 2^28 1 2.565 ms 0.49% 2.560 ms 0.57% -4.203 us -0.16% PASS
F64 I32 false 2^24 0.544 108.211 us 3.30% 107.782 us 3.30% -0.429 us -0.40% PASS
F64 I32 false 2^28 0.544 1.470 ms 0.84% 1.469 ms 0.84% -1.324 us -0.09% PASS
F64 I32 false 2^24 0 98.102 us 2.62% 98.053 us 2.67% -0.049 us -0.05% PASS
F64 I32 false 2^28 0 1.231 ms 0.92% 1.231 ms 0.92% 0.603 us 0.05% PASS
F64 I64 false 2^24 1 173.501 us 2.38% 166.231 us 2.48% -7.271 us -4.19% FAIL
F64 I64 false 2^28 1 2.663 ms 0.56% 2.565 ms 0.54% -98.074 us -3.68% FAIL
F64 I64 false 2^24 0.544 112.153 us 2.35% 107.996 us 3.28% -4.157 us -3.71% FAIL
F64 I64 false 2^28 0.544 1.558 ms 0.58% 1.466 ms 0.83% -91.126 us -5.85% FAIL
F64 I64 false 2^24 0 104.005 us 1.79% 98.200 us 2.60% -5.805 us -5.58% FAIL
F64 I64 false 2^28 0 1.322 ms 0.76% 1.231 ms 0.91% -90.964 us -6.88% FAIL

elstehle changed the title from "Exp/mitigate 64 bitpacked perf degrad" to "[DRAFT]: Experimental: Try to mitigate 64-bit-packed tile state performance degradation" on Aug 7, 2024