Opportunities to improve TE Platform-Plaintext benchmark #63059
Comments
Tagging subscribers to this area: @dotnet/area-system-buffers, @GrabYourPitchforks
Issue Details
The Platform-Plaintext TechEmpower benchmark, where dotnet shows impressive results (2nd place, ~12.4M RPS on our PerfLab hardware), seems to be slightly bottlenecked by three SpanHelpers.IndexOf[Any] functions:
I tried to rewrite them by hand, e.g. I changed some branches and removed "Duff's devices" (even for cases where the needed char is found at position = 0, my impl is still faster). Here is a standalone benchmark project with test data extracted from the Platform-Plaintext benchmark. As a result, I consistently see stable improvements of around 1-3% (never slower):
^ with PGO. As you can see from https://aka.ms/aspnet/benchmarks (17th page, "full" checked), the best results we've seen were around 12.42M RPS. The same relative improvements can be observed in non-PGO mode (default). Here is the script I used to benchmark it:
Standalone benchmark: https://gist.github.com/EgorBo/1d059726dae285e3a1db501896e8a1bd
/cc @stephentoub @GrabYourPitchforks
Is it possible that this could regress workloads that don't look like this one? Did you get a chance to run the IndexOf/IndexOfAny benchmarks in dotnet/performance? (Not sure how good they are, but they're there.)
Yes, it's always about trade-offs, but what I'm 100% sure of is that we can add a "two 256-bit vectors per iteration" path for arrays >= 64 elements (bytes) without hurting other cases: https://gist.github.com/EgorBo/1d059726dae285e3a1db501896e8a1bd#file-faster_spanhelpers_indexof-cs-L148-L186. So it will help us to find
faster 🙂
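A minimal sketch of what a "two 256-bit vectors per iteration" path could look like, for readers who don't want to open the gist. This is not the code from the gist or from dotnet/runtime: the class and method names are hypothetical, and it assumes the cross-platform Vector256 APIs available in .NET 7 or later rather than raw AVX2 intrinsics.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper, for illustration only (not the dotnet/runtime implementation).
public static class TwoVectorSearch
{
    // Returns the index of the first byte equal to value0 or value1, or -1.
    public static int IndexOfAny(ReadOnlySpan<byte> span, byte value0, byte value1)
    {
        int i = 0;
        int step = 2 * Vector256<byte>.Count; // 64 bytes per iteration

        if (Vector256.IsHardwareAccelerated && span.Length >= step)
        {
            ref byte start = ref MemoryMarshal.GetReference(span);
            Vector256<byte> v0 = Vector256.Create(value0);
            Vector256<byte> v1 = Vector256.Create(value1);
            int lastBlock = span.Length - step;

            for (; i <= lastBlock; i += step)
            {
                // Load two adjacent 32-byte vectors.
                Vector256<byte> a = Vector256.LoadUnsafe(ref start, (nuint)i);
                Vector256<byte> b = Vector256.LoadUnsafe(ref start, (nuint)(i + Vector256<byte>.Count));

                Vector256<byte> matchA = Vector256.Equals(a, v0) | Vector256.Equals(a, v1);
                Vector256<byte> matchB = Vector256.Equals(b, v0) | Vector256.Equals(b, v1);

                // One combined check per 64 bytes keeps the "no match" path short.
                if ((matchA | matchB) != Vector256<byte>.Zero)
                {
                    uint maskA = matchA.ExtractMostSignificantBits();
                    if (maskA != 0)
                        return i + BitOperations.TrailingZeroCount(maskA);

                    uint maskB = matchB.ExtractMostSignificantBits();
                    return i + Vector256<byte>.Count + BitOperations.TrailingZeroCount(maskB);
                }
            }
        }

        // Scalar loop for the tail, short inputs, and hardware without 256-bit vectors.
        for (; i < span.Length; i++)
        {
            byte c = span[i];
            if (c == value0 || c == value1)
                return i;
        }

        return -1;
    }
}
```

Inputs shorter than 64 bytes never enter the vector loop here, which is why such a path by itself should not affect the short-string cases; whether it pays off in practice still has to be measured.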
Also, it seems there are optimization opportunities inside the caller of that IndexOfAny, ParseHeaders. Currently we do
Who knows, maybe we can cross 13M RPS 😄
I thought we were on a path/plan to switch to using Vector128 in all of these implementations. Is that not the case, @tannergooding?
Is there a reason? Also, I believe none of the instructions involved cause downclocking (PL0 aka Power License 0), especially on newer CPUs where no AVX2 instructions do. Finding the end-line symbol in this header:
is twice as slow with SSE.
Moved to dotnet/aspnetcore as PR dotnet/aspnetcore#39216
The Platform-Plaintext TechEmpower benchmark, where dotnet shows impressive results (2nd place, up to ~12.4M requests per second (RPS) on our PerfLab hardware with bigger network bandwidth), seems to be slightly bottlenecked by three SpanHelpers.IndexOf[Any] functions:
^ Linux-x64
E.g. IndexOfAny(val0, val1) mostly tries to find \n or \r in ASCII strings of length 26, 49 and 151, where the needed symbols are usually found at positions 21, 22 and 100 (HTTP headers).
I tried to rewrite them by hand, e.g. I changed some branches, removed "Duff's devices", and added a "two 256-bit vectors per iteration" path. Here is a standalone benchmark project with test data extracted from the Platform-Plaintext benchmark.
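To make the IndexOfAny(val0, val1) call pattern described above concrete, here is a small illustration using the public span API. The helper name and the sample header are made up for this example; the real test data lives in the standalone benchmark.

```csharp
using System;
using System.Text;

// Hypothetical helper showing the shape of the hot call, not Kestrel's actual parser code.
public static class HeaderScan
{
    // Offset of the first '\r' or '\n' in an ASCII buffer, or -1 if there is none.
    public static int FindLineEnd(ReadOnlySpan<byte> header) =>
        header.IndexOfAny((byte)'\r', (byte)'\n');
}
```

For example, HeaderScan.FindLineEnd(Encoding.ASCII.GetBytes("Host: localhost\r\n")) returns 15, the offset of the \r that ends the header line.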
As a result, I consistently see stable improvements of around 1-3% (never slower):
^ with PGO. As you can see from https://aka.ms/aspnet/benchmarks (17th page, "full" checked), the best results we've ever seen were 12.42M RPS, so 12.68M does look like an improvement:
Same relative improvements can be observed in non-PGO mode (default).
Here is the script I used to benchmark it:
Standalone benchmark: https://gist.github.com/EgorBo/1d059726dae285e3a1db501896e8a1bd
Commit in dotnet/runtime: EgorBo@b2ee6ad
/cc @stephentoub @GrabYourPitchforks