Faster ParseHeaders #39216
Conversation
Thanks, didn't see those
Couldn't you use the bitmask for the second find, rather than searching again? I.e. the vector match has already found all the matches, so you can look at them one after another until you run out of matched bits, and only then do another vector search.
Not sure I understand; do you mean that if we found ":" we should check what else we found in the current vector? E.g. for this benchmark we only deal with these:
In the real world we might see something like this (Chrome):
Ah, for some of them we can indeed see \r in the same vector 🤷‍♂️
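For illustration, the suggestion above amounts to something like the following sketch (simplified, hypothetical code with made-up names, not what Kestrel's parser actually does): do the vector compare once, take the movemask, and consume its set bits one by one before loading the next block.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class MaskWalkSketch
{
    // Hypothetical sketch: scan one 16-byte block for ':' and '\r' with a single
    // vector compare, then iterate the resulting bitmask instead of issuing a
    // second vector search for the next delimiter.
    public static void WalkDelimiters(ReadOnlySpan<byte> block)
    {
        if (!Sse2.IsSupported || block.Length < 16)
            return; // real code would fall back to a scalar loop

        Vector128<byte> data = MemoryMarshal.Read<Vector128<byte>>(block);

        Vector128<byte> colons = Sse2.CompareEqual(data, Vector128.Create((byte)':'));
        Vector128<byte> crs    = Sse2.CompareEqual(data, Vector128.Create((byte)'\r'));
        uint mask = (uint)Sse2.MoveMask(Sse2.Or(colons, crs));

        // Consume the matches we already found, lowest bit first.
        while (mask != 0)
        {
            int index = BitOperations.TrailingZeroCount(mask);
            Console.WriteLine($"delimiter 0x{block[index]:X2} at offset {index}");
            mask &= mask - 1; // clear the lowest set bit, move on to the next match
        }
    }
}
```

Clearing the lowest set bit with `mask &= mask - 1` walks the matches in order, so another vector search is only needed once the mask is exhausted.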
This I'd love to prototype someday: an optimization for the JIT where, for some high-perf primitives like IndexOf, Memmove, etc., we'd insert additional polling in tier0, e.g. …
/benchmark plaintext aspnet-citrine-lin kestrel
Benchmark started for plaintext on aspnet-citrine-lin with kestrel
Failed to benchmark PR #39216. Skipping... Details:
Wow! Is that a new thing? Can we have it in dotnet/runtime? 😄
@EgorBo maybe you should pay attention to your emails ...
Oops, I'm sorry, somehow I missed that one 😢
The benchmark failed because of code design rules. I will need to improve the reports to display the build errors, if any, instead of this random stack trace.
will fix the formatting
/benchmark plaintext aspnet-citrine-lin kestrel
;) Does it mean you can't actually merge PRs on this repository? Also I might need to render this message too.
/benchmark plaintext aspnet-citrine-lin kestrel
Benchmark started for plaintext on aspnet-citrine-lin with kestrel
plaintext - aspnet-citrine-lin
Hmm, weird, I did see numbers like 12M with this PR
/benchmark plaintext aspnet-citrine-lin kestrel
@sebastienros Can we run a benchmark with more request headers? Something like the Chrome headers referenced earlier?
Benchmark started for plaintext on aspnet-citrine-lin with kestrel
plaintext - aspnet-citrine-lin
/benchmark plaintext aspnet-citrine-lin,aspnet-citrine-win,aspnet-citrine-amd,aspnet-perf-lin kestrel
Benchmark started for plaintext on aspnet-citrine-lin, aspnet-citrine-win, aspnet-citrine-amd, aspnet-perf-lin with kestrel
An error occurred, please check the logs
much better 😄
/benchmark plaintext aspnet-citrine-win,aspnet-citrine-amd,aspnet-perf-lin kestrel
Benchmark started for plaintext on aspnet-citrine-win, aspnet-citrine-amd, aspnet-perf-lin with kestrel
An error occurred, please check the logs
@halter73 so is this PR worth finishing? It's almost done; it only reports invalid headers differently, so the tests don't like it. I am asking because of #39216 (comment)
@sebastienros could you please kick off a citrine-arm run? I expect to see a 2-3% improvement
/benchmark plaintext aspnet-citrine-arm kestrel
Benchmark started for plaintext on aspnet-citrine-arm with kestrel
plaintext - aspnet-citrine-arm
Good news is that the benchmark worked, right? ;)
Btw, how do you deal with flaky benchmarks? I see that PlatformPlaintext jumps from 6.67M RPS to 6.3M on arm; that's more than a 5% difference 😮
Unfortunately, the arm machine we use doesn't produce stable results. The ones on Intel are much more stable. I could add more iterations automatically for ARM, but I'd rather just wait for the new machines.
Yeah, if I run it multiple times I see almost 6.8M RPS locally (with a fix for a faster comparison against the zero vector), while the best official result is ~6.4M.
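The exact fix isn't shown in this thread; purely as an illustration of what "a faster comparison against the zero vector" can mean, here is a hypothetical sketch contrasting the compare-plus-movemask pattern with a single PTEST-based check on SSE4.1-capable hardware:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ZeroCheckSketch
{
    // Illustration only: two ways to ask "is this vector all zero bytes?".
    // The compare + movemask version costs an extra instruction or two
    // compared to a single PTEST, which matters in a hot parsing loop.
    public static bool IsZero_CompareMoveMask(Vector128<byte> v) =>
        Sse2.MoveMask(Sse2.CompareEqual(v, Vector128<byte>.Zero)) == 0xFFFF;

    public static bool IsZero_TestZ(Vector128<byte> v) =>
        Sse41.TestZ(v, v); // true iff (v & v) == 0, i.e. every byte is zero
}
```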
@davidfowl Assuming we get a consistent >2% improvement in the modified plaintext benchmark (using Chrome request headers), would we be willing to take the unsafe code? I would be, personally. If we take it (no promises yet), we'll need to make sure we don't break on big-endian architectures. We should also reject the same requests we did before (e.g. …).
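On the big-endian concern specifically, here is a minimal, hypothetical sketch (not code from this PR) of the kind of guard that integer-based comparisons of header bytes need: either build the constant for the machine's byte order, or read with an explicit endianness.

```csharp
using System;
using System.Buffers.Binary;

static class EndianSketch
{
    // Hypothetical illustration: if header bytes are compared as whole integers,
    // the constant must match the byte order the load produces. Reading with an
    // explicit endianness sidesteps the issue on both kinds of machines.
    public static bool StartsWithHttp(ReadOnlySpan<byte> span)
    {
        // 'H','T','T','P' laid out as a little-endian uint: 0x50_54_54_48.
        const uint HttpLittleEndian = 0x50545448;

        // ReadUInt32LittleEndian always interprets the bytes the same way,
        // regardless of the host's native byte order.
        return span.Length >= 4 &&
               BinaryPrimitives.ReadUInt32LittleEndian(span) == HttpLittleEndian;
    }
}
```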
Ping @davidfowl, this seems to be blocked on your input.
Judging by the native traces from the Platform-Plaintext TE benchmark (linux-x64), it seems we spend some noticeable time in `ParseHeaders`:

Linux-x64-Xeon:
Linux-arm64-v8.0
The current algorithm walks the span of data to find "\r\n", and then it tries to extract the name and value while doing some validation checks and trimming the value. My implementation walks the span just once: first it tries to find ':' using a sort of `IndexOfAny(span, ':', ' ', '\t', '\n', '\r')` function to find `':'`'s position while making sure none of the illegal symbols come before it. Then it tries to find `\r\n` starting from the previously found `:`'s position. After that we trim the value from both sides via plain loops; I assume in most cases the value will only have a single space or tab after ":".

I haven't done Arm64 support (waiting for some feedback on this one first) and haven't tested AVX vs SSE yet, but from what I have here it seems to show quite stable improvements for Platform-Plaintext (up to +300_000 RPS, resulting in 12M requests per second):

Peak RPS for my changes was 12,017,325.

This benchmark is quite stable on the Linux-x64 server (but the latency metrics are not 🙂) and the improvements show up every run.
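To make the single-pass description above concrete, here is a plain scalar sketch of that shape (hypothetical helper names, no vectorization, and none of Kestrel's real validation or error reporting):

```csharp
using System;

static class HeaderParseSketch
{
    // Simplified scalar illustration of the single-pass idea described above.
    // Returns false when the line is malformed.
    public static bool TryParseHeaderLine(
        ReadOnlySpan<byte> line, out Range name, out Range value)
    {
        name = default;
        value = default;

        // 1. Walk forward once looking for ':', rejecting illegal characters
        //    (space, tab, CR, LF) if they show up before it.
        int colon = -1;
        for (int i = 0; i < line.Length; i++)
        {
            byte b = line[i];
            if (b == (byte)':') { colon = i; break; }
            if (b == (byte)' ' || b == (byte)'\t' || b == (byte)'\r' || b == (byte)'\n')
                return false;
        }
        if (colon <= 0)
            return false; // empty name, or no separator at all

        // 2. Continue from the separator to find the terminating "\r\n".
        int cr = line.Slice(colon + 1).IndexOf((byte)'\r');
        if (cr < 0)
            return false;
        int end = colon + 1 + cr;
        if (end + 1 >= line.Length || line[end + 1] != (byte)'\n')
            return false;

        // 3. Trim optional whitespace around the value with plain loops;
        //    in practice there is usually just a single space after ':'.
        int valueStart = colon + 1;
        while (valueStart < end && (line[valueStart] == (byte)' ' || line[valueStart] == (byte)'\t'))
            valueStart++;
        int valueEnd = end;
        while (valueEnd > valueStart && (line[valueEnd - 1] == (byte)' ' || line[valueEnd - 1] == (byte)'\t'))
            valueEnd--;

        name = new Range(0, colon);
        value = new Range(valueStart, valueEnd);
        return true;
    }
}
```

The actual change does steps 1 and 2 with vectorized searches; the scalar loops here only illustrate the order of operations.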
Test methodology

I was using crank like this, where `Microsoft.AspNetCore.Server.Kestrel.Core.dll` is either a "baseline" build or "baseline + my changes".

TODO

- Is a `"1:\r\n"` header-value line valid?
- I didn't optimize the multi-span case (where a header is split between two reads), but from what I see it happens quite rarely (I can print some statistics from our benchmarks if you need it).