-
Notifications
You must be signed in to change notification settings - Fork 528
Conversation
Related: #411 which removes all string allocations on repeated requests on a keep alive - however this resolves common allocators across all requests |
|
||
if (httpVersion != null) | ||
{ | ||
for (int i = 0; i < 8; i++) scan.Take(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can re avoid this by having PeekLong advance the iterator for us.
@benaadams After discussing #411 with @davidfowl and @CesarBS we were thinking about closing it due to the relative complexity of managing a per-connection invalidating cache to reduce string allocations. I didn't what to start that discussion before offering an alternative, and this is it. I prefer this approach, because it has a better (or at least easier to analyze) worst case performance both memory and cpu-wise. This might also have a better memory footprint per-connection since we don't have to allocate a new cache each time. @benaadams What do you think? |
{ | ||
httpMethod = HttpDeleteMethod; | ||
} | ||
else if (((scanLong ^ _httpGetMethodLong) << 32) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpicky, but we might as well check for GETs and POSTs first since I assume they are the most common methods.
@halter73 👍 to this as it resolves expected shared strings in known locations; also the two changes aren't in conflict. Want a discussion over on other issue? Or was this the discussion? 😉 |
You are right that the two changes aren't mutually exclusive, but I'm still not sold on the StringPool yet. I was hoping this change would help convince you that the StringPool isn't necessary 😉 We can can continue discussing that over on the other issue though. |
I was able to optimize this even further. Given we have a set of 11 known strings that will never change, I went looking for a divisor that would yield a unique modulo value for each string's long representation. Turns out there is such a value (37). So now we have a perfect hash of the known strings and matching the input to those is just a matter of clearing uninteresting bits in the input and looking up the resulting value, then making sure the longs are actually the same. |
@@ -0,0 +1,3 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
Now we're cooking 👍 |
I think we need to make sure that somebody won't accidentally get hash collision when adding new string to known strings. And add more comments to intialization code, because without this issue context it looks spooky |
@pakrym Goot point 👍 I'll add comments explaining what is going on. |
d509edf
to
bf927e5
Compare
This is really unfortunate, but the 30% figure from my initial tests where for very simple requests (no headers, I should've thought better about that). With more realistic requests the reduction is a lot smaller (around 1%) 😞 I'm looking for more places where I can apply a similar optimization. |
LGTM Only comment would be |
I'm waiting on verification that this will work on big-endian architectures before merging. |
@halter73 It's going to be hard to check that. Raspberry Pi distros set the processor to little endian. Actually little endian seems to be the default on ARM environments. I'm not sure it's worth verifying this right now. I could enter a bug to track that and move on with this change. |
I've emailed @stephentoub asking if they test Core on big endian. If they do I might try the same as them to test this. |
Would there be problems with the Frame header collection also? |
@benaadams Can you elaborate? I don't see what you mean. |
@benaadams Oh, with regard to endianess, you mean? I hadn't looked at that code, but it's likely it might be affected by it. |
cbc261c
to
83f55c5
Compare
@stephentoub replied that they don't test Core on big endian. Again I'd say we can postpone verification on big endian. |
@CesarBS Aside from potentially addressing further feedback, do you think this PR is complete? |
I'd suggest at least adding a |
@halter73 Yes. |
@stephentoub I'd rather not. This code likely does not work on big endian, but it doesn't break things. It'll just go through a less optimized path. |
I've not looked at the code in depth. Just looked like wrong values would be computed in big endian. If that's not true, then an assert isn't valuable. But if running on a big endian system would start resulting in erroneous results, an assert would help to point out the problem immediately, and there's little downside. Like I said, though, I've not looked much at the code, so you know better than I whether there's a problem. |
@stephentoub On a second thought, I noticed someone could craft some weird requests (not necessarily malicious) and this would not behave as expected. I'll add the assert. |
83f55c5
to
56a5cd1
Compare
Went with a slightly different approach. Instead of an assert I'm just checking |
56a5cd1
to
20e6862
Compare
Squashed. |
Branchmarks: dev
This PR
@CesarBS ran those several times and the RPS was consistently around those marks for each branch. |
LGTM! 👍 |
20e6862
to
49439e8
Compare
Ping. |
Yay 😀 |
49439e8
to
349af50
Compare
There is a fixed set of methods and a fixed set of HTTP versions that we'll see in most requests. When we see those we can reuse the same string instead of allocating a new one. If not, we fall back to allocating a new string.
Since all methods we're checking for plus the HTTP versions fit in 8 bytes, I'm pre-computing longs containing those bytes to make the comparisons faster.
cc @halter73 @davidfowl @benaadams @DamianEdwards