This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

Reuse writes, initialize queues #363

Closed
benaadams wants to merge 3 commits into dev from default-work-queuesizes

Conversation

benaadams
Contributor

Reuse the UvWriteReq on the socket rather than recreating it for each write - pooled at the per-connection level.

Pre-initialize the circular buffers, rather than resizing them (with a copy) as work is added. A rough sketch of both ideas follows the issue list below.

Saves 1.12 MB per 4k requests; see #363 (comment) for details.

Resolves #165 Pool UvWriteReq objects
Resolves #289 High Allocator DoWriteIfNeeded
Resolves #290 High Allocator SocketOutput.Write
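
Illustrative sketch of the per-connection shape (the type names, pool size and queue capacity here are assumptions for the sketch, not the actual Kestrel types):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keep a small per-connection pool of write-request
// objects and pre-size the pending-write queue, so steady-state writes
// neither allocate a new request nor grow the queue with a copy-on-resize.
class ConnectionOutputSketch<TWriteReq> where TWriteReq : new()
{
    private const int MaxPooledWriteReqs = 16;    // pooled at the per-connection level
    private const int InitialQueueCapacity = 64;  // pre-initialized backing array

    private readonly Queue<TWriteReq> _writeReqPool = new Queue<TWriteReq>(MaxPooledWriteReqs);
    private readonly Queue<ArraySegment<byte>> _pendingWrites =
        new Queue<ArraySegment<byte>>(InitialQueueCapacity);

    public TWriteReq RentWriteReq()
    {
        // Reuse a pooled request if one is available; only allocate when the
        // pool is empty (e.g. on the connection's first writes).
        return _writeReqPool.Count > 0 ? _writeReqPool.Dequeue() : new TWriteReq();
    }

    public void ReturnWriteReq(TWriteReq req)
    {
        if (_writeReqPool.Count < MaxPooledWriteReqs)
            _writeReqPool.Enqueue(req); // available for the next write on this connection
    }

    public void EnqueueWrite(ArraySegment<byte> buffer) => _pendingWrites.Enqueue(buffer);
}
```

In this shape the first writes on a connection still allocate; everything after that reuses the pooled requests and the already-sized queue.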

@halter73
Member

@benaadams When profiling, we see a fair amount of time spent resizing queues (though this is mostly in SocketOutput). Why did you close this PR?

@benaadams
Contributor Author

I didn't feel it was worthwhile to go round the Travis repeat loop to try to get it to pass; will rebase.

@benaadams
Contributor Author

Rebased and added SocketOutput

benaadams force-pushed the default-work-queuesizes branch 2 times, most recently from 4746d28 to 36b71f8 on November 13, 2015 13:26
@benaadams
Contributor Author

Allocation reductions for 8k requests

Before: 32,023 objects; 2,032,968 bytes

After: 12,059 objects; 947,168 bytes

That is 37% of the objects and 46% of the bytes compared to before.

@halter73
Member

I think @CesarBS is pretty close to fixing our Travis woes. I wouldn't bother going through the Travis repeat loop until that gets in. Sorry that it's been such a pain for you.

I should be able to kick off reruns for all failing PRs myself once that's in.

benaadams force-pushed the default-work-queuesizes branch 2 times, most recently from 904ee08 to 9037939 on November 13, 2015 20:15
@benaadams
Contributor Author

Removed the allocs from DoWriteIfNeeded, so down by another 212,800 bytes.

@benaadams
Contributor Author

And from UvWriteReq.Write

@benaadams
Contributor Author

Boom! Using MemoryPool2 rather than new byte[buffer.Count] in write, and SocketOutput is now alloc-free after the first request on the connection.
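
For illustration, a minimal sketch of this pattern, using ArrayPool<byte> as a stand-in for MemoryPool2 (the pool type and the synchronous shape of the write hand-off are assumptions of the sketch):

```csharp
using System;
using System.Buffers;

static class PooledWriteSketch
{
    // Instead of allocating new byte[buffer.Count] for every write, rent a
    // reusable block from a pool, copy the outgoing bytes into it, and return
    // the block once the write has been handed off.
    public static void Write(ArraySegment<byte> buffer, Action<byte[], int> writeToSocket)
    {
        byte[] block = ArrayPool<byte>.Shared.Rent(buffer.Count);
        try
        {
            Buffer.BlockCopy(buffer.Array, buffer.Offset, block, 0, buffer.Count);
            writeToSocket(block, buffer.Count); // the pooled copy outlives the caller's buffer
        }
        finally
        {
            // In the real async case the block would be returned from the write
            // completion callback; this sketch keeps it synchronous for brevity.
            ArrayPool<byte>.Shared.Return(block);
        }
    }
}
```

The effect is the same as the MemoryPool2 change: after warm-up, writes recycle pooled memory rather than producing per-write garbage.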

@DamianEdwards
Member

Rebased on 7e8a405 and measured. Basically the same. I really need to profile it to see if the size we're setting them to is high enough.

dev (7e8a405) - 499k
363 (2a32d33) - 501k (+0.4%)

@AspNetSmurfLab

Still spending some time in resizing it seems:
[profiler screenshot]

@benaadams
Contributor Author

@DamianEdwards that's quite a nice tool; what are you using? I'm having a hard time measuring GC with Core. But 44.5% still; ouch!

@mattwarren

@benaadams I'm pretty sure that screenshot is from JetBrains DotTrace, not sure which version though.

@khellang
Contributor

Yep, looks like an older version of dotTrace 😄

@benaadams
Contributor Author

With a 45% GC load, though, all bets are off on algorithmic improvements; we need to get that overhead down first.

@mattwarren

Out of interest, which version of the GC were these tests run under? Server, Client, Background Server, Concurrent, etc?

@benaadams
Contributor Author

@DamianEdwards could you retest? Obviously resizing the queues will add to GC pressure.

Have increased the queues from an array of 16 pointers to an array of 64 pointers - this shouldn't make much difference to allocation expense, i.e. the number of objects allocated is the same and they are reused.
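
As a small self-contained illustration of the difference (these are plain Queue<object> instances, not Kestrel's actual work queues):

```csharp
using System.Collections.Generic;

class QueueSizingSketch
{
    static void Main()
    {
        // A queue created at its default capacity re-allocates and copies its
        // backing array each time it doubles as work is added; pre-sizing pays
        // for the 64-slot array (64 pointers = 512 bytes on x64) once, and no
        // copy-on-resize happens afterwards.
        var resized  = new Queue<object>();   // grows (with copies) as items are enqueued
        var presized = new Queue<object>(64); // backing array allocated once and reused

        for (var i = 0; i < 64; i++)
        {
            resized.Enqueue(i);   // triggers several grow-and-copy steps along the way
            presized.Enqueue(i);  // never resizes
        }
    }
}
```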

benaadams added a commit to benaadams/KestrelHttpServer that referenced this pull request Dec 4, 2015
Without waiting for next libuv pass

Fix for potential regression in aspnet#363 due to bug fix.
benaadams mentioned this pull request Dec 7, 2015
@AspNetSmurfLab

Here are the numbers from wrk when running the plaintext benchmark with and without this change:

Baseline:

wrk -c 256 -t 32 -d 30 -s ./scripts/pipeline.lua http://10.0.0.100:5001/plaintext -- 16

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.00ms   37.09ms 443.35ms   92.45%
    Req/Sec    21.68k     2.62k  139.04k    89.94%
  20722668 requests in 30.10s, 2.55GB read
Requests/sec: 688462.29
Transfer/sec:     86.67MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.18ms   36.59ms 640.33ms   90.76%
    Req/Sec    21.75k     2.69k  146.35k    90.92%
  20786049 requests in 30.10s, 2.56GB read
Requests/sec: 690577.45
Transfer/sec:     86.93MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.44ms   45.55ms   1.09s    92.51%
    Req/Sec    21.65k     2.28k  185.05k    88.12%
  20695354 requests in 30.10s, 2.54GB read
Requests/sec: 687599.80
Transfer/sec:     86.56MB

After merge:

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.26ms    5.24ms 137.84ms   95.14%
    Req/Sec    21.69k     1.11k   34.26k    73.50%
  20739007 requests in 30.04s, 2.55GB read
Requests/sec: 690491.69
Transfer/sec:     86.92MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.24ms    5.42ms 218.06ms   95.36%
    Req/Sec    21.74k     1.07k   48.29k    72.91%
  20789200 requests in 30.05s, 2.56GB read
Requests/sec: 691930.78
Transfer/sec:     87.10MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.21ms    5.03ms 150.64ms   94.96%
    Req/Sec    21.58k     1.03k   30.87k    69.44%
  20637557 requests in 30.03s, 2.54GB read
Requests/sec: 687156.69
Transfer/sec:     86.50MB

Nothing too significant RPS-wise.

@benaadams
Contributor Author

@halter73 Yeah, but look at latency...!

| Measure | Before | After | % | Improvement |
| --- | --- | --- | --- | --- |
| Avg | 14 / 15 / 16 ms | 4 ms | 26% | x 3.8 |
| StdDev | 37 / 37 / 45 ms | 5 ms | 13% | x 7.7 |
| StdDev% | 92% < 51 / 52 / 61 ms | 95% < 9 ms | 17% | x 5.8 |
| Max | 443 / 640 / 1090 ms | 137 / 218 / 150 ms | 23% | x 4.3 |

That's a massive improvement, as well as reduced variability.

I'll take the x4 latency improvement with no detriment to RPS 👍

That's like being 1,000 km closer to your customers... You can always increase bandwidth; you can't increase the speed of light.

@benaadams
Contributor Author

Rebased - tests pass, but the app is unhappy with "'LogLevel' does not contain a definition for 'Trace'".

@muratg
Contributor

muratg commented Dec 8, 2015

@benaadams LogLevel.Verbose was recently renamed to LogLevel.Trace (aspnet/Announcements#124). Do you have the up-to-date logging packages?

@khellang
Contributor

khellang commented Dec 8, 2015

@muratg It fails both locally and on the CI server; https://ci.appveyor.com/project/aspnet-ci/kestrelhttpserver/build/1.0.867#L509.

Also, the Travis build seems to be stuck; https://travis-ci.org/aspnet/KestrelHttpServer/pull_requests

@muratg
Contributor

muratg commented Dec 8, 2015

I guess the updated packages were not pushed to the feeds yet for some reason. cc: @pranavkm

@benaadams
Contributor Author

@muratg timing? The feeds were updated 20 hours ago; the logging change went in 19 hours ago.

@halter73
Member

halter73 commented Dec 8, 2015

We're going to need to get new numbers: #458 (comment)

@benaadams
Contributor Author

Merged #427 with this as it contains a regression fix.

@halter73
Member

@benaadams What do you think of this idea of pooling UvWriteReqs instead of WriteContext? I started this change with the intention of continuing to pool WriteContexts, but have a shared pool per thread. I quickly realized this wouldn't be possible (without locking or a thread-safe data structure) since SocketOutput.WriteAsync is called from an arbitrary ThreadPool thread.

So at the cost of one extra allocation (that of the WriteContext), we can now have a bigger pool shared by all connections managed by a given thread. I am thinking about opening a PR with this change, but since this is an alternative to pooling WriteContexts, I figured we could discuss it here first.
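
A minimal sketch of the per-thread shape being proposed, assuming the pool is owned by the libuv thread and only ever accessed from it (type and member names are illustrative, not the final API):

```csharp
using System.Collections.Generic;

// Because the pool is owned by a single libuv thread and only accessed from
// that thread, no locking or concurrent collection is needed, and the pool is
// shared by every connection that the thread manages.
class ThreadWriteReqPoolSketch<TReq> where TReq : new()
{
    private const int MaxPoolSize = 1024;
    private readonly Queue<TReq> _pool = new Queue<TReq>(MaxPoolSize);

    public TReq Allocate()
    {
        return _pool.Count > 0 ? _pool.Dequeue() : new TReq();
    }

    public void Return(TReq req)
    {
        if (_pool.Count < MaxPoolSize)
            _pool.Enqueue(req);
        // Otherwise drop the request and let it be collected,
        // at the cost of one extra allocation on a later write.
    }
}
```

The trade-off described above still holds in this sketch: each write pays one WriteContext allocation, but the write requests themselves are recycled across every connection the thread manages.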

@benaadams
Contributor Author

@halter73 Well... minimising UvWriteReq allocations in every way can only be a good thing, since it is both a finalizable object and one that allocates native memory - so wider sharing is probably good, especially if it can be thread safe without requiring extra synchronisation.

However... WriteContext is a pretty big object; so what about pooling the UvWriteReq on the thread as you've done; but also pooling the WriteContext on the socket as it was before (so two queue references)?

@halter73
Member

I'm not sure WriteContext is big enough to be worth pooling. Isn't the UvWriteReq the main component of WriteContext that currently makes it big?

The WriteContext pool only living for the lifetime of a connection instead of the lifetime of the server also makes me disinclined to add it.

@benaadams
Contributor Author

On x64 the size of WriteContext without the UvWriteReq allocation is:

| Component | bytes | total |
| --- | --- | --- |
| MemoryPoolIterator2 _lockedStart; | - | 0 |
| ptr _block; | 8 | 8 |
| int _index; | 4 | 12 |
| MemoryPoolIterator2 _lockedEnd; | - | 12 |
| ptr _block; | 8 | 20 |
| int _index; | 4 | 24 |
| int _bufferCount; | 4 | 28 |
| int ByteCount; | 4 | 32 |
| ptr Self; | 8 | 40 |
| bool SocketShutdownSend; | 4 | 44 |
| bool SocketDisconnect; | 4 | 48 |
| int WriteStatus; | 4 | 52 |
| ptr WriteError; | 8 | 60 |
| ptr _writeReq; | 8 | 68 |
| int ShutdownSendStatus; | 4 | 72 |
| object _syncblk | 4 | 76 |
| object TypeHandle | 8 | 84 |
| object alignment | 4 | 88 |

88 bytes; for 4k requests (write + shutdown + disconnect), where each has only a single write, that's about 1 MB (3 contexts × 88 bytes × 4k ≈ 1 MB).

The number of writes would be higher than the number of requests as there is generally more than 1 write per request.

However, I agree with moving UvWriteReq to a more global pool rather than pooling it with WriteContext, as it has higher costs (finalizer + native memory).

@benaadams
Contributor Author

@halter73 which would be this on top of yours; look good?

Added PR "SocketOutput Allocation Reductions" #466 with it.

benaadams closed this Dec 10, 2015
benaadams added a commit to benaadams/KestrelHttpServer that referenced this pull request Dec 10, 2015
Without waiting for next libuv pass

Fix for potential regression in aspnet#363 due to bug fix.
benaadams added a commit to benaadams/KestrelHttpServer that referenced this pull request Dec 10, 2015
Without waiting for next libuv pass

Fix for potential regression in aspnet#363 due to bug fix.
benaadams deleted the default-work-queuesizes branch May 10, 2016 11:00