This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

[Testing] Combo changes #458

Closed

wants to merge 11 commits

Conversation

benaadams
Contributor

Pre allocs: 213,806 objects allocated, 13,118,880 bytes allocated
Post allocs: "only" 93,754 objects allocated, 5,037,880 bytes allocated

Objects down 56%; memory down 62%
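The percentages follow directly from the totals above; a quick check of the arithmetic:

```python
# Allocation totals reported above (pre vs. post change).
pre_objects, pre_bytes = 213_806, 13_118_880
post_objects, post_bytes = 93_754, 5_037_880

objects_down = 1 - post_objects / pre_objects
memory_down = 1 - post_bytes / pre_bytes
print(f"objects down {objects_down:.0%}, memory down {memory_down:.0%}")  # 56%, 62%
```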

Reuse writes, initialize queues #363
Process cascaded work immediately #427
Trim req/resp streams by a field each #455
Request header StringCache #411

With:
Use PooledContextFactory aspnet/Benchmarks#34
Allow feature cache to be updated aspnet/HttpAbstractions#501

/cc @halter73 combo'd

@benaadams
Contributor Author

4k requests, 16 pipelining, Firefox

Pre allocs: 213,806 objects allocated, 13,118,880 bytes allocated
Post allocs: "only" 93,754 objects allocated, 5,037,880 bytes allocated

Objects down 56%; memory down 62%

Pre allocation distribution:

[image: alloc-pre-profile]

Post allocation distribution:

[image: alloc-post-profile]

@AspNetSmurfLab

Here are the numbers from wrk when running the plaintext benchmark with and without these changes:

Baseline:

wrk -c 256 -t 32 -d 30 -s ./scripts/pipeline.lua http://10.0.0.100:5001/plaintext -- 16

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.67ms   16.71ms 352.12ms   95.24%
    Req/Sec    21.64k     2.35k  163.01k    96.07%
  20689840 requests in 30.10s, 2.54GB read
Requests/sec: 687375.40
Transfer/sec:     86.53MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.69ms   16.66ms 371.33ms   94.92%
    Req/Sec    21.55k     2.13k  189.32k    92.44%
  20600781 requests in 30.10s, 2.53GB read
Requests/sec: 684421.20
Transfer/sec:     86.16MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.04ms   14.21ms 384.10ms   95.46%
    Req/Sec    21.63k     1.56k   48.00k    85.55%
  20647251 requests in 30.03s, 2.54GB read
Requests/sec: 687611.44
Transfer/sec:     86.56MB

After merge (#486):

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.06ms    3.97ms  92.93ms   93.23%
    Req/Sec    21.59k     1.04k   31.70k    71.16%
  20647844 requests in 30.04s, 2.54GB read
Requests/sec: 687408.76
Transfer/sec:     86.53MB


Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.10ms    4.01ms  81.76ms   93.29%
    Req/Sec    21.53k     1.18k   43.50k    76.14%
  20583514 requests in 30.04s, 2.53GB read
Requests/sec: 685172.74
Transfer/sec:     86.25MB


Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.07ms    4.24ms 154.14ms   92.71%
    Req/Sec    21.50k     1.12k   34.26k    72.66%
  20559265 requests in 30.03s, 2.53GB read
Requests/sec: 684572.10
Transfer/sec:     86.18MB
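The trailing `-- 16` in the wrk command above is the pipelining depth passed to pipeline.lua (the actual script lives in the benchmarks repo). As a rough illustration of what that depth means, a pipelining client simply writes N requests back-to-back before reading any responses; a minimal sketch, with the host and path taken from the command above:

```python
def pipelined_batch(host: str, path: str, depth: int) -> bytes:
    """Build `depth` HTTP/1.1 GET requests back-to-back, as a
    pipelining client would write them before reading responses."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"\r\n"
    ).encode("ascii")
    return request * depth

# 16-deep pipeline against the plaintext endpoint, as in the wrk run above.
batch = pipelined_batch("10.0.0.100:5001", "/plaintext", 16)
```

The server then answers all 16 in order on the same connection, which is why pipelining stresses per-request allocation so effectively.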

@benaadams
Contributor Author

Latency is good; not sure about the StringCache change.

@benaadams benaadams closed this Dec 8, 2015
@halter73
Member

halter73 commented Dec 8, 2015

I screwed up the test runs last night 😢

The benchmarks repo still had Kestrel referenced in the sln file, but it had been removed from global.json. I was testing the Kestrel package each time. I'm going to have to rerun all the tests.

@benaadams
Contributor Author

Make sure it's not a Debug build also ;-)

@benaadams
Contributor Author

Want me to rebase stuffs?

@AspNetSmurfLab

No need to rebase for now; I just picked a baseline from yesterday. I also verified that yesterday's merges did not degrade performance. The combined changes are benchmarked below.

New results (about a 4% improvement!):

Baseline (dff3a4f):

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.33ms   32.51ms 617.34ms   92.28%
    Req/Sec    24.76k     3.11k  167.97k    92.54%
  23629859 requests in 30.10s, 2.90GB read
Requests/sec: 785068.60
Transfer/sec:     98.83MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.11ms   23.88ms 369.65ms   93.03%
    Req/Sec    24.68k     2.29k  187.74k    87.32%
  23590851 requests in 30.10s, 2.90GB read
Requests/sec: 783763.63
Transfer/sec:     98.66MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.57ms   28.19ms 484.16ms   92.35%
    Req/Sec    24.77k     2.55k  113.82k    90.24%
  23677061 requests in 30.10s, 2.91GB read
Requests/sec: 786644.89
Transfer/sec:     99.03MB

Merged (benaadams/current-combo):

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.74ms   22.52ms 391.71ms   94.50%
    Req/Sec    25.73k     2.54k  121.34k    94.41%
  24597841 requests in 30.10s, 3.02GB read
Requests/sec: 817232.07
Transfer/sec:    102.88MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.59ms   22.61ms 327.02ms   94.79%
    Req/Sec    25.77k     2.40k  212.82k    91.68%
  24632813 requests in 30.10s, 3.03GB read
Requests/sec: 818379.20
Transfer/sec:    103.02MB

Running 30s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.75ms   20.02ms 318.43ms   95.34%
    Req/Sec    25.81k     2.29k  103.40k    92.55%
  24680686 requests in 30.10s, 3.03GB read
Requests/sec: 819959.94
Transfer/sec:    103.22MB
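The "about a 4% improvement" can be checked from the three-run means; a quick sketch, with the Requests/sec numbers copied from the runs above:

```python
# Requests/sec from the three baseline and three merged runs above.
baseline_rps = [785068.60, 783763.63, 786644.89]
merged_rps = [817232.07, 818379.20, 819959.94]

def mean(xs):
    return sum(xs) / len(xs)

improvement = mean(merged_rps) / mean(baseline_rps) - 1
print(f"{improvement:.1%}")  # ~4.2%
```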

@benaadams
Contributor Author

And better latency, but it's probably better to leave the StringCache out for now and look at it in isolation after these changes.

@AspNetSmurfLab

I noticed the baseline numbers were changing a bit (maybe because the CPU was heating up?), so I reran this with a 15-second duration. I no longer see the improvement in the RPS numbers, but the latency still looks a bit better on average:

Baseline (origin/dev 90ece57):

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.53ms   36.99ms 402.00ms   90.46%
    Req/Sec    24.89k     2.35k   52.63k    79.11%
  11932727 requests in 15.10s, 1.47GB read
Requests/sec: 790247.14
Transfer/sec:     99.48MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.49ms   36.99ms 447.96ms   88.60%
    Req/Sec    24.84k     2.19k   46.08k    76.67%
  11909480 requests in 15.10s, 1.46GB read
Requests/sec: 788732.15
Transfer/sec:     99.29MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    13.56ms   33.26ms 388.83ms   88.75%
    Req/Sec    25.00k     2.40k   59.52k    82.59%
  11974934 requests in 15.10s, 1.47GB read
Requests/sec: 793027.12
Transfer/sec:     99.83MB

Merged (benaadams/current-combo):

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.27ms   29.48ms 549.19ms   92.95%
    Req/Sec    24.81k     1.80k   47.30k    79.44%
  11899625 requests in 15.10s, 1.46GB read
Requests/sec: 788067.27
Transfer/sec:     99.21MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.81ms   23.14ms 495.63ms   92.19%
    Req/Sec    24.79k     2.15k   56.92k    87.45%
  11878076 requests in 15.10s, 1.46GB read
Requests/sec: 786631.93
Transfer/sec:     99.03MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.73ms   29.83ms 544.53ms   92.73%
    Req/Sec    24.91k     2.45k   50.65k    89.33%
  11888035 requests in 15.10s, 1.46GB read
Requests/sec: 787304.24
Transfer/sec:     99.11MB

@benaadams
Contributor Author

Have you got Balanced power saving on? (It's the default.) Try running:

powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

That should switch it to High performance.

@AspNetSmurfLab

I ran powercfg /getactivescheme and got Power Scheme GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c, so it looks like we're already good there.

I also tried forcing collections between work runs. This verifies that this PR reduces gen 0 allocations, but otherwise it's pretty inconclusive:

Baseline (origin/dev dff3a4f) w/ forced gc between runs:

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.82ms    9.59ms 173.66ms   95.13%
    Req/Sec    24.91k     3.25k  122.61k    97.42%
  11920278 requests in 15.10s, 1.47GB read
Requests/sec: 789440.21
Transfer/sec:     99.38MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.61ms   12.90ms 207.50ms   94.72%
    Req/Sec    24.89k     3.27k  113.27k    95.42%
  11862925 requests in 15.10s, 1.46GB read
Requests/sec: 785653.00
Transfer/sec:     98.90MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.41ms   17.18ms 251.51ms   95.27%
    Req/Sec    25.12k     3.54k  157.96k    93.93%
  11927293 requests in 15.10s, 1.47GB read
Requests/sec: 789998.39
Transfer/sec:     99.45MB

Benchmark console output:

Database already seeded!
Database tests enabled
Hosting environment: Production
Now listening on: http://*:5001
Application started. Press Ctrl+C to shut down.

Press 'C' to force GC or any other key to display GC stats

Allocated: 96.34 MB
Gen 0: 36, Gen 1: 2, Gen 2: 1

Forcing GC... done!

Allocated: 33.08 MB
Gen 0: 37, Gen 1: 3, Gen 2: 2

Allocated: 155.44 MB
Gen 0: 76, Gen 1: 4, Gen 2: 2

Forcing GC... done!

Allocated: 33.09 MB
Gen 0: 77, Gen 1: 5, Gen 2: 3

Allocated: 207.97 MB
Gen 0: 116, Gen 1: 6, Gen 2: 3

Forcing GC... done!

Allocated: 33.10 MB
Gen 0: 117, Gen 1: 7, Gen 2: 4

Allocated: 138.33 MB
Gen 0: 158, Gen 1: 8, Gen 2: 4

Forcing GC... done!

Allocated: 33.09 MB
Gen 0: 159, Gen 1: 9, Gen 2: 5

Merged (benaadams/current-combo) w/ forced gc between runs:

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.95ms   17.12ms 238.74ms   96.30%
    Req/Sec    24.93k     3.31k  124.56k    97.52%
  11928980 requests in 15.10s, 1.47GB read
Requests/sec: 789989.00
Transfer/sec:     99.45MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.41ms   10.17ms 209.24ms   96.54%
    Req/Sec    25.12k     3.24k  123.10k    99.19%
  12020616 requests in 15.10s, 1.48GB read
Requests/sec: 796053.05
Transfer/sec:    100.21MB

Running 15s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.82ms   12.77ms 313.45ms   96.59%
    Req/Sec    24.89k     2.02k   63.02k    91.04%
  11924729 requests in 15.10s, 1.47GB read
Requests/sec: 789904.43
Transfer/sec:     99.44MB

Benchmark console output:

Database already seeded!
Database tests enabled
Hosting environment: Production
Now listening on: http://*:5001
Application started. Press Ctrl+C to shut down.

Press 'C' to force GC or any other key to display GC stats

Allocated: 169.51 MB
Gen 0: 23, Gen 1: 2, Gen 2: 1

Forcing GC... done!

Allocated: 33.56 MB
Gen 0: 24, Gen 1: 3, Gen 2: 2

Allocated: 99.57 MB
Gen 0: 53, Gen 1: 4, Gen 2: 2

Forcing GC... done!

Allocated: 33.56 MB
Gen 0: 54, Gen 1: 5, Gen 2: 3

Allocated: 91.44 MB
Gen 0: 85, Gen 1: 6, Gen 2: 3

Forcing GC... done!

Allocated: 33.56 MB
Gen 0: 86, Gen 1: 7, Gen 2: 4

Allocated: 80.77 MB
Gen 0: 116, Gen 1: 8, Gen 2: 4

Forcing GC... done!

Allocated: 33.56 MB
Gen 0: 117, Gen 1: 9, Gen 2: 5
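The force-GC-and-report loop in the console output above is a .NET harness (presumably a full GC.Collect plus generation counters). As a rough cross-language illustration of the same pattern, Python's gc module exposes per-generation collection counts and a forced full collection; the function names here are illustrative, not from the benchmark host:

```python
import gc

def print_gc_stats() -> None:
    # gc.get_stats() reports, per generation, how many collections
    # have run -- analogous to the Gen 0/1/2 counters printed above.
    for gen, stats in enumerate(gc.get_stats()):
        print(f"Gen {gen}: {stats['collections']} collections")

def force_gc() -> int:
    # Full collection of all generations, analogous to pressing 'C'
    # in the benchmark host to force a GC between work runs.
    return gc.collect()
```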

Maybe the next step is to disable dynamic frequency scaling.

@benaadams
Contributor Author

From my point of view, my preference would be to take

Reuse writes, initialize queues #363
Process cascaded work immediately #427
Reduce string allocations per request #441

Then swing back round to StringCache (which is in this combo); I'm not convinced its value has been proven versus its complexity.
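For reference, the idea being weighed here (the StringCache of #411) is to reuse decoded header strings across requests instead of allocating a fresh string per header value. Kestrel is C#; this is only a minimal Python sketch of the concept, with illustrative names rather than the actual API:

```python
class StringCache:
    """Map raw header bytes to their decoded string, so repeated
    header values (e.g. b"gzip, deflate") decode and allocate once."""

    def __init__(self, max_entries: int = 128):
        self._cache: dict[bytes, str] = {}
        # Bound the cache; the bookkeeping this implies is part of the
        # complexity-vs-value trade-off debated above.
        self._max_entries = max_entries

    def get(self, raw: bytes) -> str:
        cached = self._cache.get(raw)
        if cached is None:
            cached = raw.decode("latin-1")
            if len(self._cache) < self._max_entries:
                self._cache[raw] = cached
        return cached
```

Repeated lookups for the same bytes return the identical string object, trading a dictionary probe for an allocation, which is also why its win is hard to see except in isolation.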

@NickCraver
Contributor

What is the actual hardware platform here? You definitely want to disable C-States in the BIOS (or UEFI) on both ends. If it's one of a few current-gen vendors, though, you'll need updates, as these settings aren't respected.

Related note for others: you can only trust Task Manager or CPU-Z-type readings here, as they pull the actual frequency via an Intel C API. WMI cannot be trusted; it lies, and we've seen it do so repeatedly. Many things pulling from WMI report the max frequency, which isn't correct at all.

@benaadams
Contributor Author

@NickCraver Xeon E5-1650, going by what's stated in aspnet/benchmarks

@NickCraver
Contributor

Aye, I saw the processor, which is a good testbed with the 3/3/4/5/6/6 stepping on the 3.2 -> 3.8 clocking (I'm assuming it's actually a v1 Sandy Bridge in play here, and not v2 or v3). But the BIOS controls stepping, so I'd love to see what the test is doing live here. Is it just one processor, as the README seems to indicate? Is there any L2/L3 thrashing in play from an external process?

@benaadams
Contributor Author

Will do some better analysis on the StringCache; retracting that PR for now...

@benaadams benaadams closed this Dec 9, 2015
@benaadams benaadams deleted the current-combo branch May 10, 2016 02:49