http: significant performance regression on master #37937

Closed · 2 of 3 tasks
mcollina opened this issue Mar 26, 2021 · 44 comments

Labels: confirmed-bug (Issues with confirmed bugs.) · http (Issues or PRs related to the http subsystem.)

@mcollina (Member) commented Mar 26, 2021

  • Version: master vs v14 and v15
  • Platform: linux
  • Subsystem: http

What steps will reproduce the bug?

run:

'use strict'

const server = require('http').createServer(function (req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8')
  res.end(JSON.stringify({ hello: 'world' }))
})

server.listen(3000)

and then:

$ npm i autocannon -g
$ autocannon -c 100 -d 5 -p 10 localhost:3000

on v14.16 this produces:

$ autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max    │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼────────┤
│ Latency │ 5 ms │ 13 ms │ 21 ms │ 27 ms │ 13.05 ms │ 5.93 ms │ 129 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 60543   │ 60543   │ 77439   │ 78079   │ 73916.8 │ 6747.59 │ 60531   │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 11.3 MB │ 11.3 MB │ 14.5 MB │ 14.6 MB │ 13.8 MB │ 1.26 MB │ 11.3 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.

370k requests in 5.04s, 69.1 MB read

on master:

$ autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬───────┬───────┬───────┬───────┬──────────┬─────────┬────────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max    │
├─────────┼───────┼───────┼───────┼───────┼──────────┼─────────┼────────┤
│ Latency │ 10 ms │ 14 ms │ 29 ms │ 54 ms │ 16.83 ms │ 8.54 ms │ 163 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 43007   │ 43007   │ 61151   │ 62143   │ 57724.8 │ 7388.63 │ 42988   │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 8.04 MB │ 8.04 MB │ 11.4 MB │ 11.6 MB │ 10.8 MB │ 1.38 MB │ 8.04 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.

289k requests in 5.03s, 54 MB read

On master it also produces a significant number of warnings:

(node:235900) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
(node:235900) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
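One quick way to see where those extra listeners come from (assuming the repro above is saved as server.js) is to run it with warning stack traces enabled:

$ node --trace-warnings server.js

--trace-warnings makes Node print a stack trace for each process warning, which points at the code path that keeps adding 'close' and 'error' listeners to the socket.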

Update as of 2021/3/29 bisect from head a9cdeed:

@mcollina (Member, Author)

cc @ronag

@mcollina added the confirmed-bug and http labels Mar 26, 2021
@Linkgoron (Member) commented Mar 27, 2021

I think that the warning messages originated here: #36816

@mcollina (Member, Author)

I think that the warning messages originated here: #36816

I think so as well.

@ronag (Member) commented Mar 27, 2021

I will take a look as soon as I can

@mcollina (Member, Author)

I'm looking into it as well!

@ronag (Member) commented Mar 27, 2021

I will have time tonight. Please keep me posted if you find anything.

@mcollina (Member, Author)

A couple of notes:

#36816 is the cause of the warnings and likely a memory leak.

Going to the parent commit has a minor performance benefit:

$ autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬───────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max   │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼───────┤
│ Latency │ 8 ms │ 17 ms │ 19 ms │ 28 ms │ 14.53 ms │ 5.42 ms │ 71 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 60895   │ 60895   │ 68351   │ 68543   │ 66806.4 │ 2971.63 │ 60887   │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 11.4 MB │ 11.4 MB │ 12.8 MB │ 12.8 MB │ 12.5 MB │ 556 kB  │ 11.4 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.

334k requests in 5.03s, 62.5 MB read

The majority of the performance regression is present in v15 as well, so it's unrelated.

@mcollina (Member, Author)

Apparently the majority of the regression happened in the v15 cycle.

v15.0.0: 77k req/sec (median)
v15.11.0: 67k req/sec (median)

@ronag (Member) commented Mar 27, 2021

#36816 is the cause of the warnings and likely a memory leak.

I will sort this.

@ronag (Member) commented Mar 27, 2021

Apparently the majority of the regression happened in the v15 cycle.

v15.0.0: 77k req/sec (median)
v15.11.0: 67k req/sec (median)

Are you bisecting?

@ronag (Member) commented Mar 27, 2021

@mcollina: Any idea how to repro without using autocannon?

I tried:

'use strict';

const common = require('../common');
const http = require('http');
const Countdown = require('../common/countdown');

const NUM_REQ = 128;

const agent = new http.Agent({ keepAlive: true });
const countdown = new Countdown(NUM_REQ, () => server.close());

const server = http.createServer(common.mustCall(function(req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8');
  res.end(JSON.stringify({ hello: 'world' }));
}, NUM_REQ)).listen(0, function() {
  for (let i = 0; i < NUM_REQ; ++i) {
    http.request({
      port: server.address().port,
      agent,
      method: 'GET'
    }, function(res) {
      res.resume();
      res.on('end', () => {
        countdown.dec();
      });
    }).end();
  }
});

process.on('warning', common.mustNotCall());

@mcollina (Member, Author)

Are you bisecting?

I've done a "rough" bisect with nvm just to narrow down my scope of action a bit.

I'm doing a git bisect as we speak; it just takes a lot of time as it spans several V8 versions.

@mcollina (Member, Author)

@mcollina: Any idea how to repro without using autocannon?

I think you need to use net to send manual requests with HTTP pipelining.
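A minimal sketch of that approach, assuming the server from the issue description is listening on port 3000: write a batch of requests on one connection before reading anything, and let the last request ask the server to close the connection.

'use strict';

const net = require('net');

// 9 keep-alive requests plus a final one with Connection: close, all written
// back to back so the server has to handle them as a pipeline.
const KEEP = 'GET / HTTP/1.1\r\nHost: localhost:3000\r\n\r\n';
const CLOSE = 'GET / HTTP/1.1\r\nHost: localhost:3000\r\nConnection: close\r\n\r\n';

const socket = net.connect(3000, 'localhost', () => {
  socket.write(KEEP.repeat(9) + CLOSE);
});

socket.setEncoding('utf8');
let body = '';
socket.on('data', (chunk) => { body += chunk; });
socket.on('end', () => {
  const responses = (body.match(/HTTP\/1\.1 200/g) || []).length;
  console.log(`received ${responses} pipelined responses`);
});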

@ronag (Member) commented Mar 27, 2021

I've done a "rough" bisect with nvm just to narrow down my scope of action a bit.

And between which versions did it occur?

@mcollina (Member, Author)

So far I have identified these commits as problematic:

  • requestTimeout df08d52 80k -> 76k

  • SafeArrayIterator 0b6d307 72k -> 67k

There are probably more. I'm starting to suspect the other primordials changes.


Regarding df08d52: I benchmarked it extensively at the time and did not see a regression. It also did not have the same worsening effect when applied to v14 instead of master.

@targos (Member) commented Mar 28, 2021

SafeArrayIterator 0b6d307 72k -> 67k

Is it the debuglog change?

@mcollina (Member, Author)

SafeArrayIterator 0b6d307 72k -> 67k

Is it the debuglog change?

Yes it is.

@mcollina (Member, Author)

cc @nodejs/tsc

@mcollina self-assigned this Mar 29, 2021
@mcollina added the tsc-agenda label Mar 29, 2021
@mcollina (Member, Author) commented Mar 29, 2021

I have identified the three offending commits that cause these regressions:

  • requestTimeout df08d52 80k -> 76k
  • SafeArrayIterator 0b6d307 72k -> 67k
  • align with stream.Writable e2f5bb7 67k -> 60k

I plan to issue a revert for 0b6d307 and e2f5bb7 and work on a faster implementation of df08d52 as it's a security fix.

cc @nodejs/http

@ronag (Member) commented Mar 29, 2021

e2f5bb7 has 2 problems then:

  • Performance regression
  • Leaking event listeners

What's the timeframe where we need this fixed? If there is time I'd like to have a go at trying to resolve these.

@mcollina (Member, Author)

What's the timeframe where we need this fixed? If there is time I'd like to have a go at trying to resolve these.

Before v16 comes out. It can either be yanked from the release or reverted on master. I prefer to revert as we are already close to the deadline for semver-majors.

@ronag (Member) commented Mar 29, 2021

Before v16 comes out. It can either be yanked from the release or reverted on master. I prefer to revert as we are already close to the deadline for semver-majors.

Alright. I'll have another go at it for v17? Assuming we agree that the core idea behind the PR is still considered good.

@mcollina (Member, Author)

Alright. I'll have another go at it for v17? Assuming we agree that the core idea behind the PR is still considered good.

I still think it's a good idea. I'll add a test to prevent the regression in the future.

@ronag (Member) commented Mar 29, 2021

I'll add a test to prevent the regression in the future.

A test with pipelining would be nice to sort the listener leak regression.
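A rough sketch of what such a test could look like (an illustration, not the test that eventually landed): pipeline a batch of requests over a single raw socket and fail on any process warning.

'use strict';

const assert = require('assert');
const http = require('http');
const net = require('net');

// Any warning (e.g. MaxListenersExceededWarning) fails the test.
process.on('warning', (warning) => {
  assert.fail(`unexpected warning: ${warning.name}: ${warning.message}`);
});

const NUM_PIPELINED = 20;

const server = http.createServer((req, res) => {
  res.setHeader('content-type', 'application/json; charset=utf-8');
  res.end(JSON.stringify({ hello: 'world' }));
});

server.listen(0, () => {
  const socket = net.connect(server.address().port, () => {
    socket.write('GET / HTTP/1.1\r\nHost: localhost\r\n\r\n'.repeat(NUM_PIPELINED));
  });
  socket.setEncoding('utf8');
  let data = '';
  socket.on('data', (chunk) => {
    data += chunk;
    // Once every pipelined response has arrived, shut everything down.
    if ((data.match(/HTTP\/1\.1 200/g) || []).length === NUM_PIPELINED) {
      socket.end();
      server.close();
    }
  });
});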

@targos (Member) commented Mar 29, 2021

I have trouble getting consistent results on my computer with the provided benchmark. Every time I run autocannon, I get a very different result, up to 10% different from the previous run.

targos added a commit that referenced this issue Mar 31, 2021
Exporting a variable that will be mutated later doesn't work.

Refs: #37937

PR-URL: #37966
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Gerhard Stöbich <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
lpinca pushed a commit that referenced this issue Mar 31, 2021
This reverts commit 2da3611.

PR-URL: #37964
Refs: #37937
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
lpinca pushed a commit that referenced this issue Mar 31, 2021
PR-URL: #37964
Refs: #37937
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@targos (Member) commented Apr 1, 2021

I think this can be closed now?

From a quick glance at the CPU profile, it looks like the biggest bottleneck on master is now Socket._writeGeneric.
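For reference, one way to capture a comparable CPU profile of the benchmark server (a sketch based on the inspector module; the 10-second timeout and the output file name are arbitrary choices):

'use strict';

const fs = require('fs');
const inspector = require('inspector');

const session = new inspector.Session();
session.connect();

session.post('Profiler.enable', () => {
  session.post('Profiler.start', () => {
    const server = require('http').createServer((req, res) => {
      res.setHeader('content-type', 'application/json; charset=utf-8');
      res.end(JSON.stringify({ hello: 'world' }));
    });
    server.listen(3000);

    // Run autocannon against port 3000, then collect the profile and exit.
    setTimeout(() => {
      session.post('Profiler.stop', (err, { profile }) => {
        if (!err) fs.writeFileSync('server.cpuprofile', JSON.stringify(profile));
        server.close();
      });
    }, 10000);
  });
});

The resulting .cpuprofile file can be loaded into Chrome DevTools to inspect hot functions such as Socket._writeGeneric.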

@mcollina (Member, Author) commented Apr 1, 2021

I think there are still some gaps. Here are this morning's tests.

master:


$ autocannon -c 100 -d 40 -p 10 localhost:3000
Running 40s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬───────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max   │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼───────┤
│ Latency │ 8 ms │ 17 ms │ 18 ms │ 18 ms │ 13.77 ms │ 4.47 ms │ 85 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 61695   │ 61695   │ 70143   │ 70399   │ 69872.4 │ 1331.58 │ 61673   │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 11.5 MB │ 11.5 MB │ 13.1 MB │ 13.2 MB │ 13.1 MB │ 250 kB  │ 11.5 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.

2795k requests in 40.03s, 523 MB read

node v14.16.0:


$ autocannon -c 100 -d 40 -p 10 localhost:3000
Running 40s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬───────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max   │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼───────┤
│ Latency │ 7 ms │ 15 ms │ 16 ms │ 17 ms │ 12.78 ms │ 3.93 ms │ 83 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg      │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼────────┼─────────┤
│ Req/Sec   │ 65311   │ 65311   │ 75711   │ 76287   │ 75287.61 │ 1700.3 │ 65304   │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼────────┼─────────┤
│ Bytes/Sec │ 12.2 MB │ 12.2 MB │ 14.2 MB │ 14.3 MB │ 14.1 MB  │ 318 kB │ 12.2 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴────────┴─────────┘

Req/Bytes counts sampled once per second.

3011k requests in 40.03s, 563 MB read

I plan to do a bit more work in this area and try to bring it back to the same throughput.

@mcollina (Member, Author) commented Apr 2, 2021

I have identified a significant regression between v15.5.1 and v15.6.0.

Node v15.5.1

$ autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬───────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max   │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼───────┤
│ Latency │ 7 ms │ 15 ms │ 20 ms │ 28 ms │ 13.26 ms │ 4.99 ms │ 93 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬───────┬─────────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5% │ Avg     │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼───────┼─────────┼────────┼─────────┤
│ Req/Sec   │ 66751   │ 66751   │ 74239   │ 74623 │ 72774.4 │ 3034.6 │ 66749   │
├───────────┼─────────┼─────────┼─────────┼───────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 12.5 MB │ 12.5 MB │ 13.9 MB │ 14 MB │ 13.6 MB │ 565 kB │ 12.5 MB │
└───────────┴─────────┴─────────┴─────────┴───────┴─────────┴────────┴─────────┘

Req/Bytes counts sampled once per second.

Node v15.6.0

$ autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max    │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼────────┤
│ Latency │ 8 ms │ 16 ms │ 22 ms │ 43 ms │ 14.44 ms │ 6.55 ms │ 134 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬───────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min   │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼───────┤
│ Req/Sec   │ 53727   │ 53727   │ 70591   │ 70975   │ 67203.2 │ 6753.66 │ 53719 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼───────┤
│ Bytes/Sec │ 10.1 MB │ 10.1 MB │ 13.2 MB │ 13.3 MB │ 12.6 MB │ 1.26 MB │ 10 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴───────┘

Req/Bytes counts sampled once per second.

336k requests in 5.03s, 62.8 MB read

@mcollina (Member, Author) commented Apr 2, 2021

Never mind, this was the debuglog issue that @targos fixed.

@mcollina (Member, Author) commented Apr 2, 2021

I finally tracked down one more regression by looking at the flamegraphs: f3eb224 added two more calls to hrtime() that were not there before. Just by disabling those, the throughput moves from 69k to 72k.

I already spoke to @jasnell and he will work on this next week.
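The direction of the fix ("only call into hrtime if there's an observer", see the commits below) can be illustrated with a simplified, hypothetical sketch; observerCount and emitHttpEntry stand in for core's internal PerformanceObserver bookkeeping and are not real APIs.

'use strict';

let observerCount = 0; // incremented/decremented as observers subscribe

function emitHttpEntry(durationNs) {
  // In core this would hand an 'http' performance entry to observers.
}

function startTiming() {
  // Skip hrtime entirely on the hot path when nobody is observing.
  return observerCount > 0 ? process.hrtime.bigint() : 0n;
}

function endTiming(start) {
  if (observerCount > 0 && start !== 0n) {
    emitHttpEntry(process.hrtime.bigint() - start);
  }
}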

@ronag (Member) commented Apr 2, 2021

@mcollina excellent work!

@jasnell (Member) commented Apr 2, 2021

I've already identified the fix. I'm just not writing any code today, so I will do that first thing Monday morning and have the PR open soon after.

@ronag (Member) commented Apr 2, 2021

Do these things not show up when running a CPU profile? It seems a bit unfortunate that we need to git bisect to identify new or old performance bottlenecks.

@mcollina (Member, Author) commented Apr 2, 2021

Do these things not show up when running a CPU profile? It seems a bit unfortunate that we need to git bisect to identify new or old performance bottlenecks.

They show up. However, git bisecting is actually simpler because you do not have to code hypothetical fixes.

The hrtime showed up. It's far down from the main bottleneck, but it was a key difference on a hot path.

@jasnell (Member) commented Apr 2, 2021

And I did run perf tests on this one change. I just think some of the other issues were masking the perf hit of the hrtime change, making it far less obvious. Fortunately, it's a quick fix.

MylesBorins pushed a commit that referenced this issue Apr 4, 2021
Exporting a variable that will be mutated later doesn't work.

Refs: #37937

PR-URL: #37966
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Gerhard Stöbich <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
jasnell added a commit to jasnell/node that referenced this issue Apr 6, 2021
Signed-off-by: James M Snell <[email protected]>
Refs: nodejs#37937
Refs: nodejs#37136
jasnell added a commit to jasnell/node that referenced this issue Apr 6, 2021
Only call into hrtime if there's an observer

Also, fix up some previously missed changes from the original refactor

Signed-off-by: James M Snell <[email protected]>
Refs: nodejs#37937
Refs: nodejs#37136
jasnell added a commit to jasnell/node that referenced this issue Apr 9, 2021
Only call into hrtime if there's an observer

Also, fix up some previously missed changes from the original refactor

Signed-off-by: James M Snell <[email protected]>
Refs: nodejs#37937
Refs: nodejs#37136
jasnell added a commit that referenced this issue Apr 9, 2021
Only call into hrtime if there's an observer

Also, fix up some previously missed changes from the original refactor

Signed-off-by: James M Snell <[email protected]>
Refs: #37937
Refs: #37136

PR-URL: #38110
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
@mcollina removed the tsc-agenda label Apr 14, 2021
@mcollina (Member, Author)

Here is another one: #38245.

I have one more coming :).

@mcollina (Member, Author)

#38246 includes my latest findings.

@mcollina (Member, Author)

With the latest PRs landed, the HTTP throughput of the upcoming v16 is on par with that of the latest v14. I think this is a milestone, so I'll celebrate by closing this issue and possibly opening a fresh one with other optimizations that we might want to do here.
