Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net,stream: skip chunk check on incoming data #19559

Closed
wants to merge 2 commits into from

Conversation

lpinca
Copy link
Member

@lpinca lpinca commented Mar 23, 2018

Do not validate data chunks read from the socket handle as they are
guaranteed to be buffers and validation is costly.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added net Issues and PRs related to the net subsystem. stream Issues and PRs related to the stream subsystem. labels Mar 23, 2018
@@ -192,9 +192,8 @@ Readable.prototype._destroy = function(err, cb) {
// This returns true if the highWaterMark has not been hit yet,
// similar to how Writable.write() returns true if you should
// write() some more.
Readable.prototype.push = function(chunk, encoding) {
Readable.prototype.push = function(chunk, encoding, skipChunkCheck) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is acceptable, but I have no better ideas. If it's ok, should we document it or keep it private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to use a special encoding value to specify that chunk is a buffer and should not be checked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this approach. What would likely be better is a new internal only Readable.prototype[kPush] = function(...) that Readable.prototype.push(...) can defer to and that our internal code can call directly.. then keep the type check in the existing push() and skip it in the internal one.

Whatever happens here, tho, the @nodejs/streams wg needs to take a look :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The private push would basically be just an helper for

function readableAddChunk(stream, chunk, encoding, addToFront, skipChunkCheck) {
so maybe we can add that to the prototype hidden behind a symbol?

@lpinca
Copy link
Member Author

lpinca commented Mar 23, 2018

@mscdex
Copy link
Contributor

mscdex commented Mar 23, 2018

Can we get benchmark (at least net and streams.Readable) results for this?

@lpinca
Copy link
Member Author

lpinca commented Mar 23, 2018

No noticeable difference on net benchmarks but that's because they are not stressing this code path enough. Will post results of an ad hoc Readable benchmark in a bit.

Btw I've opened this only because chunkInvalid() is marked as hot in some flame graphs I'm generating.

@@ -192,9 +192,8 @@ Readable.prototype._destroy = function(err, cb) {
// This returns true if the highWaterMark has not been hit yet,
// similar to how Writable.write() returns true if you should
// write() some more.
Readable.prototype.push = function(chunk, encoding) {
Readable.prototype.push = function(chunk, encoding, skipChunkCheck) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this approach. What would likely be better is a new internal only Readable.prototype[kPush] = function(...) that Readable.prototype.push(...) can defer to and that our internal code can call directly.. then keep the type check in the existing push() and skip it in the internal one.

Whatever happens here, tho, the @nodejs/streams wg needs to take a look :-)

@lpinca
Copy link
Member Author

lpinca commented Mar 23, 2018

@mscdex

'use strict';

const common = require('../common');
const { Readable } = require('stream');

const bench = common.createBenchmark(main, {
  n: [1e6]
});

function main({ n }) {
  const buf = Buffer.alloc(64);
  const readable = new Readable({
    read() {}
  });

  readable.on('end', function() {
    bench.end(n);
  });
  readable.on('resume', function() {
    bench.start();
    for (var i = 0; i < n; i++)
      this.push(buf, undefined, true);
    this.push(null);
  });
  readable.resume();
}
$ cat compare2.csv | Rscript benchmark/compare.R 
                           confidence improvement accuracy (*)   (**)  (***)
 streams/skip.js n=1000000        ***      6.86 %       ±2.06% ±2.78% ±3.67%

so it probably does not worth the effort. I should try on v8.x and v9.x but I'm too lazy to recompile now.

@lpinca
Copy link
Member Author

lpinca commented Mar 24, 2018

Same benchmark on v9.x-staging:

                           confidence improvement accuracy (*)   (**)  (***)
 streams/skip.js n=1000000        ***     35.99 %       ±1.42% ±1.88% ±2.45%

which explains why chunkInvalid() was marked as hot in my flame graph.

Edit: same results on on v8.x-staging.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m very conflicted by this PR. It’d be worthy running benchmarks and applying this also for HTTP (post) and TLS.

I would prefer having this as an option at creation time and documenting it.
This seems like a feature we should support and offet to our users as well.

Also, some of the benefits might come from calling push with the right number of arguments and avoid the trampoline.

@lpinca
Copy link
Member Author

lpinca commented Mar 24, 2018

@mcollina

I've run net benchmarks and there is no noticeable difference on master but that's expected, I should and will rerun them on v8.x where the impact is big. Also the stream must be in flowing mode to see anything appreciable.

This seems like a feature we should support and offet to our users as well.

Agreed but why having it as an instance option instead of a per push option?

Also, some of the benefits might come from calling push with the right number of arguments and avoid the trampoline.

Not sure I understand care to elaborate?

@mcollina
Copy link
Member

Also, some of the benefits might come from calling push with the right number of arguments and avoid the trampoline.

Not sure I understand care to elaborate?

If you define a function with 2 arguments, but you invoke it with only one, V8 will place a trampoline between the two to fill in the missing arguments. For maximum performance, we should always call this.push(chunk, enc), not just this.push(chunk). I do not know how much this impact performance.

Agreed but why having it as an instance option instead of a per push option?

Either your code is "safe", or it's not. And also I prefer if we do not add a third parameter to push, for the reasons above.

@addaleax
Copy link
Member

Hey! Just for context, one of my motivations for #18537 (string_decoder in C++) was that we could move the .setEncoding() functionality for StreamBases into C++, which would get rid of an extra allocation + copy operation in the string case, which gave around a 20 % - 30 % increase in those benchmarks (prototype: addaleax/node@e548b13).

I wouldn’t expect many people to call that for network sockets, but for file streams or HTTP/2 streams this seems a lot more likely, so I’ve been waiting for progress on #19060 (merging handling for StreamBases).

I don’t know whether that conflicts with this patch, but in case it does, I just wanted to mention it. :)

@lpinca
Copy link
Member Author

lpinca commented Mar 25, 2018

If you define a function with 2 arguments, but you invoke it with only one, V8 will place a trampoline between the two to fill in the missing arguments. For maximum performance, we should always call this.push(chunk, enc), not just this.push(chunk). I do not know how much this impact performance.

Yes, but it doesn't matter in this case, the benchmark always run with 3 arguments and the 30% improvement is consistent.

@lpinca lpinca force-pushed the skip/chunk-check branch from 6f7aa29 to 07477b7 Compare March 25, 2018 11:42
@lpinca
Copy link
Member Author

lpinca commented Mar 25, 2018

@mcollina does it better align with your idea now?
New CI: https://ci.nodejs.org/job/node-test-pull-request/13860/

There are cases where there is no need to validate the `chunk` argument
of `Readable.prototype.push()` because it is known beforehand that
`chunk` is valid.

Make `Readable` constructor accept a `skipChunkCheck` option to skip
`chunk` validation.
@lpinca lpinca force-pushed the skip/chunk-check branch from 07477b7 to fe4513b Compare March 25, 2018 15:01
@lpinca
Copy link
Member Author

lpinca commented Mar 25, 2018

Benchmark results on v9.x-staging:

$ cat compare.csv | Rscript benchmark/compare.R 
                                                                 confidence improvement accuracy (*)   (**)  (***)
 streams/readable-skip-chunk-check.js n=1000000 skipChunkCheck=0                 0.08 %       ±0.55% ±0.73% ±0.95%
 streams/readable-skip-chunk-check.js n=1000000 skipChunkCheck=1        ***     32.79 %       ±1.37% ±1.84% ±2.43%

Do not validate data chunks read from the socket handle as they are
guaranteed to be buffers and validation is costly.
@lpinca lpinca force-pushed the skip/chunk-check branch from fe4513b to 359b1ae Compare March 25, 2018 17:23
@mcollina
Copy link
Member

Yes, but it doesn't matter in this case, the benchmark always run with 3 arguments and the 30% improvement is consistent.

I mean, in net it's currently called with 1 and the trampoline will be used.

I'm not really concerned about the benchmarks within stream, but rather something involving net or http. The benefits should be prominent everywhere.

@lpinca
Copy link
Member Author

lpinca commented Mar 26, 2018

@mcollina, I've just rerun net benchmarks on v9.x-staging and there aren't noticeable differences, just a tiny (~5/6 % with one star) gain on some tests, but I think they are not representative.

Please take a look at this: https://foo-pcsvujxlnp.now.sh/

@mcollina
Copy link
Member

Can you run http simple as well? I can't see a definite benefit on all my checks.

@lpinca
Copy link
Member Author

lpinca commented Mar 26, 2018

I think there is no point, if net benchmarks show no gain I really doubt http (a layer on top) will be different.
That said I still think there is value in skipping chunk validation when possible because a good amount of CPU cycles is wasted there as show in the above graph and the added "readable" benchmark.
The only drawback added by this PR is the maintenance burden added by the skipChunkCheck option.

Feel free to close if you think there is no value.

@mcollina
Copy link
Member

What it is puzzling me is why there is no impact of this in real life scenarios. Truly a 30% improvement in streams should be noticeable.
Maybe our current net and http benchmarks do not cover this hot path?
How did you spot this in the first place?

@lpinca
Copy link
Member Author

lpinca commented Mar 27, 2018

My guess is that the gain is masked by other, heavier things.

Maybe our current net and http benchmarks do not cover this hot path?

Yes I think so but I've also tried to write an ad hoc test case where a very fast TCP client writes many small chunks as fast as possible with no luck. The chunks are coalesced into a single big TCP packet making the test useless.

How did you spot this in the first place?

By running and profiling an echo server benchmark in ws using uws as client (to make sure the client is faster than the server) and sending a lot (100k) of small messages.

@mcollina
Copy link
Member

@lpinca in that case, what was the performance improvement of this change?

@lpinca
Copy link
Member Author

lpinca commented Mar 27, 2018

@mcollina still insignificant but chunkInvalid() appeared in the graph, so I said why paying for something that is not needed?

I think the reason why it doesn't make a difference is that only a chunk is pushed per tick? It might make sense to test with tens of thousands connected sockets, but I'm not passionate enough to do that, sorry.

@lpinca
Copy link
Member Author

lpinca commented Mar 27, 2018

Closing as per #19559 (comment). It's easier to upgrade Node.js than adding an option that is only useful on Node.js < 10 and that we will have to support forever.

@lpinca lpinca closed this Mar 27, 2018
@lpinca lpinca deleted the skip/chunk-check branch March 27, 2018 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Issues and PRs related to the net subsystem. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants