Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request body sent in separate SSL segment (bug at 9.6.0) #27573

Closed
mildsunrise opened this issue May 5, 2019 · 11 comments
Closed

request body sent in separate SSL segment (bug at 9.6.0) #27573

mildsunrise opened this issue May 5, 2019 · 11 comments
Labels
http Issues or PRs related to the http subsystem. tls Issues and PRs related to the tls subsystem.

Comments

@mildsunrise
Copy link
Member

mildsunrise commented May 5, 2019

  • Version: bug introduced in 9.6.0 and still present on 12.1.0
  • Platform: Linux 4.4.0-142-generic Release Cycle Proposal #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: http / https (maybe tls)
  • Minor bug.

The code:

let req = https.request("https://example.org")
req.end( Buffer.from("hello\n") );

For Node >= 9.6.0, sends two SSL segments (one with request + headers, the other with the body hello\n). Node <= 9.5.0 sends a single SSL segment with everything as expected.

If instead of passing a Buffer, we pass a string:

req.end( "hello\n" );

This works correctly in all Node.JS versions.

@sam-github
Copy link
Contributor

I can confirm this happens as described, with TLS1.2 or 1.3, on master/12.x, but it's not clear to me that there is any negative effect to it. I wiresharked the exchange, and it looks like both TLS records are sent in a single TCP segment, so that isn't a problem.

Since it is odd, I'll try to see why it does this when I have time (the first few layers of functions called by .end() don't behave differently for String vs Buffer).

Why is this a concern to you?

@mildsunrise
Copy link
Member Author

Oh, you're right, I genuinely thought they were separate TCP segments 😅 In that case it probably won't make a difference (unless you're making like, a ton of requests). For debugging it'd still be nice to have full control on the SSL segments, though.

It seems the distinction between string and buffer is being done here:

node/lib/_http_outgoing.js

Lines 217 to 238 in 8c4bd2a

if (!this._headerSent) {
if (typeof data === 'string' &&
(encoding === 'utf8' || encoding === 'latin1' || !encoding)) {
data = this._header + data;
} else {
var header = this._header;
if (this.outputData.length === 0) {
this.outputData = [{
data: header,
encoding: 'latin1',
callback: null
}];
} else {
this.outputData.unshift({
data: header,
encoding: 'latin1',
callback: null
});
}
this.outputSize += header.length;
this._onPendingData(header.length);
}

So, replacing lines 222-237 with:

var header = Buffer.from(this._header, 'latin1');
data = Buffer.concat([header, data]);

Would make the behaviour symmetrical among strings and buffers, I think. Would you accept a PR?

@sam-github
Copy link
Contributor

We do accept PRs, of course, but changes in core can be tricky.

Concating the buffers seems like it would add an unnecessary memory copy/buffer concatenation into a hot path, the current code looks pretty deliberate. I'm more surprised that using a string causes concatenation, than that a data buffer is a different write from the header values (which are probably strings).

@bnoordhuis
Copy link
Member

I think this can be improved, most sockets have a ._writeGeneric() method that can write out a mix of strings and buffers.

OutgoingMessage#_send() calls OutgoingMessage#_writeRaw() which calls this.connection.write() to actually write the data. The "write generic" logic would look something like this:

if (conn._writeGeneric) {
  const chunks = [/* ... */];  // derive from this.outputData
  chunks.allBuffers = false;  // if it's a mix of strings and buffers
  // explicit check needed because it returns undefined on success
  return false !== conn._writeGeneric(true, chunks, '', callback);
}
// ...
return conn.write(data, encoding, callback);

Chunks have this format:

const chunk = {
  chunk: /* buffer or string */,
  encoding: /* string, set to '' for buffer */,
};

You could go even further by calling conn._handle.writev() directly because ._writeGeneric() does a lot of unnecessary work - but it also does some necessary work so you'll have to read the source carefully. :-)


Honestly, I'd love if someone goes over the net, http and streams code with a fine-tooth comb. There are a lot of accrued inefficiencies.

@bnoordhuis bnoordhuis added http Issues or PRs related to the http subsystem. tls Issues and PRs related to the tls subsystem. labels May 7, 2019
@mildsunrise
Copy link
Member Author

Thanks for the info! I'm new to Node internals, but I'll try to work on this

@mildsunrise
Copy link
Member Author

@bnoordhuis Just to make sure, you meant conn._handle._writev()?

@addaleax
Copy link
Member

@jmendeth The method should be named ._handle.writev() (e.g. process.stdout._handle.writev when process.stdout refers to a pipe/TTY).

That being said … Ben’s suggestion only really works if we can actually skip through the streams layer, i.e. there are no other chunks currently being written and no other interaction with the socket as a JS stream takes place.

@mildsunrise
Copy link
Member Author

Thanks! I have investigated further and _writeGeneric() (and thus handle.writev) is already called with both chunks. So if we modify TLSWrap::DoWrite() and TLSWrap::ClearIn() to emit a single SSL segment, that's all that's needed for the code I posted.

Of course, copying into a temporary buffer may decrease performance for large chunks, but it's faster than doing it in JS, and given the incredible overhead this might have for small requests / responses, I think we should at least allow users to enable concatenation.

(Unrelated question: I've seen that SSL_MODE_ENABLE_PARTIAL_WRITE isn't used (and the code assumes all writes are complete) is that intentional?)

@sam-github
Copy link
Contributor

The writes all go into a BIO buffer, so incomplete writes aren't possible, unlike writing to an OS socket descriptor.

@mildsunrise
Copy link
Member Author

I also found out this is more significant in http2. My gf had a homework task where she had to push 49 resources to the browser:

  for (var i = 1; i < 50; i++) {
    var path = "/testStreamShort" + i;
    stream.pushStream({":path": path}, (err, pushStream) => {
      if (err) throw err;
      pushStream.respond({ ':status': 200, 'content-type': 'text/plain' });
      pushStream.end("a");
    });
  }
  stream.respond({ ':status': 200, 'content-type': 'text/plain' });
  stream.end("a");

This emits 249 segments: 49 PUSH_PROMISE (31 bytes), then 50 HEADERS (12 bytes), then 50 DATA (9 bytes), then 50 segments with the actual body (1 byte), then 50 empty DATA (9 bytes) to indicate end of stream. This results in 8562 bytes written to the TCP socket.

With the change I mentioned, everything is emitted in only 2 segments, weighting 3128 bytes. That's a 63% saving. I don't know how to properly benchmark this, but a quick attempt shows a 28% decrease in CPU time.

@mildsunrise
Copy link
Member Author

@sam-github Thanks!

mildsunrise added a commit to mildsunrise/node that referenced this issue May 28, 2019
TLSWrap::DoWrite() now concatenates data chunks and makes a single
call to SSL_write(). Grouping data into a single segment:

- reduces network overhead: by factors of even 2 or 3 in usages
  like `http2` or `form-data`

- improves security: segment lengths can reveal lots of info, i.e.
  with `form-data`, how many fields are sent and the approximate length
  of every individual field and its headers

- reduces encryption overhead: a quick benchmark showed a ~30% CPU time
  decrease for an extreme case, see
  nodejs#27573 (comment)

Fixes: nodejs#27573
targos pushed a commit that referenced this issue May 31, 2019
TLSWrap::DoWrite() now concatenates data chunks and makes a single
call to SSL_write(). Grouping data into a single segment:

- reduces network overhead: by factors of even 2 or 3 in usages
  like `http2` or `form-data`

- improves security: segment lengths can reveal lots of info, i.e.
  with `form-data`, how many fields are sent and the approximate length
  of every individual field and its headers

- reduces encryption overhead: a quick benchmark showed a ~30% CPU time
  decrease for an extreme case, see
  #27573 (comment)

Fixes: #27573

PR-URL: #27861
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ujjwal Sharma <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
mildsunrise added a commit to mildsunrise/node that referenced this issue Jul 30, 2019
TLSWrap::DoWrite() now concatenates data chunks and makes a single
call to SSL_write(). Grouping data into a single segment:

- reduces network overhead: by factors of even 2 or 3 in usages
  like `http2` or `form-data`

- improves security: segment lengths can reveal lots of info, i.e.
  with `form-data`, how many fields are sent and the approximate length
  of every individual field and its headers

- reduces encryption overhead: a quick benchmark showed a ~30% CPU time
  decrease for an extreme case, see
  nodejs#27573 (comment)

Fixes: nodejs#27573

PR-URL: nodejs#27861
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ujjwal Sharma <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
BethGriggs pushed a commit that referenced this issue Oct 7, 2019
TLSWrap::DoWrite() now concatenates data chunks and makes a single
call to SSL_write(). Grouping data into a single segment:

- reduces network overhead: by factors of even 2 or 3 in usages
  like `http2` or `form-data`

- improves security: segment lengths can reveal lots of info, i.e.
  with `form-data`, how many fields are sent and the approximate length
  of every individual field and its headers

- reduces encryption overhead: a quick benchmark showed a ~30% CPU time
  decrease for an extreme case, see
  #27573 (comment)

Fixes: #27573

Backport-PR-URL: #28904
PR-URL: #27861
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ujjwal Sharma <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants