querystring: improve parse() and escape() performance #5012

mscdex · 2016-01-31T20:08:01Z

parse() performance is improved by ~20-200% with the various querystring-parse benchmarks.

Some optimization strategies used include:

* Combining multiple searches (for '&', '=', and '+') on the same string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the default string decoder is being used
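A minimal single-pass sketch of the strategies listed above, assuming plain `&`/`=` separators and no maxKeys limit. parseSketch, decodePart and addValue are hypothetical names, not the actual lib/querystring.js code. The sketch calls decodeURIComponent() only when the scan saw a '%', and it keeps malformed escapes as raw text rather than reproducing how the real module handles them.

```javascript
'use strict';

// Raw character codes, used instead of calling '&'.charCodeAt(0) and friends.
const AMP = 0x26;     // '&'
const EQ = 0x3D;      // '='
const PLUS = 0x2B;    // '+'
const PERCENT = 0x25; // '%'

// Turns qs.slice(start, end) into a key or value. '+' replacement and
// percent-decoding only happen when the scan actually saw those characters.
function decodePart(qs, start, end, sawPercent, sawPlus) {
  let part = qs.slice(start, end);
  if (sawPlus) part = part.replace(/\+/g, ' ');
  if (sawPercent) {
    try {
      part = decodeURIComponent(part);
    } catch (err) {
      // Malformed escape: this sketch keeps the raw text.
    }
  }
  return part;
}

// Stores a value, turning repeated keys into arrays like querystring.parse().
function addValue(obj, key, value) {
  const existing = obj[key];
  if (existing === undefined) obj[key] = value;
  else if (Array.isArray(existing)) existing[existing.length] = value;
  else obj[key] = [existing, value];
}

// One pass over the input finds '&', '=', '+' and '%' together; no split().
function parseSketch(qs) {
  const obj = Object.create(null);
  let start = 0;        // start of the current key=value pair
  let eqIdx = -1;       // first '=' in the current pair, -1 if none yet
  let percent = false;  // current segment (key or value) contains '%'
  let plus = false;     // current segment contains '+'
  let keyPercent = false;
  let keyPlus = false;

  // Run one step past the end; that step is treated as a virtual '&'
  // so the last pair is flushed by the same code path.
  for (let i = 0; i <= qs.length; ++i) {
    const c = i < qs.length ? qs.charCodeAt(i) : AMP;
    if (c === AMP) {
      if (i > start) { // skip empty pairs, e.g. from '&&'
        if (eqIdx === -1) {
          addValue(obj, decodePart(qs, start, i, percent, plus), '');
        } else {
          addValue(obj,
                   decodePart(qs, start, eqIdx, keyPercent, keyPlus),
                   decodePart(qs, eqIdx + 1, i, percent, plus));
        }
      }
      start = i + 1;
      eqIdx = -1;
      percent = plus = false;
    } else if (c === EQ && eqIdx === -1) {
      eqIdx = i;
      keyPercent = percent;
      keyPlus = plus;
      percent = plus = false;
    } else if (c === PERCENT) {
      percent = true;
    } else if (c === PLUS) {
      plus = true;
    }
  }
  return obj;
}
```

The '+' replacement runs before percent-decoding so that an encoded plus ('%2B') survives as a literal '+'.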

escape() performance is improved by up to ~15% with the various querystring-stringify benchmarks, by reducing the number of string concatenations and by avoiding a potential deopt if the input string ends on an incomplete multibyte character.
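A sketch of the "fewer concatenations" idea: runs of characters that need no escaping are appended with one slice() each instead of one character at a time, and the low half of a surrogate pair is read only after a bounds check. escapeSketch, hexTable and isUnreserved are hypothetical names; the sketch assumes the same unreserved set as encodeURIComponent() and is not the code from this PR.

```javascript
'use strict';

// '%00' ... '%FF', built once.
const hexTable = [];
for (let i = 0; i < 256; ++i)
  hexTable.push('%' + (i < 16 ? '0' : '') + i.toString(16).toUpperCase());

// ASCII characters left as-is (the set encodeURIComponent() leaves alone):
// A-Z a-z 0-9 - _ . ! ~ * ' ( )
function isUnreserved(c) {
  return (c >= 0x61 && c <= 0x7A) || (c >= 0x41 && c <= 0x5A) ||
         (c >= 0x30 && c <= 0x39) ||
         c === 0x2D || c === 0x5F || c === 0x2E || c === 0x21 ||
         c === 0x7E || c === 0x2A || c === 0x27 || c === 0x28 || c === 0x29;
}

function escapeSketch(str) {
  let out = '';
  let lastPos = 0; // start of the pending run that needs no escaping
  for (let i = 0; i < str.length; ++i) {
    const c = str.charCodeAt(i);
    if (c < 0x80 && isUnreserved(c)) continue; // extend the current run

    // Flush the pending run with a single slice + concatenation.
    if (lastPos < i) out += str.slice(lastPos, i);

    if (c < 0x80) {
      out += hexTable[c];
    } else if (c < 0x800) {
      out += hexTable[0xC0 | (c >> 6)] + hexTable[0x80 | (c & 0x3F)];
    } else if (c < 0xD800 || c >= 0xE000) {
      out += hexTable[0xE0 | (c >> 12)] +
             hexTable[0x80 | ((c >> 6) & 0x3F)] +
             hexTable[0x80 | (c & 0x3F)];
    } else {
      // Surrogate pair: check the index before reading the next code unit,
      // so a string ending on a lone high surrogate never reads past the end.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : -1;
      if (c >= 0xDC00 || next < 0xDC00 || next > 0xDFFF)
        throw new URIError('URI malformed');
      ++i;
      const cp = 0x10000 + (((c & 0x3FF) << 10) | (next & 0x3FF));
      out += hexTable[0xF0 | (cp >> 18)] +
             hexTable[0x80 | ((cp >> 12) & 0x3F)] +
             hexTable[0x80 | ((cp >> 6) & 0x3F)] +
             hexTable[0x80 | (cp & 0x3F)];
    }
    lastPos = i + 1;
  }
  if (lastPos === 0) return str; // nothing needed escaping
  if (lastPos < str.length) out += str.slice(lastPos);
  return out;
}
```

Like encodeURIComponent(), the sketch throws a URIError on an unpaired surrogate.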

Also, a constant deopt in unescapeBuffer() is avoided by checking that the index passed to charCodeAt() is not out of bounds.
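The guarded read described above, isolated as a hypothetical helper (charCodeAtChecked is an illustrative name; the PR inlines the ternary). Because charCodeAt() itself returns NaN for indices past the end, returning NaN keeps the observable behavior identical while the loop's extra final iteration never calls charCodeAt() out of range.

```javascript
'use strict';

// Same result as s.charCodeAt(i), but the upper bound is checked first so
// an index at or past s.length never reaches charCodeAt().
function charCodeAtChecked(s, i) {
  return i < s.length ? s.charCodeAt(i) : NaN;
}
```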

jbergstroem · 2016-01-31T20:34:39Z

CI: https://ci.nodejs.org/job/node-test-commit/2009/

infusion · 2016-01-31T22:28:06Z

Why did you remove the str.length cache in escape()?

mscdex · 2016-01-31T22:34:29Z

@infusion It's not necessary with modern versions of v8.

evanlucas · 2016-02-01T11:52:59Z

lib/querystring.js

@@ -18,7 +18,7 @@ QueryString.unescapeBuffer = function(s, decodeSpaces) {
  var n, m, hexchar;

  for (var inIndex = 0, outIndex = 0; inIndex <= s.length; inIndex++) {
-    var c = s.charCodeAt(inIndex);
+    var c = inIndex < s.length ? s.charCodeAt(inIndex) : NaN;


Just curious, what is the benefit of using NaN here?

charCodeAt() returns NaN for out of bounds indices. I was just keeping the same behavior here but avoiding the deopt.

ahh that makes sense. Thanks

Would there be drawbacks to changing the <= to < in the loop test? You'd have to replicate some of the out[outIndex++] assignments after the loop but it would keep the loop body simple. I guess you can also accomplish that with a s/NaN/0/.

I left it as-is to keep changes minimal. I typically don't like to duplicate code when reusing the loop logic like that is easy/simple enough.
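The two loop shapes discussed here, applied to a hypothetical '&' splitter (splitAmpSentinel and splitAmpTail are illustrative names, not code from the PR). The first runs one iteration past the end and uses the guarded read's NaN as an end-of-input marker, so a single code path flushes every segment; the second stops at s.length and repeats the flush after the loop.

```javascript
'use strict';

// Shape kept in the PR: i <= s.length, guarded read, one flush path.
function splitAmpSentinel(s) {
  const parts = [];
  let start = 0;
  for (let i = 0; i <= s.length; ++i) {
    const c = i < s.length ? s.charCodeAt(i) : NaN;
    if (c === 0x26 /* '&' */ || Number.isNaN(c) /* end of input */) {
      parts.push(s.slice(start, i));
      start = i + 1;
    }
  }
  return parts;
}

// Alternative raised in review: i < s.length, flush duplicated after the loop.
function splitAmpTail(s) {
  const parts = [];
  let start = 0;
  for (let i = 0; i < s.length; ++i) {
    if (s.charCodeAt(i) === 0x26 /* '&' */) {
      parts.push(s.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(s.slice(start)); // repeated flush for the last segment
  return parts;
}
```

Both produce the same segments as String#split('&'); the thread chose the first form to avoid duplicating the output assignments and does not benchmark the two against each other.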

mscdex · 2016-02-01T17:27:55Z

@bnoordhuis I've fixed the missing post-OptimizeFunctionOnNextCall function calls. Performance is still the same FWIW.

jasnell · 2016-02-01T17:41:57Z

LGTM

bnoordhuis · 2016-02-01T19:02:43Z

lib/querystring.js

@@ -18,7 +18,7 @@ QueryString.unescapeBuffer = function(s, decodeSpaces) {
  var n, m, hexchar;

  for (var inIndex = 0, outIndex = 0; inIndex <= s.length; inIndex++) {
-    var c = s.charCodeAt(inIndex);
+    var c = inIndex < s.length ? s.charCodeAt(inIndex) : NaN;
    switch (state) {
      case 'CHAR':
        switch (c) {


I suspect you can eke out some more performance if you replace the calls to charCodeAt() with their number literal equivalents.
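A sketch of that suggestion combined with integer parser states, using a small percent-decoder in the style of unescapeBuffer(). The state names, hexValue and unescapeBytes are hypothetical; the sketch assumes ASCII input (it returns code units as byte values) and copies malformed escapes through literally, which may differ from the real module.

```javascript
'use strict';

// Parser states as small integers rather than strings such as 'CHAR'.
const CHAR = 0; // copying ordinary characters
const HEX0 = 1; // saw '%', expecting the first hex digit
const HEX1 = 2; // saw '%' and one hex digit, expecting the second

// Hex digit value from a raw char code, or -1. Literals such as 0x30 ('0')
// replace calls like '0'.charCodeAt(0).
function hexValue(c) {
  if (c >= 0x30 && c <= 0x39) return c - 0x30;      // '0'-'9'
  if (c >= 0x41 && c <= 0x46) return c - 0x41 + 10; // 'A'-'F'
  if (c >= 0x61 && c <= 0x66) return c - 0x61 + 10; // 'a'-'f'
  return -1;
}

// Decodes %XX escapes (and '+' as space) of an ASCII string into byte values.
function unescapeBytes(s) {
  const out = [];
  let state = CHAR;
  let hi = 0;
  for (let i = 0; i < s.length; ++i) {
    const c = s.charCodeAt(i);
    switch (state) {
      case CHAR:
        if (c === 0x25 /* '%' */) state = HEX0;
        else out.push(c === 0x2B /* '+' */ ? 0x20 : c);
        break;
      case HEX0:
        hi = hexValue(c);
        if (hi === -1) { out.push(0x25, c); state = CHAR; }
        else state = HEX1;
        break;
      case HEX1: {
        const lo = hexValue(c);
        if (lo === -1) out.push(0x25, s.charCodeAt(i - 1), c);
        else out.push(hi * 16 + lo);
        state = CHAR;
        break;
      }
    }
  }
  // Flush a trailing incomplete escape such as "%" or "%4".
  if (state === HEX0) out.push(0x25);
  else if (state === HEX1) out.push(0x25, s.charCodeAt(s.length - 1));
  return out;
}
```

Comparing against integer constants lets each `case` test a number instead of a string.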

dolphin278 · 2016-02-05T10:03:11Z

lib/querystring.js

+    // slower
+    if (keys.indexOf(key) === -1) {
+      obj[key] = value;
+      keys.push(key);


@mscdex General question – isn't keys[keys.length] = key still better than keys.push(key)?

A few months ago I inspected compiled code using IRHydra, and I remember a difference between pushing items and assigning by array length in a loop, in favor of the second one.

Does it matter anymore?

Testing in Chrome with V8 4.7 on jsperf shows that using arr[arr.length] = x is indeed much faster, but the performance gain in the node benchmark was not as large (including a new benchmark input I just added that has even more duplicate keys). However, I've changed it anyway for the small performance increase it does provide.
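A sketch of the duplicate-key bookkeeping with the append written as keys[keys.length] = key. The surrounding function (collectPairs) and its array-for-duplicates behavior are assumptions, modeled on how querystring.parse() groups repeated keys; as noted above, any speed advantage over push() depends on the V8 version.

```javascript
'use strict';

// Hypothetical helper: groups [key, value] pairs, recording first-seen key order.
function collectPairs(pairs) {
  const obj = Object.create(null);
  const keys = [];
  for (const [key, value] of pairs) {
    // Linear indexOf() scan, as in the reviewed snippet.
    if (keys.indexOf(key) === -1) {
      obj[key] = value;
      keys[keys.length] = key; // same effect as keys.push(key)
    } else if (Array.isArray(obj[key])) {
      const arr = obj[key];
      arr[arr.length] = value;
    } else {
      obj[key] = [obj[key], value];
    }
  }
  return { obj, keys };
}
```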

mscdex · 2016-02-11T15:58:26Z

Can I get some more LGTMs on this one?

/cc @nodejs/collaborators

jasnell · 2016-02-11T16:00:50Z

Still LGTM

silverwind · 2016-02-11T16:07:30Z

LGTM pending CI.

CI: https://ci.nodejs.org/job/node-test-pull-request/1638/

mcollina · 2016-02-11T20:38:17Z

LGTM

This commit improves parse() performance by ~20-200% with the various querystring-parse benchmarks. Some optimization strategies used in this commit include:

* Combining multiple searches (for '&', '=', and '+') on the same string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the default string decoder is being used

Before this, V8 would deopt when an out-of-bounds `inIndex` was passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we directly emulate that behavior as well. Also, calls to charCodeAt() on constant strings have been replaced by the raw character codes, and parser state is now stored as an integer instead of a string. Both of these provide a slight performance increase.

This commit improves escape() performance by up to 15% with the existing querystring-stringify benchmarks by reducing the number of string concatenations. A potential deopt is also avoided by making sure the index passed to charCodeAt() is within bounds.

mscdex · 2016-02-13T01:29:23Z

CI again since the last one had CI infrastructure issues: https://ci.nodejs.org/job/node-test-commit/2216/

This commit improves parse() performance by ~20-200% with the various querystring-parse benchmarks. Some optimization strategies used in this commit include:

* Combining multiple searches (for '&', '=', and '+') on the same string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the default string decoder is being used

PR-URL: #5012
Reviewed-By: James M Snell
Reviewed-By: Roman Reiss
Reviewed-By: Matteo Collina

Before this, V8 would deopt when an out-of-bounds `inIndex` was passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we directly emulate that behavior as well. Also, calls to charCodeAt() on constant strings have been replaced by the raw character codes, and parser state is now stored as an integer instead of a string. Both of these provide a slight performance increase.

PR-URL: #5012
Reviewed-By: James M Snell
Reviewed-By: Roman Reiss
Reviewed-By: Matteo Collina

This commit improves escape() performance by up to 15% with the existing querystring-stringify benchmarks by reducing the number of string concatenations. A potential deopt is also avoided by making sure the index passed to charCodeAt() is within bounds.

PR-URL: #5012
Reviewed-By: James M Snell
Reviewed-By: Roman Reiss
Reviewed-By: Matteo Collina

mscdex · 2016-02-13T01:30:43Z

Landed in 00638ac, c8e650d, and a2a69a2.


* buffer:
  - You can now supply an encoding argument when filling a Buffer: Buffer#fill(string[, start[, end]][, encoding]). Supplying an existing Buffer will also work with Buffer#fill(buffer[, start[, end]]). See the API documentation for details on how this works. (Trevor Norris) #4935
  - Buffer#indexOf() no longer requires a byteOffset argument if you also wish to specify an encoding: Buffer#indexOf(val[, byteOffset][, encoding]). (Trevor Norris) #4803
* child_process: spawn() and spawnSync() now support a 'shell' option to allow for optional execution of the given command inside a shell. If set to true, cmd.exe will be used on Windows and /bin/sh elsewhere. A path to a custom shell can also be passed to override these defaults. On Windows, this option allows .bat and .cmd files to be executed with spawn() and spawnSync(). (Colin Ihrig) #4598
* http_parser: Update to http-parser 2.6.2 to fix an unintentionally strict limitation of allowable header characters. (James M Snell) #5237
* dgram: socket.send() now accepts an array of Buffers or Strings as the first argument. See the API docs for details on how this works. (Matteo Collina) #4374
* http: Fix a bug where handling headers would mistakenly trigger an 'upgrade' event when the server is just advertising its protocols. This bug can prevent HTTP clients from communicating with HTTP/2 enabled servers. (Fedor Indutny) #4337
* net: Added a listening Boolean property to net and http servers to indicate whether the server is listening for connections. (José Moreira) #4743
* node: The C++ node::MakeCallback() API is now reentrant, and calling it from inside another MakeCallback() call no longer causes the nextTick queue or Promises microtask queue to be processed out of order. (Trevor Norris) #4507
* tls: Add a new tlsSocket.getProtocol() method to get the negotiated TLS protocol version of the current connection. (Brian White) #4995
* vm: Introduce new 'produceCachedData' and 'cachedData' options to new vm.Script() to interact with V8's code cache. When a new vm.Script object is created with 'produceCachedData' set to true, a Buffer with V8's code cache data will be produced and stored in the cachedData property of the returned object. This data in turn may be supplied back to another vm.Script() object with a 'cachedData' option if the supplied source is the same. Successfully executing a script from cached data can speed up instantiation time. See the API docs for details. (Fedor Indutny) #4777
* performance: Improvements in:
  - process.nextTick() (Ruben Bridgewater) #5092
  - path module (Brian White) #5123
  - querystring module (Brian White) #5012
  - streams module when processing small chunks (Matteo Collina) #4354


MylesBorins · 2016-03-10T21:28:17Z

Adding the LTS watch flag, but I think this will have to sit for a while before we know that this is stable.

Thoughts?

jasnell · 2016-03-11T00:45:19Z

Yeah, like the path changes, we'll want to let this sit for a good long time before backporting.

rvagg · 2016-03-14T01:47:55Z

-1 for LTS is my vote (not being absolute, if you two want to disagree with me). I'm leaning that way for perf changes, and more strongly for larger changes (not just in LOC but in impact). The performance profile of LTS should be relatively stable over time, and we owe it to users not to screw with things too much. While we can pick up edge cases with Stable releases, there's a whole different sector of users on LTS who may experience totally different edge cases than users of Stable might.

MylesBorins · 2016-03-14T01:56:58Z

@rvagg I don't disagree.
What I do think could be interesting, though, is keeping track of larger changes like this and all the future regressions, so that it will be easy to backport the entire lot if we need to (e.g. if too many changes make the overall backporting process a nightmare).

MylesBorins · 2016-05-17T22:33:39Z

Marking this as don't-land for now.

mscdex added the querystring Issues and PRs related to the built-in querystring module. label Jan 31, 2016

evanlucas reviewed Feb 1, 2016

mscdex force-pushed the perf-querystring branch from e7bb9ef to ed61c27 February 1, 2016 17:26

bnoordhuis reviewed Feb 1, 2016

mscdex force-pushed the perf-querystring branch from ed61c27 to ceffae4 February 4, 2016 02:11

MylesBorins mentioned this pull request Feb 5, 2016

querystring: check that maxKeys is finite #5066

Merged

dolphin278 reviewed Feb 5, 2016

mscdex force-pushed the perf-querystring branch from ceffae4 to 6060e98 February 6, 2016 18:48

mscdex added 3 commits February 12, 2016 19:56

mscdex force-pushed the perf-querystring branch from 6060e98 to 3df8e85 February 13, 2016 00:57

mscdex closed this Feb 13, 2016

mscdex deleted the perf-querystring branch February 13, 2016 01:33

rvagg mentioned this pull request Feb 18, 2016

Release proposal: 5.7.0 (Stable) #5295

Merged

MylesBorins mentioned this pull request Mar 10, 2016

Audit commits not found on v4.x #5647

Closed

MylesBorins added the lts-watch-v4.x label Mar 10, 2016

MylesBorins added dont-land-on-v4.x and removed lts-watch-v4.x labels May 17, 2016

This was referenced Apr 24, 2023

[Snyk] Fix for 44 vulnerabilities aliscco/alisco-node#115

Open

[Snyk] Fix for 8 vulnerabilities aliscco/alisco-node#325

Open
