buffer: optimize Buffer#toString() #2027

bnoordhuis · 2015-06-21T21:52:05Z

Break up Buffer#toString() into a fast and slow path. The fast path
optimizes for zero-length buffers and no-arg method invocation.

The speedup for zero-length buffers is a satisfying 700%. The no-arg
toString() operation gets faster by about 13% for a one-byte buffer.

This change exploits the fact that most Buffer#toString() calls are
plain no-arg method calls. Rewriting the method to take no arguments
means a call doesn't go through an ArgumentsAdaptorTrampoline stack
frame in the common case.
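
A minimal sketch of the fast/slow split described above, not the patch itself. `fastToString` and this version of `slowToString` are hypothetical stand-ins written against Node's public Buffer API; the actual patch calls the internal `utf8Slice` on the fast path.

```javascript
'use strict';

// Hypothetical slow path: handles an explicit encoding and range by
// deferring to the public Buffer API.
function slowToString(encoding, start, end) {
  return Buffer.prototype.toString.call(this, encoding, start, end);
}

// Declared with no formal parameters, so a plain no-arg call passes
// exactly as many arguments as the function declares.
function fastToString() {
  const length = this.length | 0;
  if (length === 0)
    return '';
  if (arguments.length === 0)
    return this.toString('utf8', 0, length);  // the patch uses utf8Slice here
  return slowToString.apply(this, arguments);
}

const buf = Buffer.from('hello');
console.log(fastToString.call(buf));              // no-arg fast path
console.log(fastToString.call(buf, 'hex'));       // falls through to the slow path
console.log(fastToString.call(Buffer.alloc(0)));  // zero-length fast path
```

Only the shape matters here: the common cases return before any argument handling, and everything else is forwarded untouched.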

R=@trevnorris?

CI: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/64/

bnoordhuis · 2015-06-21T21:53:23Z

lib/buffer.js

+    return '';
+  if (arguments.length === 0)
+    return this.utf8Slice(0, length);
+  return slowToString(this, arguments[0], arguments[1], arguments[2]);


This seems to be a tiny bit faster than slowToString.apply(this, arguments) but that may change when we upgrade to V8 4.4.

Okay, seems 4.2 does a pretty good job too, once it warms up. I'll switch this over to .apply().

mscdex · 2015-06-21T22:16:21Z

lib/buffer.js

+    return '';
+  if (arguments.length === 0)
+    return this.utf8Slice(0, length);
+  return slowToString.apply(this, arguments);


Wouldn't it be faster to do return slowToString.call(this, arguments[0], arguments[1], arguments[2]);? Or maybe pass this as the first argument and avoid .call()/.apply() altogether?

See this comment. The initial version called slowToString(this, arguments[0], ...) but when I ran more benchmarks, it turned out that .apply() is faster by about 25-30% once the optimizing compiler kicks in.

Won't function(encoding, start, end) { and return slowToString.apply(this, [encoding, start, end]); work here? There seems to be no reason to use arguments. Could you test that, please?

Why create a new array every time when there is already arguments?

Ok, true, an array is slow.

In my local microbenchmark function(encoding, start, end) { and return .call(this, encoding, start, end) wins for any number of arguments (except three, where .apply(this, arguments) is as fast).

The problem with return slowToString.call(this, arguments[0], arguments[1], arguments[2]); is in arguments, not in .call().

Hm. Can't get my test to perform accurately. Seems the true performance hit is using undefined arguments[N] values. Welp, seems we have some cleaning up to do in places like: https://github.com/nodejs/io.js/blob/v2.3.0/src/node.js#L339-L350

@trevnorris What do you mean cleaning up that particular section of code? That code is switching the arguments length and only passing that many arguments.

FWIW I already benchmarked various alternative function calling methods for this patch on top of the next branch (v8 4.3):

Replacing apply() with .call(this, arguments[0], arguments[1], ...) slows down the cases when there are arguments passed, and there is a slight performance hit in the zero argument case (with apply I saw ~510% increase, but call showed ~470%).

Replacing apply() with a direct function call, passing in the context as an extra argument performs about the same as using .call().

Replacing apply() with a switch on arguments.length and using either .call() or passing the context in the < 3 cases (using .apply() as default), the zero argument case is a bit lower IIRC (~470% increase), but now the non-zero argument cases are no longer affected.

So just using .apply() instead of a several-line switch is shorter and even a tad faster on the zero argument case. I haven't tested these scenarios on the master branch (v8 4.2) though.
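
The call styles compared above can be laid out side by side. This is only a sketch of their shapes, with a hypothetical `target` in place of `slowToString`; it says nothing about relative speed, which the thread reports as varying with the V8 version.

```javascript
'use strict';

// Hypothetical callee standing in for slowToString: reports what it received.
function target(a, b, c) {
  return [this.tag, arguments.length, a, b, c];
}

// Variant 1: forward the arguments object as-is.
function viaApply() {
  return target.apply(this, arguments);
}

// Variant 2: always pass three slots, reading past the end of `arguments`.
function viaCall() {
  return target.call(this, arguments[0], arguments[1], arguments[2]);
}

// Variant 3: switch on arguments.length and pass only that many.
function viaSwitch() {
  switch (arguments.length) {
    case 0: return target.call(this);
    case 1: return target.call(this, arguments[0]);
    case 2: return target.call(this, arguments[0], arguments[1]);
    default: return target.apply(this, arguments);
  }
}

const ctx = { tag: 'ctx' };
console.log(viaApply.call(ctx, 'hex'));
console.log(viaCall.call(ctx, 'hex'));
console.log(viaSwitch.call(ctx, 'hex'));
```

Variants 1 and 3 hand the callee the caller's real argument count, while variant 2 always hands it three.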

@mscdex Is args there an arguments object or an Array? I'm aware that referencing undefined values on an arguments object does have significant overhead, but my benchmarks show that that is not the case for a real array.

@trevnorris I did not test with an array, just arguments.

@mscdex neither had I before this PR. Some testing showed that referencing undefined members in an array doesn't have any performance impact. Only side effect is the argument length being too long on the called function.
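
The side effect mentioned here, the argument length being too long on the called function, can be seen in a small sketch. `countArgs`, `viaArray` and `viaArguments` are hypothetical names used only for illustration.

```javascript
'use strict';

// Hypothetical callee that only reports how many arguments it was handed.
function countArgs() {
  return arguments.length;
}

// Named parameters packed into a fresh array: the ones the caller omitted
// are undefined, yet each still occupies a slot.
function viaArray(encoding, start, end) {
  return countArgs.apply(this, [encoding, start, end]);
}

// Forwarding `arguments` keeps the caller's real count.
function viaArguments() {
  return countArgs.apply(this, arguments);
}

console.log(viaArray('hex'));      // 3
console.log(viaArguments('hex'));  // 1
```

A callee that branches on `arguments.length`, as the fast path in this patch does, would therefore behave differently under the two forwarding styles.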

mscdex · 2015-06-21T23:08:10Z

LGTM with one style nit

dcousens · 2015-06-22T03:52:51Z

This change exploits the fact that most Buffer#toString() calls are
plain no-arg method calls. Rewriting the method to take no arguments
means a call doesn't go through an ArgumentsAdaptorTrampoline stack
frame in the common case.

Do we have data to back that up? Most of my calls to toString() are typically hex, but maybe that is offset by stream encoding?
Why couldn't the compiler do something to know which path is being [continually] re-used? This shouldn't be an issue here...

Also, why did you split up the function, is this purely something for the compiler to avoid unpacking the arguments until they are used?
If so, this optimization is in the wrong place.

bnoordhuis · 2015-06-22T09:19:52Z

Do we have data to back that up? Most of my calls to toString() are typically hex, but maybe that is offset by stream encoding?

I'm basing it off the number of implicit toString() calls you get in scripts that do string += buf. If there is compelling evidence that e.g. .toString('hex') calls are more prevalent, then it makes sense to optimize for that. That was actually my initial hunch but a (quick, small, non-scientific) sampling of modules didn't bear that out.
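
As an illustration of the implicit call being referred to, the sketch below shadows `toString` on a single Buffer instance to record how string concatenation invokes it. The variable names are made up for the example.

```javascript
'use strict';

const buf = Buffer.from('abc');
let seenArgCount = -1;

// Shadow toString on this one instance to record how it gets called.
buf.toString = function() {
  seenArgCount = arguments.length;
  return Buffer.prototype.toString.call(this);
};

let str = 'x';
str += buf;  // concatenation converts buf to a string via toString()

console.log(str, seenArgCount);
```

The conversion reaches `toString()` with no arguments, which is the case the fast path targets.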

Why couldn't the compiler do something to know which path is being [continually] re-used?

V8 is a method JIT, not a tracing JIT. It optimizes whole methods, not individual code paths.

Also, why did you split up the function, is this purely something for the compiler to avoid unpacking the arguments until they are used?

Two reasons: small methods are more likely to get inlined at the call site and generally result in tighter machine code.

trevnorris · 2015-06-22T17:14:57Z

lib/buffer.js

@@ -379,6 +378,16 @@ Buffer.prototype.toString = function(encoding, start, end) {
 };


+Buffer.prototype.toString = function() {
+  const length = this.length | 0;


Is the int32 conversion on the length a safeguard in case the length has been altered or this isn't actually a Buffer instance?

Neither, really. I just like my variables to have the type I expect them to have. I can remove it if you want.

Just wanted to make sure there wasn't something I didn't see. Don't bother taking it out.
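
For readers less familiar with the idiom, `x | 0` applies JavaScript's ToInt32 conversion, so the result is always a 32-bit integer whatever `x` was. A short sketch, with a hypothetical helper name:

```javascript
'use strict';

// Hypothetical helper wrapping the `| 0` idiom used on this.length.
function toInt32(value) {
  return value | 0;
}

// A few representative inputs, including non-numbers.
const samples = [5, 3.7, '12', undefined, NaN, 2 ** 32 + 1];
for (const value of samples)
  console.log(String(value), '->', toInt32(value));
```

Fractions are truncated, `undefined` and `NaN` become 0, and values beyond 32 bits wrap around.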

trevnorris · 2015-06-22T17:18:59Z

One question, but code LGTM.

dcousens · 2015-06-23T00:31:30Z

I'm basing it off the number of implicit toString() calls you get in scripts that do string += buf. If there is compelling evidence that e.g. .toString('hex') calls are more prevalent, then it makes sense to optimize for that. That was actually my initial hunch but a (quick, small, non-scientific) sampling of modules didn't bear that out.

This is my only concern going forward, I feel like this 'optimization' is entirely speculative at this point, and it does introduce some more complexity to maintaining this code in the future.

If we can put together some actual statistics that show this will be more beneficial, then I think that'd be great.

However, from a different view point, if we are going to do this code path style optimization, why not offer the same for the other toString methods?

If you really want to do this, why not offer toUTF8, toHex, among others.
I'm not aware of the reasons the all-encompassing toString was opted for 'in the beginning', but I'm sure they exist.

bnoordhuis · 2015-06-23T06:53:35Z

I feel like this 'optimization' is entirely speculative at this point

It's a win no matter how you slice it: it makes the default case faster without regressing the non-default case.

I don't find the complexity argument convincing. You should see some of the other code in the lib/ directory!

tellnes · 2015-06-23T07:24:14Z

I've not run the benchmark, but otherwise LGTM.

dcousens · 2015-06-23T11:55:59Z

without regressing the non-default case.

Do we have stats for that? Or are we just taking each other's word on these things :)

I don't find the complexity argument convincing. You should see some of the other code in the lib/ directory!

Hardly a convincing rebuttal either, "look, it's as bad everywhere else!".

Not meaning to be a PITA, it's just that this potentially touches on a lot of code, and I'm just trying to help :)

trevnorris · 2015-06-23T16:08:23Z

Change in complexity is minimal at best, and there's an included benchmark to verify the result.

And in terms of allowed complexity. We allow some crazy stuff to be done, far beyond what this patch does, for even minimal performance gains. I'm responsible for some of those myself. This patch is doing nothing out of the ordinary.

Fishrock123 · 2015-06-24T19:23:52Z

LGTM

dcousens · 2015-06-25T00:11:44Z

@trevnorris sorry, I missed that benchmark.
Anyway, LGTM, IMHO this optimization is a lame compromise, but, if it works it works.
In a perfect world... haha.

Fishrock123 · 2015-06-25T00:19:44Z

@dcousens such is the world of JIT compiled languages. :)

Break up Buffer#toString() into a fast and slow path. The fast path
optimizes for zero-length buffers and no-arg method invocation.

The speedup for zero-length buffers is a satisfying 700%. The no-arg
toString() operation gets faster by about 13% for a one-byte buffer.

This change exploits the fact that most Buffer#toString() calls are
plain no-arg method calls. Rewriting the method to take no arguments
means a call doesn't go through an ArgumentsAdaptorTrampoline stack
frame in the common case.

PR-URL: nodejs#2027
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Christian Tellnes <christian@tellnes.no>
Reviewed-By: Daniel Cousens <email@dcousens.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>

rvagg · 2015-07-02T03:59:00Z

@bnoordhuis I'm running your new benchmark script against master and v2.3.1 and it seems slower now:

buffers/buffer-tostring.js arg=true len=0 n=10000000      57467728.45114  67528289.82904  85.10%
buffers/buffer-tostring.js arg=true len=1 n=10000000      13827075.37094  14473678.67815  95.53%
buffers/buffer-tostring.js arg=true len=64 n=10000000     10174151.11764  10710577.4752   94.99%
buffers/buffer-tostring.js arg=true len=1024 n=10000000   4867748.07704   5020575.83599   96.96%
buffers/buffer-tostring.js arg=false len=0 n=10000000     70084364.82509  85137567.23931  82.32%
buffers/buffer-tostring.js arg=false len=1 n=10000000     13126817.325    14294504.30314  91.83%
buffers/buffer-tostring.js arg=false len=64 n=10000000    10044569.75487  10748428.7662   93.45%
buffers/buffer-tostring.js arg=false len=1024 n=10000000  4760963.57608   5092279.38181   93.49%

trevnorris · 2015-07-02T04:42:32Z

@rvagg That may actually be a byproduct of a patch I'm responsible for that landed after this one, about preventing Buffer methods from aborting.

rvagg · 2015-07-02T04:47:30Z

@trevnorris 700 steps forward, 800 steps back?

rvagg · 2015-07-02T04:52:10Z

backing up to this commit it looks like you're right @trevnorris, you've ruined great gains!

buffers/buffer-tostring.js arg=true len=0 n=10000000      57467728.45114  67528289.82904  429491405.29717  636.02%
buffers/buffer-tostring.js arg=true len=1 n=10000000      13827075.37094  14473678.67815  14508241.08644   100.24%
buffers/buffer-tostring.js arg=true len=64 n=10000000     10174151.11764  10710577.4752   10397595.66746   97.08%
buffers/buffer-tostring.js arg=true len=1024 n=10000000   4867748.07704   5020575.83599   4967084.6993     98.93%
buffers/buffer-tostring.js arg=false len=0 n=10000000     70084364.82509  85137567.23931  455011800.27603  534.44%
buffers/buffer-tostring.js arg=false len=1 n=10000000     13126817.325    14294504.30314  13154697.29727   92.03%
buffers/buffer-tostring.js arg=false len=64 n=10000000    10044569.75487  10748428.7662   10844602.80418   100.89%
buffers/buffer-tostring.js arg=false len=1024 n=10000000  4760963.57608   5092279.38181   5012971.76375    98.44%

636% and 534% perf improvement at this commit, but going back below 100% @ master
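
The thread does not label the columns. Assuming each percentage is the measurement in question divided by the second numeric column, the quoted figures for the len=0 rows can be reproduced as below; `percentOf` is a hypothetical helper.

```javascript
'use strict';

// Express `value` as a percentage of `baseline`, to two decimals.
function percentOf(value, baseline) {
  return (value / baseline * 100).toFixed(2) + '%';
}

// Figures copied from the len=0 rows above.
console.log(percentOf(57467728.45114, 67528289.82904));   // arg=true, first vs second column
console.log(percentOf(429491405.29717, 67528289.82904));  // arg=true, third vs second column
console.log(percentOf(455011800.27603, 85137567.23931));  // arg=false, third vs second column
```

Under that assumption the three calls give 85.10%, 636.02% and 534.44%, matching the numbers quoted in the thread.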

trevnorris · 2015-07-02T05:06:50Z

Okay, the len=0 case I knew would take a big hit. But realistically how often is that happening? Assumptions aside, I had to drop the quick return in order to properly check the instance on the native side to ensure throwing was consistent despite length.
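
One way to read that trade-off, sketched with hypothetical functions rather than the code from the later patch: a quick return on zero length answers before any instance check runs, so a non-Buffer receiver with `length` 0 never throws.

```javascript
'use strict';

// Hypothetical stand-in for the native-side instance check.
function checkedSlice(self) {
  if (!(self instanceof Uint8Array))
    throw new TypeError('argument should be a Buffer');
  return Buffer.from(self.buffer, self.byteOffset, self.length).toString();
}

// With the quick return, a zero-length non-Buffer never reaches the check.
function withQuickReturn() {
  if ((this.length | 0) === 0)
    return '';
  return checkedSlice(this);
}

// Without it, every receiver is checked, whatever its length.
function withoutQuickReturn() {
  return checkedSlice(this);
}

const impostor = { length: 0 };
console.log(JSON.stringify(withQuickReturn.call(impostor)));  // "" and no error
try {
  withoutQuickReturn.call(impostor);
} catch (err) {
  console.log(err.name);  // TypeError
}
```

Dropping the early return makes the behaviour consistent across lengths, at the cost of the zero-length speedup measured above.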

rvagg · 2015-07-02T05:29:47Z

yeah, I'm not overly concerned since I don't think len=0 is a particularly common case, I just needed to know whether this was a notable item for the changelog and the answer is no

dcousens · 2015-07-02T10:07:01Z

So is this commit still useful?

trevnorris · 2015-07-02T16:22:25Z

Probably. Will need some massaging to regain as much perf as possible.

bnoordhuis added buffer benchmark labels Jun 21, 2015

bnoordhuis reviewed Jun 21, 2015

bnoordhuis force-pushed the optimize-buffer-tostring branch from f858003 to 4b66a40 Compare June 21, 2015 22:02

mscdex reviewed Jun 21, 2015

trevnorris reviewed Jun 22, 2015

rvagg force-pushed the master branch from 9d21135 to 628a3ab Compare June 25, 2015 09:18

bnoordhuis force-pushed the optimize-buffer-tostring branch from 4b66a40 to 8350f3a Compare June 25, 2015 16:33

bnoordhuis closed this Jun 25, 2015

bnoordhuis deleted the optimize-buffer-tostring branch June 25, 2015 16:33

bnoordhuis merged commit 8350f3a into nodejs:master Jun 25, 2015

rvagg mentioned this pull request Jun 30, 2015

Release proposal: v2.3.2 #2083

Closed
