Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: replace macro with lambda #13676

Closed
wants to merge 1 commit into from
Closed

src: replace macro with lambda #13676

wants to merge 1 commit into from

Conversation

seishun
Copy link
Contributor

@seishun seishun commented Jun 14, 2017

Makes it debuggable and easier to reason about.

I'm not sure if it would be worthwhile to replace V([] {}) with V([&] {}) for consistency. Also, couldn't find any other usages of parameterless lambdas in the codebase, so I'm not sure if this is the right style. (as opposed to []{}, without a space)

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

src

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. labels Jun 14, 2017
@evanlucas
Copy link
Contributor

Does this bring any performance impact?

V(dst[(*k)++] = ((hi & 0x03) << 6) | ((lo & 0x3F) >> 0));
#undef V
return true; // Continue decoding.
auto V = [&](auto expr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

In file included from ../src/inspector_socket.cc:6:0:
../src/base64.h: In function ‘bool node::base64_decode_group_slow(char*, size_t, const TypeName*, size_t, size_t*, size_t*)’:
../src/base64.h:60:16: error: use of ‘auto’ in lambda parameter declaration only available with -std=c++14 or -std=gnu++14
   auto V = [&](auto expr) {
                ^~~~

I’m not sure what to do about that. :/

Copy link
Contributor Author

@seishun seishun Jun 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to http://en.cppreference.com/w/cpp/compiler_support, polymorphic lambdas have been supported in GCC starting with version 4.9, which was released on April 22, 2014. Any reason why we can't require a compiler that's less than 3 years old?

According to this answer, -std=c++14 is supported in GCC from 5.2, but we can use -std=c++1y instead.

@seishun
Copy link
Contributor Author

seishun commented Jun 14, 2017

(edited to include benchmark results from other platforms)

Windows 7, VS2015

                                                    improvement confidence      p.value
 buffers\\buffer-base64-decode-wrapped.js n=32         -24.95 %        *** 3.227530e-32
 buffers\\buffer-base64-decode.js n=32                 -13.10 %        *** 2.297506e-25
 buffers\\buffer-base64-encode.js n=32 len=67108864     -1.57 %            1.531956e-01

Windows 10, VS2015

                                                    improvement confidence      p.value
 buffers\\buffer-base64-decode-wrapped.js n=32         -21.93 %        *** 5.113313e-34
 buffers\\buffer-base64-decode.js n=32                  -2.37 %            8.234397e-02
 buffers\\buffer-base64-encode.js n=32 len=67108864      0.19 %            8.352671e-01

Linux, gcc 4.9.1

                                                   improvement confidence     p.value
 buffers/buffer-base64-decode-wrapped.js n=32          -3.27 %            0.198180336
 buffers/buffer-base64-decode.js n=32                  -8.61 %         ** 0.003427911
 buffers/buffer-base64-encode.js n=32 len=67108864     -0.89 %            0.631352206

macOS, Apple LLVM version 8.1.0 (clang-802.0.42)

                                                    improvement confidence   p.value
 buffers/buffer-base64-decode-wrapped.js n=32           1.22 %            0.2750779
 buffers/buffer-base64-decode.js n=32                  -0.41 %            0.7213537
 buffers/buffer-base64-encode.js n=32 len=67108864     -0.78 %            0.5002049

@evanlucas
Copy link
Contributor

@seishun I wouldn't call 25% performance decrease insignificant...

@addaleax
Copy link
Member

Any reason why we can't require a compiler that's less than 3 years old?

I think I saw talk about raising the minimum to GCC 4.8 somewhere; but tbh I can’t really say how much of an impact bumping the requirements to C++14 would have.

@seishun
Copy link
Contributor Author

seishun commented Jun 14, 2017

@evanlucas Well, zero-filling buffers caused about 50% performance decrease on average (#12141 (comment)) and I don't see anyone complaining about that. And I'd assume that API is used much more frequently than base64 decoding.

@addaleax GCC 4.8 is already required, but there's a PR to raise that to 4.9: #13466.

@evanlucas
Copy link
Contributor

@seishun there is also a way to prevent that from happening (Buffer.allocUnsafe()). That is not the case with base64 decoding.

-1 from me unless the performance impact no longer exists

@seishun
Copy link
Contributor Author

seishun commented Jun 14, 2017

there is also a way to prevent that from happening (Buffer.allocUnsafe()).

That's not exactly a trivial change, considering it affected dependencies too, so the existence of an alternative API doesn't really change anything.

-1 from me unless the performance impact no longer exists

Are you also using Windows? If not, can you provide benchmark results on your platform as well? GCC and clang might be better at optimizing this.

@gibfahn
Copy link
Member

gibfahn commented Jun 14, 2017

Well, zero-filling buffers caused about 50% performance decrease on average (#12141 (comment)) and I don't see anyone complaining about that. And I'd assume that API is used much more frequently than base64 decoding.

But zero-filling buffers is a security fix. This is for debugging and readability. I don't think those things are comparable.

cc/ @nodejs/performance for the question of whether we care about this perf regression (and whether it's just on Windows). I have no opinion on that (no idea how often this gets used).

@mscdex
Copy link
Contributor

mscdex commented Jun 14, 2017

Well, zero-filling buffers caused about 50% performance decrease on average (#12141 (comment)) and I don't see anyone complaining about that.

Some of us did complain about it (when it became the default).

Anyway, I am also -1 on this if there is a performance regression and the change is not absolutely necessary.

@benjamingr
Copy link
Member

cc/ @nodejs/performance for the question of whether we care about this perf regression

I'm more in the JS side of performance - but as a user of Node who encodes base64 a lot, I definitely care about this change. As a project collaborator - we haven't merged PRs for -2% performance regressions in util.inspect - I'm surprised the 25% difference isn't optimized away by the compiler in this case but given it clearly isn't - I'm -1 on this as well.

@mcollina
Copy link
Member

there is also a way to prevent that from happening (Buffer.allocUnsafe()).

That's not exactly a trivial change, considering it affected dependencies too, so the existence of an alternative API doesn't really change anything.

I do not see how this is relevant to this discussion.

BTW, I'm 👎 for this change to land if there is any decrease in performance.

@seishun
Copy link
Contributor Author

seishun commented Jun 14, 2017

Since no one else did, I benchmarked this on Linux (gcc 4.9.1):

                                                   improvement confidence     p.value
 buffers/buffer-base64-decode-wrapped.js n=32          -3.27 %            0.198180336
 buffers/buffer-base64-decode.js n=32                  -8.61 %         ** 0.003427911
 buffers/buffer-base64-encode.js n=32 len=67108864     -0.89 %            0.631352206

I'd assume the difference is even less noticeable with newer compilers. I have no idea why buffer-base64-decode.js is affected at all, considering base64_decode_group_slow is never called in it.

Also note that buffers/buffer-base64-decode-wrapped.js tests wrapped base64 input, which isn't the usual case from my experience.

Some of us did complain about it (when it became the default).

I meant end users.

But zero-filling buffers is a security fix. This is for debugging and readability. I don't think those things are comparable.

As someone who has fixed base64 decoding twice (#11995, #13660), I can tell first hand how painful it is to debug functions with macros. In fact, I changed it to a lambda while investigating the latter issue to make my life easier. Sure, it's not as important as security, but let's not dismiss the fact that it significantly raises the barrier for contributions.

@refack
Copy link
Contributor

refack commented Jun 14, 2017

As someone who has fixed base64 decoding twice (#11995, #13660), I can tell first hand how painful it is to debug functions with macros. In fact, I changed it to a lambda while investigating the latter issue to make my life easier. Sure, it's not as important as a security, but let's not dismiss the fact that it significantly raises the barrier for contributions.

I'm +10 on debuggable and easier to reason about
As a compromise could we use the lambda for DEBUG builds, and the macro for RELEASE?
Can we prove the semantics are the same (apart from timing which differ anyway between DEBUG and RELEASE)?

P.S. the picture on linux looks much better

                                                  improvement confidence     p.value
buffers/buffer-base64-decode-wrapped.js n=32          -3.27 %            0.198180336
buffers/buffer-base64-decode.js n=32                  -8.61 %         ** 0.003427911
buffers/buffer-base64-encode.js n=32 len=67108864     -0.89 %            0.631352206

p values show insignificance except for buffer-base64-decode.js and that uses a really large data chunk ('abcd'.repeat(8 << 20) X 32).

@aqrln
Copy link
Contributor

aqrln commented Jun 15, 2017

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

(I believe this translates to vanilla clang 3.9.x, but I'm not sure.)

../src/base64.h:60:16: error: 'auto' not allowed in lambda parameter
  auto V = [&](auto expr) {
               ^~~~

After applying this patch:

diff --git a/src/base64.h b/src/base64.h
index 39a8ccf196..07053af955 100644
--- a/src/base64.h
+++ b/src/base64.h
@@ -5,6 +5,7 @@
 
 #include "util.h"
 
+#include <functional>
 #include <stddef.h>
 
 namespace node {
@@ -57,7 +58,7 @@ bool base64_decode_group_slow(char* const dst, const size_t dstlen,
                               size_t* const i, size_t* const k) {
   uint8_t hi;
   uint8_t lo;
-  auto V = [&](auto expr) {
+  auto V = [&](std::function<void()> expr) {
     for (;;) {
       const uint8_t c = src[*i];
       lo = unbase64(c);

it compiles, but I get the following benchmark results (comparing against master):

                                                   improvement confidence      p.value
 buffers/buffer-base64-decode-wrapped.js n=32         -72.99 %        *** 4.540138e-39
 buffers/buffer-base64-decode.js n=32                   0.07 %            7.470800e-01
 buffers/buffer-base64-encode.js n=32 len=67108864     -0.30 %            6.156450e-01

@seishun
Copy link
Contributor Author

seishun commented Jun 15, 2017

@aqrln Well yes, it would certainly be much slower with std::function. Try replacing -std=gnu++0x with -std=gnu++1y in common.gypi instead.

@bnoordhuis
Copy link
Member

As a compromise could we use the lambda for DEBUG builds, and the macro for RELEASE?

Different code paths in debug vs release is a terrible idea.

@aqrln
Copy link
Contributor

aqrln commented Jun 15, 2017

@seishun hmm, yes, that's better :D

                                                    improvement confidence   p.value
 buffers/buffer-base64-decode-wrapped.js n=32           1.22 %            0.2750779
 buffers/buffer-base64-decode.js n=32                  -0.41 %            0.7213537
 buffers/buffer-base64-encode.js n=32 len=67108864     -0.78 %            0.5002049

@refack
Copy link
Contributor

refack commented Jun 15, 2017

As a compromise could we use the lambda for DEBUG builds, and the macro for RELEASE?

Different code paths in debug vs release is a terrible idea.

@bnoordhuis, Ack.

@seishun
Copy link
Contributor Author

seishun commented Jun 16, 2017

@refack
Copy link
Contributor

refack commented Jun 16, 2017

Killed https://ci.nodejs.org/job/node-test-pull-request/8697/ that has been running for 3h

@seishun
Copy link
Contributor Author

seishun commented Jun 16, 2017

  g++ '-DV8_GYP_BUILD' '-DV8_TARGET_ARCH_ARM' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT' '-Dv8_promise_internal_field_count' '-DV8_I18N_SUPPORT' '-DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_STATIC' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_LEGACY_CONVERSION=1' -I../deps/v8 -I../. -I/home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/obj/gen -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common  -pthread -Wall -Wextra -Wno-unused-parameter -fno-strict-aliasing -mfloat-abi=hard -marm -fdata-sections -ffunction-sections -O3 -O3 -fno-omit-frame-pointer -fno-rtti -fno-exceptions -std=gnu++1y -MMD -MF /home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/.deps//home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/obj.target/v8_base/deps/v8/src/runtime/runtime-operators.o.d.raw   -c -o /home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/obj.target/v8_base/deps/v8/src/runtime/runtime-operators.o ../deps/v8/src/runtime/runtime-operators.cc
g++: internal compiler error: Terminated (program cc1plus)

Interesting. I wonder if it's because of the -std=gnu++1y switch? And why would it hang after that?

@refack
Copy link
Contributor

refack commented Jun 16, 2017

g++ '-DV8_GYP_BUILD' '-DV8_TARGET_ARCH_ARM' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT' '-Dv8_promise_internal_field_count' '-DV8_I18N_SUPPORT' '-DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_STATIC' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_LEGACY_CONVERSION=1' -I../deps/v8 -I../. -I/home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/obj/gen -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common  -pthread -Wall -Wextra -Wno-unused-parameter -fno-strict-aliasing -mfloat-abi=hard -marm -fdata-sections -ffunction-sections -O3 -O3 -fno-omit-frame-pointer -fno-rtti -fno-exceptions -std=gnu++1y -MMD -MF /home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/.deps//home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/obj.target/v8_base/deps/v8/src/runtime/runtime-operators.o.d.raw   -c -o /home/iojs/build/workspace/node-test-commit-arm/nodes/armv7-wheezy/out/Release/obj.target/v8_base/deps/v8/src/runtime/runtime-operators.o ../deps/v8/src/runtime/runtime-operators.cc
g++: internal compiler error: Terminated (program cc1plus)

It didn't hang, was just SUPER slow (I watched it for a while)

@seishun
Copy link
Contributor Author

seishun commented Jun 16, 2017

I see. Let's just hope it's fixed in a newer GCC version. (See #13466 (comment))

@BridgeAR
Copy link
Member

BridgeAR commented Sep 4, 2017

CI https://ci.nodejs.org/job/node-test-pull-request/9934/

@nodejs/collaborators PTAL

@eljefedelrodeodeljefe
Copy link
Contributor

After reading the above -1

@benjamingr
Copy link
Member

@BridgeAR did you mean to post a CI there?

@BridgeAR
Copy link
Member

BridgeAR commented Sep 4, 2017

@benjamingr yes, thanks. I updated the link.

The tests are failing badly. Probably older GCC versions? I did not look further into it.

In general I feel there is little chance that this will land as there are multiple -1.

@seishun
Copy link
Contributor Author

seishun commented Sep 4, 2017

The tests are failing badly. Probably older GCC versions? I did not look further into it.

Yes, some test machines use gcc versions older than the required 4.9.4. I've been working on that for the last 1.5 months, see nodejs/build#797 and nodejs/build#809.

In general I feel there is little chance that this will land as there are multiple -1.

It seems all of them are caused by the performance issue on Windows, so it'll probably have to wait until Microsoft improves their compiler.

@addaleax
Copy link
Member

addaleax commented Sep 4, 2017

@seishun You’re right, I think this would have to wait for that – the question is, do you really want to wait until that happens + we make that new compiler version our supported baseline?

Edit: Like, that is totally your choice. But I think those would be the prerequisites for this landing.

@seishun
Copy link
Contributor Author

seishun commented Sep 4, 2017

do you really want to wait until that happens + we make that new compiler version our supported baseline?

What's the alternative, forget about this?

@addaleax
Copy link
Member

addaleax commented Sep 4, 2017

@seishun Yeah, I think so. :( Or maybe leave a TODO comment, but either way, it’s going to be a while…

@benjamingr
Copy link
Member

What's the alternative, forget about this?

You can also ask for a CTC vote on this - but I'm afraid it'll be a hard sell since the current objections raised by reviewers are valid - and people won't buy "some nicer code" in exchange for compiler support.

I really appreciate the effort by the way.

@BridgeAR
Copy link
Member

I am closing this due to the mentioned reasons and the multiple -1.

@BridgeAR BridgeAR closed this Sep 19, 2017
Makes it debuggable and easier to reason about.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.