Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: flakiness in building addons with make #22006

Closed
refack opened this issue Jul 27, 2018 · 25 comments · Fixed by #23733
Closed

test: flakiness in building addons with make #22006

refack opened this issue Jul 27, 2018 · 25 comments · Fixed by #23733
Labels
addons Issues and PRs related to native addons. build Issues and PRs related to build files or the CI. flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests.

Comments

@refack
Copy link
Contributor

refack commented Jul 27, 2018

  • Version: master
  • Platform: macOS
  • Subsystem: build,test

test phase fails around building the test addons with ENOENT, probably while looking for the node binary, while it's clearly built and at the correct place since previous steps use is (e.g. node-gyp)

This happened either during the test/addons/.buildstamp target
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1012/20041/console

10:54:27 gyp info spawn make
10:54:27 gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build', '--jobs', 2 ]
10:54:27 gyp info ok 
10:54:27 Building addon in /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/test/addons/zlib-binding
10:54:27 /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/tools/build-addons.js:58
10:54:27 main(process.argv[3]).catch((err) => setImmediate(() => { throw err; }));
10:54:27                                                           ^
10:54:27 
10:54:27 Error: spawn /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/node ENOENT
10:54:27     at Process.ChildProcess._handle.onexit (internal/child_process.js:229:19)
10:54:27     at onErrorNT (internal/child_process.js:406:16)
10:54:27     at process._tickCallback (internal/process/next_tick.js:63:19)
10:54:27 make[1]: *** [test/addons/.buildstamp] Error 1

Or right after
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1012/20042/console

13:28:08 gyp info spawn make
13:28:08 gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build', '--jobs', 2 ]
13:28:08 gyp info ok 
13:28:08 touch test/addons/.buildstamp
13:28:09 if [ -x /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/./node ] && [ -e /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/./node ]; then /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/./node  tools/doc/generate.js --node-version=v11.0.0 --analytics= doc/api/addons.md --output-directory=out/doc/api; elif [ -x `which node` ] && [ -e `which node` ] && [ `which node` ]; then `which node`  tools/doc/generate.js --node-version=v11.0.0 --analytics= doc/api/addons.md --output-directory=out/doc/api; else echo "No available node, cannot run \"node  tools/doc/generate.js --node-version=v11.0.0 --analytics= doc/api/addons.md --output-directory=out/doc/api\""; exit 1; fi;
13:28:09 No available node, cannot run "node  tools/doc/generate.js --node-version=v11.0.0 --analytics= doc/api/addons.md --output-directory=out/doc/api"
13:28:09 make[2]: *** [out/doc/api/addons.html] Error 1
@refack refack added build Issues and PRs related to build files or the CI. test Issues and PRs related to the tests. addons Issues and PRs related to native addons. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Jul 27, 2018
@refack
Copy link
Contributor Author

refack commented Jul 27, 2018

  • Recently moving to a new macOS provider in CI and the machines are a bit thinner on resources than they used to be, so builds are failing more often.
  • Recently changing addons to build in parallel rather than in series. It's possible all these failures in building addons are revealing a race condition there that isn't showing up elsewhere.

Since it's happening near the end of the addons build (either last addon or next to last in the directory) it might be bad interaction between node, make and bash

@refack
Copy link
Contributor Author

refack commented Jul 27, 2018

I couldn't find any occurrences on macOS 10.10 we stopped testing master on 10.10

But I spotted one that is "far" from target start or finish (addons/async-hello-world):
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1011/20035/console

Looking at the surrounding make targets being built it seems like it is sandwiched between relinkin of the binary. This might be because we're recursively spawning make while JOBS=2, so it might race with itself.

06:31:36 rm ab44d261d5eee77f30a9729f8502ae1eca1260e4.intermediate 5942863992b65f87a6b23335633e527c76a5419b.intermediate 8bde6021d7f76f37155f81b6949420e6304169c0.intermediate
06:31:36 if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
06:31:36 /Library/Developer/CommandLineTools/usr/bin/make test-ci
06:31:36 # Clean up any leftover processes but don't error if found.

@refack
Copy link
Contributor Author

refack commented Jul 27, 2018

Since this is first time I've seen this (https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1011/19994/ canary build), my current working assumption is that this is due to some new dependency cycle introduced by V8 6.4

@rubys
Copy link
Member

rubys commented Jul 27, 2018

Are we seeing this with any other pull request?

@refack
Copy link
Contributor Author

refack commented Jul 27, 2018

Are we seeing this with any other pull request?

@rubys yes, for example the last link I posted (job 19994) was run 2 days ago on the v8 canary branch.

@rubys
Copy link
Member

rubys commented Jul 29, 2018

Back home. Did 8 parallel clones of node from github on a macOS High Sierra 10.13.5 machine (4 GHz Intel Core i7 iMac late 2014). Ran ./configure; make test-ci concurrently on each. All completed successfully.

@Trott
Copy link
Member

Trott commented Jul 29, 2018

Back home. Did 8 parallel clones of node from github on a macOS High Sierra 10.13.5 machine (4 GHz Intel Core i7 iMac late 2014). Ran ./configure; make test-ci concurrently on each. All completed successfully.

Here's the specs on one of the machines where we're seeing failures:

Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
Processor Speed: 2.79 GHz
Number of Processors: 2
Memory: 4 GB

@refack
Copy link
Contributor Author

refack commented Jul 29, 2018

@rubys, I think the problem is more pronounced with the run-ci target:

node/Makefile

Lines 483 to 484 in e3a4702

run-ci: build-ci
$(MAKE) test-ci

First the build-ci sub-target actually builds a node binary. Then the recursive call to test-ci start to in parallel re-build the binary, and bootstrap the testing. Those two task unsurprisingly clash.

That following was apparently just a ninja bug with long paths.
I believe the crux of the problem is a self referencing dependency in the new V8 patch that causes it to always rebuild. For example calling ninja torque again after it completes, shows a constant dirtiness detection.

>ninja -v -d explain -C out/Release torque
ninja: Entering directory `out\Release'
ninja explain: output obj/deps/v8/third_party/antlr4/runtime/Cpp/runtime/src/torque.ANTLRErrorListener.obj doesn't exist
ninja explain: obj/deps/v8/third_party/antlr4/runtime/Cpp/runtime/src/torque.ANTLRErrorListener.obj is dirty
...
ninja explain: torque.exe is dirty


>ninja -v -d explain -C out/Release torque
ninja: Entering directory `out\Release'
ninja explain: output obj/deps/v8/third_party/antlr4/runtime/Cpp/runtime/src/torque.ANTLRErrorListener.obj doesn't exist
ninja explain: obj/deps/v8/third_party/antlr4/runtime/Cpp/runtime/src/torque.ANTLRErrorListener.obj is dirty
...
ninja explain: torque.exe is dirty

@targos
Copy link
Member

targos commented Jul 29, 2018

The dependency on antlr4 was removed in a recent V8 commit. Maybe backporting it could help?

@refack
Copy link
Contributor Author

refack commented Jul 29, 2018

The dependency on antlr4 was removed in a recent V8 commit. Maybe backporting it could help?

I'm doing some testing trying to figure out where and if there's a self referencing dependency (my current guess is it's one of the -I include directories, so anyway we need to make sure the codegen part doesn't "dirty" itself) Nope

@rubys
Copy link
Member

rubys commented Jul 30, 2018

TL;DR: Possible courses of action (non-exclusive):

  1. work to ensure that make following make does nothing as everything is up to date
  2. examine the build process and ensure that NODE_EXE is build exactly once
  3. make a hard dependency so that building of addons waits until the NODE_EXE is built.

Analysis:

$ grep -a out\/Release\/node consoleText
  LD_LIBRARY_PATH=/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/lib.host:/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../.; mkdir -p /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release; tools/specialize_node_d.py "/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/node.d" src/node.d mac x64
  c++ -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libnode.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_base.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libzlib.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libuv.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libopenssl.a -Wl,-no_pie -Wl,-search_paths_first -mmacosx-version-min=10.7 -arch x86_64 -L/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release -stdlib=libc++  -o "/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/node" /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/obj.target/node/src/node_main.o /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libnode.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_libplatform.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicui18n.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libzlib.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libhttp_parser.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libcares.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libuv.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libnghttp2.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libopenssl.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_base.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_libbase.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_libsampler.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicuucx.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicudata.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicustubdata.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_snapshot.a -framework CoreFoundation -lm
if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
  c++ -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libnode.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_base.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libzlib.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libuv.a -Wl,-force_load,/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libopenssl.a -Wl,-no_pie -Wl,-search_paths_first -mmacosx-version-min=10.7 -arch x86_64 -L/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release -stdlib=libc++  -o "/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/node" /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/obj.target/node/src/node_main.o /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libnode.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_libplatform.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicui18n.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libzlib.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libhttp_parser.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libcares.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libuv.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libnghttp2.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libopenssl.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_base.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_libbase.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_libsampler.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicuucx.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicudata.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libicustubdata.a /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/libv8_snapshot.a -framework CoreFoundation -lm
Error: spawn /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/node ENOENT
if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi

Commentary:

  • First match is a false match: it is for node.d
  • Second match links the executable
  • Third match links the executable to the root directory
  • Fourth match links the executable again
  • Fifth match is the error message: executable not found
  • Sixth match links the executable to the root directory again

Possible theory: the second link takes a while, during which gyp is quickly run and then Building addon starts. The fact that the error occurs between the second c++ linker step and the creation of the symbolic link is consistent with this theory.

Further searches in consoleText finds two occurrences of the following text:

/Library/Developer/CommandLineTools/usr/bin/make -C out BUILDTYPE=Release V=1

The second rebuilds a number of targets. This is unsurprising given there is a number of steps marked with FORCE_DO_CMD, and the fact that stamp option in CheckProtocolCompatibility.py doesn't update the timestamp, possible fix:

diff --git a/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py b/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py
index c70162a2a4..886ffbed7f 100755
--- a/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py
+++ b/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py
@@ -473,7 +473,7 @@ def main():
 
     if arg_options.stamp:
         with open(arg_options.stamp, 'a') as _:
-            pass
+            os.utime(arg_options.stamp, None)
 
 if __name__ == '__main__':
     sys.exit(main())

Related: this is an excerpt from the root level Makefile:

.PHONY: build-addons
# .buildstamp needs $(NODE_EXE) but cannot depend on it
# directly because it calls make recursively.  The parent make cannot know
# if the subprocess touched anything so it pessimistically assumes that
# .buildstamp is out of date and need a rebuild.
# Just goes to show that recursive make really is harmful...
# TODO(bnoordhuis) Force rebuild after gyp update.
build-addons: | $(NODE_EXE) test/addons/.buildstamp

@rubys
Copy link
Member

rubys commented Jul 30, 2018

Update: also from the Makefile:

# The -r/-L check stops it recreating the link if it is already in place,
# otherwise $(NODE_EXE) being a .PHONY target means it is always re-run.
# Without the check there is a race condition between the link being deleted
# and recreated which can break the addons build when running test-ci

Apparently, the build of the addons is supposed to use ./node instead of ./out/Release/node.

@rubys
Copy link
Member

rubys commented Jul 30, 2018

Testing the following change now:

diff --git a/Makefile b/Makefile
index c3da8e88bf..4097bf67b0 100644
--- a/Makefile
+++ b/Makefile
@@ -318,7 +318,7 @@ benchmark/napi/function_args/build/Release/binding.node: all \
 # it always triggers a rebuild due to it being a .PHONY rule.  See the comment
 # near the build-addons rule for more background.
 test/gc/build/Release/binding.node: test/gc/binding.cc test/gc/binding.gyp
-       $(NODE) deps/npm/node_modules/node-gyp/bin/node-gyp rebuild \
+       ./node deps/npm/node_modules/node-gyp/bin/node-gyp rebuild \
                --python="$(PYTHON)" \
                --directory="$(shell pwd)/test/gc" \
                --nodedir="$(shell pwd)"
@@ -352,7 +352,7 @@ test/addons/.buildstamp: config.gypi \
        src/node.h src/node_buffer.h src/node_object_wrap.h src/node_version.h \
        test/addons/.docbuildstamp
        env npm_config_loglevel=$(LOGLEVEL) npm_config_nodedir="$$PWD" \
-               npm_config_python="$(PYTHON)" $(NODE) "$$PWD/tools/build-addons" \
+               npm_config_python="$(PYTHON)" ./node "$$PWD/tools/build-addons" \
                "$$PWD/deps/npm/node_modules/node-gyp/bin/node-gyp.js" "$$PWD/test/addons"
        touch $@
 
@@ -382,7 +382,7 @@ test/addons-napi/.buildstamp: config.gypi \
        src/node.h src/node_buffer.h src/node_object_wrap.h src/node_version.h \
        src/node_api.h src/node_api_types.h
        env npm_config_loglevel=$(LOGLEVEL) npm_config_nodedir="$$PWD" \
-               npm_config_python="$(PYTHON)" $(NODE) "$$PWD/tools/build-addons" \
+               npm_config_python="$(PYTHON)" ./node "$$PWD/tools/build-addons" \
                "$$PWD/deps/npm/node_modules/node-gyp/bin/node-gyp.js" \
                "$$PWD/test/addons-napi"
        touch $@

@rubys
Copy link
Member

rubys commented Jul 30, 2018

Testing the following change now

Test passes, but I'm skeptical that it fixes the problem. ./node is a symbolic link, so it won't protect you from the underlying file being deleted.

@richardlau
Copy link
Member

Testing the following change now:

diff --git a/Makefile b/Makefile
index c3da8e88bf..4097bf67b0 100644
--- a/Makefile
+++ b/Makefile
@@ -318,7 +318,7 @@ benchmark/napi/function_args/build/Release/binding.node: all \
 # it always triggers a rebuild due to it being a .PHONY rule.  See the comment
 # near the build-addons rule for more background.
 test/gc/build/Release/binding.node: test/gc/binding.cc test/gc/binding.gyp
-       $(NODE) deps/npm/node_modules/node-gyp/bin/node-gyp rebuild \
+       ./node deps/npm/node_modules/node-gyp/bin/node-gyp rebuild \
                --python="$(PYTHON)" \
                --directory="$(shell pwd)/test/gc" \
                --nodedir="$(shell pwd)"
@@ -352,7 +352,7 @@ test/addons/.buildstamp: config.gypi \
        src/node.h src/node_buffer.h src/node_object_wrap.h src/node_version.h \
        test/addons/.docbuildstamp
        env npm_config_loglevel=$(LOGLEVEL) npm_config_nodedir="$$PWD" \
-               npm_config_python="$(PYTHON)" $(NODE) "$$PWD/tools/build-addons" \
+               npm_config_python="$(PYTHON)" ./node "$$PWD/tools/build-addons" \
                "$$PWD/deps/npm/node_modules/node-gyp/bin/node-gyp.js" "$$PWD/test/addons"
        touch $@
 
@@ -382,7 +382,7 @@ test/addons-napi/.buildstamp: config.gypi \
        src/node.h src/node_buffer.h src/node_object_wrap.h src/node_version.h \
        src/node_api.h src/node_api_types.h
        env npm_config_loglevel=$(LOGLEVEL) npm_config_nodedir="$$PWD" \
-               npm_config_python="$(PYTHON)" $(NODE) "$$PWD/tools/build-addons" \
+               npm_config_python="$(PYTHON)" ./node "$$PWD/tools/build-addons" \
                "$$PWD/deps/npm/node_modules/node-gyp/bin/node-gyp.js" \
                "$$PWD/test/addons-napi"
        touch $@

Unless $(NODE) is being set outside of the Makefile it should be equivalent to ./node on non-Windows platforms so I'd be very surprised if the change made a difference.

node/Makefile

Lines 42 to 47 in afc5636

# Determine EXEEXT
EXEEXT := $(shell $(PYTHON) -c \
"import sys; print('.exe' if sys.platform == 'win32' else '')")
NODE_EXE = node$(EXEEXT)
NODE ?= ./$(NODE_EXE)

@rubys
Copy link
Member

rubys commented Jul 30, 2018

Unless $(NODE) is being set outside of the Makefile it should be equivalent to ./node on non-Windows platforms so I'd be very surprised if the change made a difference.

I'm puzzled by the specificity of the error message:

Error: spawn /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1012/out/Release/node ENOENT

I've verified that the c++ step deletes the target during the link. We can reduce (but as near as I can tell not eliminate) the window by creating a hard link to the output before calling g++, changing the symbolic link to point to the hard link, running g++, then changing the symbolic link back, and finally deleting the hard link. I just haven't found any documentation that suggests that changing a symbolic link can be done atomically on Mac OS/X.

This still feels like a kludge. Probably worth doing, but probably worthwhile to ensure that runs of make on up to date outputs doesn't do anything, and/or outputs are produced at most once during the course of the make and at at a time guaranteed to be before usage.

@richardlau
Copy link
Member

If you're referring to the full qualified path to out/Release/node, I believe that's because the error is coming from tools/build-addons.js which uses process.execPath, which will be resolved to an absolute path:

await execFile(process.execPath, [nodeGyp, 'rebuild', `--directory=${dir}`],

@rubys
Copy link
Member

rubys commented Jul 30, 2018

@richardlau in that case, the problem is worse. The executable is being deleted while running. Assuming that has no ill-effects, the hard links workaround I described above won't work; instead the linker needs to produce output at a new location, and then that new file needs to be renamed upon successful completion. I've provided a pull request that does that. I still view it as a workaround, and the root cause should be addressed.

@refack
Copy link
Contributor Author

refack commented Jul 30, 2018

The executable is being deleted while running. Assuming that has no ill-effects, the hard links workaround I described above won't work; instead the linker needs to produce output at a new location, and then that new file needs to be renamed upon successful completion. I've provided a pull request that does that. I still view it as a workaround, and the root cause should be addressed.

I agree with what you wrote above, we need to make sure make node (a.k.a make out/Release/node) is idempotent. That AFAICT will solve most of the ill effects.

@rubys
Copy link
Member

rubys commented Jul 30, 2018

@refack be aware that that would require changing not only v8, but also v8's third party dependencies. See deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py patch above for an example. As currently coded, it creates a timestamp if one does not exist, but will not update the timestamp if the file already exists. This is problematic for us, but may not be for v8 or the third party tool.

Net: the fix itself is straightforward, but the process to push it upstream is uncertain and with an outcome unknown. Alternatively, maintaining our own patches creates its own headache.

I'm certainly willing to work towards that as a goal, but meanwhile I do feel that we need immediate relief from the intermittent failures that we are seeing.

@mhdawson
Copy link
Member

mhdawson commented Aug 1, 2018

Since this causes the nightlies to sometimes not build the headers it is also causing the n-api jobs to fail: https://ci.nodejs.org/view/x%20-%20Abi%20stable%20module%20API/job/node-test-node-addon-api/488/

@mhdawson
Copy link
Member

mhdawson commented Aug 1, 2018

Unless there is a straightforward safe fix I think we should back out the v8 upgrade until we can find a solution.

@refack
Copy link
Contributor Author

refack commented Sep 18, 2018

Just found a similar on macOS cousin freebsd11-x64

15:17:19 rm 963c22820f466518f8d3027cd2fa5173e73fb881.intermediate 07e7fc21bc8186f21549155b9cac0303d820db5d.intermediate 8aee6c816e03cc2085aec3267c634cc05c87824b.intermediate
15:17:19 if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
15:17:19 gmake test-ci
15:17:23 Clean up any leftover processes but don't error if found.
15:17:23 ps awwx | grep Release/node | grep -v grep | cat
15:17:23 gmake -C out BUILDTYPE=Release V=1
15:17:23 57870  0  S+       0:00.18 out/Release/node /usr/home/freebsd/ofrobots/test/parallel/test-trace-events-fs-sync.js
15:17:23 57956  0  R+       0:00.10 /usr/home/freebsd/ofrobots/out/Release/node --trace-events-enabled --trace-event-categories node.fs.sync -e fs.writeFileSync("fs.txt", "123", "utf8");fs.statSync("fs.txt");fs.unlinkSync("fs.txt")
15:17:23 kill: 57870: Operation not permitted
15:17:23 kill: 57964: Operation not permitted
15:17:23 Makefile:415: recipe for target 'clear-stalled' failed
15:17:23 gmake[1]: *** [clear-stalled] Error 1
15:17:23 gmake[1]: *** Waiting for unfinished jobs....
15:17:41   touch 07e7fc21bc8186f21549155b9cac0303d820db5d.intermediate
15:17:41   touch 8aee6c816e03cc2085aec3267c634cc05c87824b.intermediate
15:17:41   LD_LIBRARY_PATH=/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/lib.host:/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../.; mkdir -p /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/src/node/inspector/protocol; python tools/inspector_protocol/CodeGenerator.py --jinja_dir tools/inspector_protocol/.. --output_base "/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/src/" --config "/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/node_protocol_config.json"
15:17:42   LD_LIBRARY_PATH=/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/lib.host:/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../deps/v8/gypfiles; mkdir -p /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/src/inspector/protocol /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/include/inspector; python ../third_party/inspector_protocol/CodeGenerator.py --jinja_dir ../third_party --output_base "/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/src/inspector" --config ../src/inspector/inspector_protocol_config.json
15:17:43   touch 963c22820f466518f8d3027cd2fa5173e73fb881.intermediate
15:17:43   LD_LIBRARY_PATH=/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/lib.host:/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../deps/v8/gypfiles; mkdir -p /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/torque-generated; "/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/torque" ../src/builtins/base.tq ../src/builtins/array.tq ../src/builtins/array-foreach.tq ../src/builtins/array-sort.tq ../src/builtins/typed-array.tq ../src/builtins/data-view.tq -o "/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/obj/gen/torque-generated"
15:17:45 rm 963c22820f466518f8d3027cd2fa5173e73fb881.intermediate 07e7fc21bc8186f21549155b9cac0303d820db5d.intermediate 8aee6c816e03cc2085aec3267c634cc05c87824b.intermediate
15:17:45 if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
15:17:45 Makefile:490: recipe for target 'run-ci' failed

https://ci.nodejs.org/job/node-test-commit-freebsd/20589/nodes=freebsd11-x64/console

But I now know how to solve this. There's a solution possible by fixing GYP/some-gyp-files.

refack added a commit to refack/node that referenced this issue Oct 20, 2018
All the sub targets have internal parallelism, so no performance loss.
Also `make` doesn't to a good enough job of combining the output
streams, or eliminate races.

PR-URL: nodejs#23733
Fixes: nodejs#22006
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Matheus Marchini <[email protected]>
jasnell pushed a commit that referenced this issue Oct 21, 2018
All the sub targets have internal parallelism, so no performance loss.
Also `make` doesn't to a good enough job of combining the output
streams, or eliminate races.

PR-URL: #23733
Fixes: #22006
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Matheus Marchini <[email protected]>
MylesBorins pushed a commit that referenced this issue Oct 29, 2018
All the sub targets have internal parallelism, so no performance loss.
Also `make` doesn't to a good enough job of combining the output
streams, or eliminate races.

PR-URL: #23733
Fixes: #22006
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Matheus Marchini <[email protected]>
@refack
Copy link
Contributor Author

refack commented Jan 22, 2019

Issue is still open, and can impact all platforms using make

@targos
Copy link
Member

targos commented Dec 27, 2020

Haven't seen this in a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. build Issues and PRs related to build files or the CI. flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests.
Projects
None yet
6 participants