-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: flakiness in building addons with make
#22006
Comments
Since it's happening near the end of the addons build (either last addon or next to last in the directory) it might be bad interaction between |
But I spotted one that is "far" from target start or finish ( Looking at the surrounding make targets being built it seems like it is sandwiched between relinkin of the binary. This might be because we're recursively spawning
|
Since this is first time I've seen this (https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1011/19994/ canary build), my current working assumption is that this is due to some new dependency cycle introduced by V8 6.4 |
Are we seeing this with any other pull request? |
@rubys yes, for example the last link I posted (job 19994) was run 2 days ago on the v8 canary branch. |
Back home. Did 8 parallel clones of node from github on a macOS High Sierra 10.13.5 machine (4 GHz Intel Core i7 iMac late 2014). Ran |
Here's the specs on one of the machines where we're seeing failures: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz |
@rubys, I think the problem is more pronounced with the Lines 483 to 484 in e3a4702
First the build-ci sub-target actually builds a node binary. Then the recursive call to test-ci start to in parallel re-build the binary, and bootstrap the testing. Those two task unsurprisingly clash.
That following was apparently just a ninja bug with long paths.
|
The dependency on antlr4 was removed in a recent V8 commit. Maybe backporting it could help? |
|
TL;DR: Possible courses of action (non-exclusive):
Analysis:
Commentary:
Possible theory: the second link takes a while, during which Further searches in consoleText finds two occurrences of the following text:
The second rebuilds a number of targets. This is unsurprising given there is a number of steps marked with diff --git a/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py b/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py
index c70162a2a4..886ffbed7f 100755
--- a/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py
+++ b/deps/v8/third_party/inspector_protocol/CheckProtocolCompatibility.py
@@ -473,7 +473,7 @@ def main():
if arg_options.stamp:
with open(arg_options.stamp, 'a') as _:
- pass
+ os.utime(arg_options.stamp, None)
if __name__ == '__main__':
sys.exit(main()) Related: this is an excerpt from the root level
|
Update: also from the
Apparently, the build of the addons is supposed to use |
Testing the following change now: diff --git a/Makefile b/Makefile
index c3da8e88bf..4097bf67b0 100644
--- a/Makefile
+++ b/Makefile
@@ -318,7 +318,7 @@ benchmark/napi/function_args/build/Release/binding.node: all \
# it always triggers a rebuild due to it being a .PHONY rule. See the comment
# near the build-addons rule for more background.
test/gc/build/Release/binding.node: test/gc/binding.cc test/gc/binding.gyp
- $(NODE) deps/npm/node_modules/node-gyp/bin/node-gyp rebuild \
+ ./node deps/npm/node_modules/node-gyp/bin/node-gyp rebuild \
--python="$(PYTHON)" \
--directory="$(shell pwd)/test/gc" \
--nodedir="$(shell pwd)"
@@ -352,7 +352,7 @@ test/addons/.buildstamp: config.gypi \
src/node.h src/node_buffer.h src/node_object_wrap.h src/node_version.h \
test/addons/.docbuildstamp
env npm_config_loglevel=$(LOGLEVEL) npm_config_nodedir="$$PWD" \
- npm_config_python="$(PYTHON)" $(NODE) "$$PWD/tools/build-addons" \
+ npm_config_python="$(PYTHON)" ./node "$$PWD/tools/build-addons" \
"$$PWD/deps/npm/node_modules/node-gyp/bin/node-gyp.js" "$$PWD/test/addons"
touch $@
@@ -382,7 +382,7 @@ test/addons-napi/.buildstamp: config.gypi \
src/node.h src/node_buffer.h src/node_object_wrap.h src/node_version.h \
src/node_api.h src/node_api_types.h
env npm_config_loglevel=$(LOGLEVEL) npm_config_nodedir="$$PWD" \
- npm_config_python="$(PYTHON)" $(NODE) "$$PWD/tools/build-addons" \
+ npm_config_python="$(PYTHON)" ./node "$$PWD/tools/build-addons" \
"$$PWD/deps/npm/node_modules/node-gyp/bin/node-gyp.js" \
"$$PWD/test/addons-napi"
touch $@ |
Test passes, but I'm skeptical that it fixes the problem. |
Unless Lines 42 to 47 in afc5636
|
I'm puzzled by the specificity of the error message:
I've verified that the This still feels like a kludge. Probably worth doing, but probably worthwhile to ensure that runs of make on up to date outputs doesn't do anything, and/or outputs are produced at most once during the course of the make and at at a time guaranteed to be before usage. |
If you're referring to the full qualified path to Line 34 in afc5636
|
@richardlau in that case, the problem is worse. The executable is being deleted while running. Assuming that has no ill-effects, the hard links workaround I described above won't work; instead the linker needs to produce output at a new location, and then that new file needs to be renamed upon successful completion. I've provided a pull request that does that. I still view it as a workaround, and the root cause should be addressed. |
I agree with what you wrote above, we need to make sure |
@refack be aware that that would require changing not only v8, but also v8's third party dependencies. See Net: the fix itself is straightforward, but the process to push it upstream is uncertain and with an outcome unknown. Alternatively, maintaining our own patches creates its own headache. I'm certainly willing to work towards that as a goal, but meanwhile I do feel that we need immediate relief from the intermittent failures that we are seeing. |
Since this causes the nightlies to sometimes not build the headers it is also causing the n-api jobs to fail: https://ci.nodejs.org/view/x%20-%20Abi%20stable%20module%20API/job/node-test-node-addon-api/488/ |
Unless there is a straightforward safe fix I think we should back out the v8 upgrade until we can find a solution. |
Just found a similar on macOS cousin freebsd11-x64
https://ci.nodejs.org/job/node-test-commit-freebsd/20589/nodes=freebsd11-x64/console But I now know how to solve this. There's a solution possible by fixing GYP/some-gyp-files. |
All the sub targets have internal parallelism, so no performance loss. Also `make` doesn't to a good enough job of combining the output streams, or eliminate races. PR-URL: nodejs#23733 Fixes: nodejs#22006 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Matheus Marchini <[email protected]>
All the sub targets have internal parallelism, so no performance loss. Also `make` doesn't to a good enough job of combining the output streams, or eliminate races. PR-URL: #23733 Fixes: #22006 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Matheus Marchini <[email protected]>
All the sub targets have internal parallelism, so no performance loss. Also `make` doesn't to a good enough job of combining the output streams, or eliminate races. PR-URL: #23733 Fixes: #22006 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Matheus Marchini <[email protected]>
Issue is still open, and can impact all platforms using |
make
Haven't seen this in a long time. |
master
macOS
test phase fails around building the test addons with
ENOENT
, probably while looking for the node binary, while it's clearly built and at the correct place since previous steps use is (e.g. node-gyp)This happened either during the
test/addons/.buildstamp
targethttps://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1012/20041/console
Or right after
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1012/20042/console
The text was updated successfully, but these errors were encountered: