test: fix flaky test-child-process-fork-net #21012
CI https://ci.nodejs.org/job/node-test-pull-request/15139/ OSX Failure (looks like a flaky test):
Windows:
Update: the test is still failing...
Force-pushed from 072ce71 to 46371ce.
Looking at the test case and the failing stack (from original issue tracker):
I believe the server error handler may not be able to handle this, unless the child process percolates it to server, which I don't think would be the case. So probably we need a completion handler to the server close that filters

--- a/test/parallel/test-child-process-fork-net.js
+++ b/test/parallel/test-child-process-fork-net.js
@@ -175,7 +175,10 @@ if (process.argv[2] === 'child') {
       connect.on('close', function() {
         console.log('CLIENT: closed');
         assert.strictEqual(store, 'echo');
-        server.close();
+        server.close((err) => {
+          if (err && err.code !== 'EPIPE')
+            throw err;
+        });
       });
     });
   }
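The filtering idea in the diff above can be pulled into a small reusable wrapper. This is only a sketch: `ignoreErrorCodes` is a hypothetical helper, not something that exists in Node.js or in the test suite, and whether `server.close()` ever reports `EPIPE` through its callback is exactly what this thread is trying to establish.

```javascript
'use strict';

// Hypothetical helper (not part of Node.js): wraps a completion callback so
// that errors whose `code` is listed in `ignoredCodes` are dropped and every
// other error is passed through unchanged.
function ignoreErrorCodes(ignoredCodes, next) {
  return function onDone(err) {
    if (err && ignoredCodes.includes(err.code)) {
      return next(null); // treat the ignored error as a clean completion
    }
    return next(err || null);
  };
}

// Usage sketch mirroring the diff above (assumes `server` is a net.Server):
//   server.close(ignoreErrorCodes(['EPIPE'], (err) => {
//     if (err) throw err;
//   }));
```

Keeping the list of ignored codes explicit makes it harder to swallow an unrelated error by accident.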
Patch inspired by 397eceb to fix flakiness in test-child-process-fork-net. Ref: nodejs#20973
Force-pushed from 46371ce to bd3dc9d.
@gireeshpunathil thanks, I updated the PR with your suggestion. Yet another CI: https://ci.nodejs.org/job/node-test-pull-request/15148/
`flaky-test-child-process-fork-net` has been failing constantly for the past few days, and none of the solutions suggested so far have worked. Marking it as flaky until the issue is fixed. Ref: nodejs#21012 Ref: nodejs#20973 Ref: nodejs#20973
`flaky-test-child-process-fork-net` has been failing constantly for the past few days, and none of the solutions suggested so far have worked. Marking it as flaky until the issue is fixed.

Ref: #21012
Ref: #20973
Ref: #20973
PR-URL: #21018
Refs: #21012
Refs: #20973
Refs: #20973
Reviewed-By: Ruben Bridgewater
Reviewed-By: Jon Moss
Reviewed-By: Anatoli Papirovski
Reviewed-By: Michael Dawson
Reviewed-By: James M Snell
Reviewed-By: Trivikram Kamat
I think I found the issue: the error is being thrown by the ChildProcess inside the server, and there's no way of catching errors thrown by it (which IMO is a bug). The patch below fixes the problem with this test (stress-test on test-child-process-fork-net with the patch applied: https://ci.nodejs.org/job/node-stress-single-test/1884/nodes=win2016-1p-vs2017/) but breaks test-child-process-fork-net2.

There might be a better way to propagate errors thrown by the inner ChildProcess, but I'm stuck right now. /cc @nodejs/streams do you have any suggestions here?

diff --git a/lib/internal/socket_list.js b/lib/internal/socket_list.js
index 55077af1305..32615c4b8cb 100644
--- a/lib/internal/socket_list.js
+++ b/lib/internal/socket_list.js
@@ -17,7 +17,7 @@ class SocketListSend extends EventEmitter {
     var self = this;

     if (!this.child.connected) return onclose();
-    this.child.send(msg);
+    this.child.send(msg, callback);

     function onclose() {
       self.child.removeListener('internalMessage', onreply);
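Why passing a callback changes the outcome: per the `child_process` documentation, when `subprocess.send()` is called without a callback and the message cannot be sent, the `ChildProcess` emits an `'error'` event, which throws if nothing listens for it. The sketch below imitates that contract with a stub. `BrokenChannelChild` and `trySend` are hypothetical names used only for illustration, and the stub reports the failure synchronously to keep the example short, whereas real Node.js reports it asynchronously.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical stand-in for a ChildProcess whose IPC channel is broken.
// Assumption: every send() fails with EPIPE.
class BrokenChannelChild extends EventEmitter {
  send(message, callback) {
    const err = Object.assign(new Error('write EPIPE'), { code: 'EPIPE' });
    if (typeof callback === 'function') {
      callback(err); // the caller receives the error and decides what to do
    } else {
      this.emit('error', err); // throws when there is no 'error' listener
    }
    return false;
  }
}

// Sends a message either with or without a callback and records whether the
// failure was handed to the callback or thrown out of send().
function trySend(child, message, useCallback) {
  let reported = null;
  try {
    if (useCallback) {
      child.send(message, (err) => { reported = err; });
    } else {
      child.send(message);
    }
  } catch (err) {
    return { thrown: err, reported };
  }
  return { thrown: null, reported };
}
```

With the callback, the caller sees the `EPIPE` error and can route it; without it, the error escapes as an unhandled `'error'` event, which matches the uncatchable failure described above.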
Agree with @mmarchini. I too spent a lot of time on this and have these observations. Three issues:
This might have been discussed in the past (but I haven't seen it): what is the specification for an asynchronous API in terms of the control flow in its return path? This is my current understanding:
Internal functions that support async APIs should follow only the last two options. In any case, a program should not be directly handling internal faults. Given that the premise of tests is to catch issues in the code, I recommend not working around the failure in the test; instead, leave it marked as flaky while we address the code issue.
Pinging @AndreasMadsen (original author of the test in dceebbfa31a). Probably OK to split the test into two files, one to test the server object and one to test the socket object? Thoughts on what's going on here?
Split test-child-process-fork-net into test-child-process-fork-net-server and test-child-process-fork-net-socket. Rename test-child-process-fork-net2.js to test-child-process-fork-net.js. Refs: nodejs#21012
@targos and @apapirovski did a bunch of debugging and experimentation on this today. Here's what they've found:
This feels like a Node.js bug on Windows. It seems like a race condition. We swallow Calling in the |
More thoughts on |
Ping @addaleax
node/lib/internal/socket_list.js Lines 40 to 43 in f86e5fc
I see that the callback that we internally use above is
Lines 1620 to 1625 in 65b17d4
It's keeping track of the number of worker processes. Somehow, one is going away without it noticing? But only on this one Windows variant and only sometimes?
Perhaps this is a clue too? The test has this listener: node/test/parallel/test-child-process-fork-net.js Lines 152 to 154 in 65b17d4
It never fires on successful runs or failed runs. It just never fires.
Split test-child-process-fork-net into test-child-process-fork-net-server and test-child-process-fork-net-socket. Rename test-child-process-fork-net2.js to test-child-process-fork-net.js.

Refs: #21012
PR-URL: #21095
Reviewed-By: Michaël Zasso
Reviewed-By: Joyee Cheung
Reviewed-By: Trivikram Kamat
@mmarchini Can you rebase this?
@Trott I strongly feel this is a Node.js core bug and should be fixed as such instead of writing yet another workaround for
This is fixed by #21108 so we can just close.
Patch inspired by 397eceb to fix flakiness in test-child-process-fork-net.
Ref: #20973
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes