test: improve test-cluster-disconnect-suicide-race #4739

Trott · 2016-01-18T08:07:33Z

Previously, test-cluster-disconnect-suicide-race had two issues:

* Magic numbers: How many times to spawn a worker was determined through
  empirical experimentation. This means that as new platforms and new
  CPU/RAM configurations are tested, the magic numbers require more
  and more refinement. This brings us to...
* Non-determinism: The test seems to fail all the time when the bug
  it tests for is present, but it's really a judgment based on sampling.
  "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
  16..."

This revised version of the test takes a different approach. With the fix
for the bug that the test was written for in place, the disconnect
event fires on a subsequent tick. The test checks for exactly that, so it
still fails when the fix is not in the code base and succeeds when it
is.

Advantages of this approach include:

* The test runs much faster.
* The test should be reliable on any new platform regardless of CPU and
  RAM.

Ref: #4674

cc @santigimeno @iwuzhere


Trott · 2016-01-18T08:18:54Z

CI with this new test, but with the cluster.js change that was committed alongside the old test reverted. (In other words: new test, old code with the bug. Does the test still fail? Yes, it does!)

https://ci.nodejs.org/job/node-test-commit/1811/

Red is good in this case.

Trott · 2016-01-18T08:19:57Z

CI for this PR: https://ci.nodejs.org/job/node-test-pull-request/1286/

bnoordhuis · 2016-01-18T08:50:58Z

test/sequential/test-cluster-disconnect-suicide-race.js

-  cluster.worker[process.env.action]();
+  cluster.on('exit', (worker, code) => {
+    if (code)
+      common.fail('worker exited with error');


Why not assert.strictEqual(code, 0)?

bnoordhuis · 2016-01-18T08:54:31Z

LGTM with a suggestion. Unrelated failures on the OS X buildbot.

santigimeno · 2016-01-18T10:24:29Z

LGTM. Thanks again!

jasnell · 2016-01-18T16:38:48Z

CI failure on OSX with test-cluster-disconnect-leak => https://ci.nodejs.org/job/node-test-commit-osx/1847/nodes=osx1010/console
I wouldn't think that this change could have affected that test but given that they're testing the same area, it would be best to confirm.

Trott · 2016-01-18T16:45:49Z

OS X issue does indeed look unrelated to this (and would be partially or perhaps even completely fixed by #4736). But, you know, just to make sure there's not some weird interaction that wouldn't be suspected (because when has that ever happened, amirite?!), another CI run:

https://ci.nodejs.org/job/node-test-commit/1822/

All green! \o/

Although, uh, now I wish I had thrown in Ben's suggested change before running the test...

OK, so, now with that change:

https://ci.nodejs.org/job/node-test-commit/1825/

All green again! \o/

jasnell · 2016-01-18T16:58:36Z

LGTM :-)

Trott · 2016-01-19T23:43:10Z

Landed in 44aba1a

Previously, test-cluster-disconnect-suicide-race had two issues:

* Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to...
* Non-determinism: The test seems to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..."

This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the disconnect event will fire on a subsequent tick. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is.

Advantages of this approach include:

* The test runs much faster.
* The test should be reliable on any new platform regardless of CPU and RAM.

PR-URL: #4739
Ref: #4674
Reviewed-By: Ben Noordhuis
Reviewed-By: James M Snell

Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel.

Ref: nodejs#4736
Ref: nodejs#4739
PR-URL: nodejs#4774
Reviewed-By: Johan Bergström
Reviewed-By: Colin Ihrig


Trott added cluster, test, lts-watch-v4.x labels Jan 18, 2016

bnoordhuis reviewed Jan 18, 2016

fixup: improve assert

898a226

Trott closed this Jan 19, 2016

Trott mentioned this pull request Jan 20, 2016

test: move cluster tests to parallel #4774

Closed

Trott added a commit to Trott/io.js that referenced this pull request Jan 20, 2016

test: move cluster tests to parallel

097ea9f

Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel. Ref: nodejs#4736 Ref: nodejs#4739

MylesBorins added land-on-v4.x and removed lts-watch-v4.x labels Jan 28, 2016

rvagg mentioned this pull request Feb 8, 2016

Release proposal: v5.5.1 (Stable) #5141

Closed

MylesBorins mentioned this pull request Feb 11, 2016

V4.3.1 proposal #5200

Merged

Trott deleted the improve-race-test branch January 13, 2022 22:31
