Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: ignore queryServer msgs on disconnection #4465

Closed
wants to merge 1 commit into from

Conversation

santigimeno
Copy link
Member

It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in test-cluster-disconnect-race on FreeBSD
and OS X:

assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: Resource leak detected.
    at removeWorker (cluster.js:321:7)
    at ChildProcess.<anonymous> (cluster.js:356:9)
    at ChildProcess.g (events.js:264:16)
    at emitTwo (events.js:88:13)
    at ChildProcess.emit (events.js:173:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)

The problem is that the worker2.disconnect is being called on the
master before the queryServer is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when removeWorker is called from the exit handler, there are no
workers left, but one handle, thus the AssertionError.

Modify test-cluster-disconnect-race to check there are no leaks on
exit.

@mscdex mscdex added the cluster Issues and PRs related to the cluster subsystem. label Dec 29, 2015
@mscdex
Copy link
Contributor

mscdex commented Dec 29, 2015

/cc @Trott

@Trott
Copy link
Member

Trott commented Dec 29, 2015

@Trott
Copy link
Member

Trott commented Dec 29, 2015

The altered test doesn't fail with current master. Is there an easy way to devise a test that will fail reliably without this change?

@Trott
Copy link
Member

Trott commented Dec 29, 2015

This is not necessarily a bad thing, but this change greatly increases the frequency of EPIPE flakiness in test-cluster-shared-leak.js on Windows. There's a PR in to swallow those errors anyway so maybe it's not a problem at all. But, FYI...

Stress test against master (should fail here and there): https://ci.nodejs.org/job/node-stress-single-test/257/nodes=win2012r2/console

Stress test with this change (fails a lot more): https://ci.nodejs.org/job/node-stress-single-test/260/nodes=win2012r2/console

(By the way, that PR could use a LGTM. Hint, hint!)

@santigimeno
Copy link
Member Author

@Trott I've updated the test so it always fails in my OS X computers without this change. Thanks for the review!

@santigimeno
Copy link
Member Author

Finally I have not modified test-cluster-disconnect-race but created a separate test based on it (but creating lots of it), that fails consistently in Debian Jessie 64 and OS X.

}));

const cpus = os.cpus().length;
const tries = cpus * 16;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe move these declarations up above the worker1.on('message', ...) because someone reading the test will probably expect tries to be defined before it is used on line 22?

@Trott
Copy link
Member

Trott commented Dec 31, 2015

LGTM. One minor nit that you can ignore if you want.

CI: https://ci.nodejs.org/job/node-test-commit/1587/

Nice work!

@Trott
Copy link
Member

Trott commented Dec 31, 2015

The test is timing out on SmartOS (which is based on Solaris):

@jbergstroem Might this have a cause that lies elsewhere? I feel like there's been an uptick in SmartOS issues lately, but I might be imagining it...

@jbergstroem
Copy link
Member

@Trott the "only" changes made recently that would affect all tests are yesterdays land of where sockets are written and today's change of where we write the temporary directory ($HOME/node-tmp).

@Trott
Copy link
Member

Trott commented Dec 31, 2015

So the test is probably legitimately timing out (probably hanging) on SmartOS and only SmartOS. That's kind of a bummer. But that's why we test...

@jasnell
Copy link
Member

jasnell commented Jan 4, 2016

LGTM

@Trott
Copy link
Member

Trott commented Jan 5, 2016

This still needs a tweak so the test doesn't time out on SmartOS on CI. (Just putting this note here so no one merges this without realizing that or something.)

It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`:

```
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: Resource leak detected.
    at removeWorker (cluster.js:321:7)
    at ChildProcess.<anonymous> (cluster.js:356:9)
    at ChildProcess.g (events.js:264:16)
    at emitTwo (events.js:88:13)
    at ChildProcess.emit (events.js:173:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
```

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.
@santigimeno
Copy link
Member Author

@Trott I've updated the PR to limit the number of workers. Let's see if now it passes in all platforms. Thanks

@jasnell
Copy link
Member

jasnell commented Jan 7, 2016

@Trott
Copy link
Member

Trott commented Jan 8, 2016

CI is green!

@jbergstroem
Copy link
Member

✅ LGTM

Trott pushed a commit to Trott/io.js that referenced this pull request Jan 8, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: nodejs#4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@Trott
Copy link
Member

Trott commented Jan 8, 2016

Landed in f9f1dd9

@Trott Trott closed this Jan 8, 2016
MylesBorins pushed a commit that referenced this pull request Jan 11, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: #4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Jan 11, 2016
MylesBorins pushed a commit that referenced this pull request Jan 12, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: #4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jan 28, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: #4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: #4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Feb 11, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: nodejs#4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Feb 11, 2016
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Feb 15, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: nodejs#4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
scovetta pushed a commit to scovetta/node that referenced this pull request Apr 2, 2016
It avoids the creation of unnecessary handles. This issue is causing
intermitent failures in `test-cluster-disconnect-race` on `FreeBSD`
and `OS X`.

The problem is that the `worker2.disconnect` is being called on the
master before the `queryServer` is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when `removeWorker` is called from the `exit` handler, there are no
workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on
`test-cluster-disconnect-race` that creates lots of workers and fails
consistently without this patch.

PR-URL: nodejs#4465
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster Issues and PRs related to the cluster subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants