
cluster: wait for all servers closing before disconnect #1400

Closed
wants to merge 1 commit

Conversation

Olegas commented Apr 11, 2015

Fix for #1305

Before this change, cluster does not behave the way it is documented: when
disconnect is triggered, the worker must wait for every server to close
before actually disconnecting.

See the test case and discussion in the issue mentioned above.


Olegas commented Apr 11, 2015

/cc @bnoordhuis

@mscdex added the cluster label (Issues and PRs related to the cluster subsystem) on Apr 11, 2015
// Wait for any data, then close connection
socket.on('data', socket.end.bind(socket));

}).listen(common.PORT, '127.0.0.1');

Could you change 127.0.0.1 to common.localhostIPv4?
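For reference, a minimal sketch of the reviewed server snippet with the suggested change applied; the surrounding test scaffolding is assumed rather than shown in the diff, and common refers to Node's test/common.js helper:

var common = require('../common');
var net = require('net');

var server = net.createServer(function(socket) {
  // Wait for any data, then close the connection.
  socket.on('data', socket.end.bind(socket));
}).listen(common.PORT, common.localhostIPv4);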


Olegas commented Apr 11, 2015

@brendanashworth done

@@ -625,12 +625,26 @@ function workerInit() {

Worker.prototype.disconnect = function() {
this.suicide = true;
var waitingHandles = 0;

function check() {

Can you please use a more descriptive name? Maybe checkRemainingHandles.
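For context, a hedged, standalone sketch of the counting pattern this patch introduces (illustrative names, not the actual lib/cluster.js code): close every open server and call process.disconnect() only after the last close callback has fired.

// Runs inside a cluster worker, where process.disconnect() exists.
// servers: the worker's open net.Server instances (however they are tracked).
function disconnectAfterServersClose(servers) {
  var waitingHandles = 0;

  function checkRemainingHandles() {
    waitingHandles--;
    if (waitingHandles === 0)
      process.disconnect();
  }

  servers.forEach(function(server) {
    waitingHandles++;
    // net.Server#close accepts a callback that runs once the server has closed.
    server.close(checkRemainingHandles);
  });

  // No servers were open at all: disconnect immediately.
  if (waitingHandles === 0)
    process.disconnect();
}

The final guard covers the same zero-handle case discussed further down in this review.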


Olegas commented Apr 13, 2015

@cjihrig the tests are the same except for the scheduling policy. But there is a note in the ...schedrr test. I don't think it is a bug; it is a consequence of the SCHED_RR implementation.

@Fishrock123

@cjihrig is this (still?) lgty?


cjihrig commented May 30, 2015

@Fishrock123 the code changes themselves look good to me. I don't love the tests. Seems like a single test should be adequate, and it would be nice to simplify the test a bit if possible.


Olegas commented May 31, 2015

@cjihrig I think two tests are required to ensure both scheduling policies work correctly.


cjihrig commented Jun 1, 2015

I don't think the code being tested here is really reliant on the scheduling policy. Granted, the cluster module's behavior relies on the scheduling policy, but we don't need to duplicate every test to run under both policies. Maybe I'm missing something, but your changes to Worker.prototype.disconnect() only involve looping over a collection of handles. There are (or at least should be) other tests in place to ensure that collection of handles is generated correctly.

@sam-github

fwiw, I agree with @cjihrig about the tests.

As far as this change goes, it looks OK to me, though I haven't run or tested it. In particular, it looks like the ProgressTracker was lost in the cluster rewrite: https://github.com/joyent/node/blob/v0.10.38-release/lib/cluster.js#L520-L528, and this PR brings something like it back.


Olegas commented Jun 2, 2015

@sam-github @cjihrig there is now only one test. If it is OK now, I'll rebase/squash to a single commit.

@sam-github

Please squash.

I just ran the test _without_ your changes, and the test passes when it should not. Can you check this?

Fix for iojs/io.js#1305

Before this change, cluster does not behave the way it is documented:
when disconnect is triggered, the worker must wait for every server to
close before actually disconnecting.

See the test case and discussion in the issue mentioned above.
@Olegas force-pushed the disconnect-waits-for-server-close branch from a923192 to 35854ae on June 2, 2015 20:56

Olegas commented Jun 2, 2015

@sam-github commits squashed, test fixed.

});

process.once('exit', function() {
assert.ok(checks.disconnectedOnClientsEnd, 'The worker disconnected before all clients are ended');

Run make lint, this line is too long.
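One way to keep that assertion under the 80-column limit, reusing the same checks flag from the test:

assert.ok(checks.disconnectedOnClientsEnd,
          'The worker disconnected before all clients are ended');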

@sam-github

I would just do the lint fixups and merge, but I'd like you to confirm that I'm not missing some subtlety about the exit event testing.


Olegas commented Jun 9, 2015

@sam-github Yes, the exit event testing is not necessary; it was copy-pasted from another cluster test.
I'll fix everything this evening and add docs too.

sam-github pushed a commit to sam-github/node that referenced this pull request Jun 9, 2015
Before this change, cluster does not behave the way it is documented.
When disconnect is triggered, the worker must wait for every server to
close before actually disconnecting.

Reviewed-By: Sam Roberts <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
PR-URL: nodejs#1400
Fixes: nodejs#1305
@sam-github

Cleaned up and landed in 9c0a1b8, thanks @Olegas


Olegas commented Jun 9, 2015

@sam-github I'm sorry, but the test is now broken. The first setTimeout is required.

If I run the test without my fixes, it runs forever. But if I set the scheduling policy to SCHED_NONE, the test fails (as expected).

This is because of the default SCHED_RR scheduling policy on Linux/Mac/etc.

When the scheduling policy is SCHED_RR, the connection is accepted inside the master and then passed to the child process. If the child process disconnects BEFORE the response handler completes, the connection is never accepted by the child. So the child process never receives data and keeps running forever.
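A hedged sketch of how the test can sidestep that race (message names such as 'served' are illustrative, not taken from the actual test): the worker reports back only after it has handled the connection, and the master calls disconnect() only once that message has arrived.

var common = require('../common');
var cluster = require('cluster');
var net = require('net');

if (cluster.isMaster) {
  var worker = cluster.fork();

  // Connect only after the worker's server is actually listening.
  worker.on('listening', function(address) {
    net.connect(address.port, common.localhostIPv4);
  });

  // Disconnect only after the worker confirms it handled the connection,
  // so the disconnect cannot race the master's handle hand-off.
  worker.on('message', function(msg) {
    if (msg === 'served')
      worker.disconnect();
  });
} else {
  net.createServer(function(socket) {
    socket.end('bye');
    process.send('served');
  }).listen(common.PORT, common.localhostIPv4);
}

The follow-up fix referenced at the end of this thread takes a similar approach, waiting for data from the worker before disconnecting.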

@Fishrock123

@sam-github I think this probably needed a CI run, you have access to the CI, right?

Anyways, here's a run on current master: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/788/

}
process.disconnect();

if (waitingHandles === 0) {

Why is this block necessary? Won't checkRemainingHandles itself handle this?


What if we have no handles yet?


Cool. Thanks for clarifying :-)

@rvagg mentioned this pull request Jun 11, 2015
sam-github added a commit that referenced this pull request Jun 12, 2015
Wait for data to arrive from the worker before doing a disconnect. Without
this, there is a race over whether the disconnect arrives at the worker
before the master accepts and forwards the connection descriptor to it.

Reviewed-By: Sam Roberts <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Rod Vagg <[email protected]>
PR-URL: #1953
Fixes: #1933
Fixes: #1400

8 participants