Cluster 2.0 #2038

AndreasMadsen · 2011-11-07T15:22:12Z

In 0.6.x the cluster module was just a small extension of the child_process.fork() function. The purpose of this pull request is to make the cluster module easier to set up without removing the basic functionality. It also adds a lot of new functionality that makes it possible for userland plugins to interact with the cluster module.

Documentation:
The documentation can be found here.

This request fixes the following issues:

issue node clusters cannot be killed #2060 – when the master gets a kill/quit signal it will do the same with the workers.
issue cluster should be silent #2073 – made cluster silent.
issue How to end cluster's worker process gracefully? #2088 – added a method for graceful worker and master shutdown.

This request contains the following changes:

Added .eachWorker(fn) method.
Allow workers to run from another file, using .setupMaster()
Allow passing arguments to workers, using .fork([env])
Internal messages won't be sent to the message event
Allow workers to commit suicide, using worker.destroy()
Moved the process object returned by child_process.fork() to cluster.fork().process.
Made the API in master and workers the same, using an internal new Worker() object.
Easy access to worker details inside the worker.
Added .disconnect() method for a graceful shutdown
Added events: fork, listening, disconnect
Made cluster silent
You can use SIGTERM, SIGINT and SIGQUIT on a worker
Added cluster.workerOnline property to get how many workers are online
Support SIGTERM, SIGINT, SIGQUIT and SIGCHLD on master.
Create a cluster.disconnect() method.
Create a cluster.destroy() method.
cluster.isWorker and cluster.isMaster are now protected
Kill all workers the moment the master exits
Allow echo callback from both worker and master
Allow echo callback to be used by userland
Added a workerID: a number that will be reused when the worker spawns/respawns.
Added a uniqueID: a unique number that will change for each spawn/respawn.
Throw an error when .autoFork() conflicts with a manual .fork().
Prevent and detect infinite respawn loops caused by errors in workers.
Emit a criticalError event when an infinite respawn loop is detected.
Added startup property to worker object.
Added a silent option to setupMaster to prevent output from workers from being shown as output in the master.
Add a zero-downtime restart method.
When getting a SIGHUP signal the cluster will restart gracefully.
Added settings object to the cluster
Added a setup event to cluster; it is emitted when setupMaster executes.
You now kill a worker using destroy, not kill
The worker event kill has been renamed to death
Updated and improved documentation
Added testcases (a lot of testcases)
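
The respawn-loop protection in the list above can be pictured with a small sketch. This is not the code from this pull request: `RespawnGuard`, `maxDeaths`, `windowMs` and `recordDeath` are hypothetical names, and the thresholds are arbitrary. It only assumes that "too many worker deaths within a short window" is what counts as a loop, and that a critical-error event (spelled `criticalError` here) is emitted when that happens.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical helper, not part of this pull request: counts worker deaths
// inside a sliding time window and emits 'criticalError' once too many occur.
class RespawnGuard extends EventEmitter {
  constructor({ maxDeaths = 3, windowMs = 1000 } = {}) {
    super();
    this.maxDeaths = maxDeaths;
    this.windowMs = windowMs;
    this.deaths = []; // timestamps (ms) of recent worker deaths
  }

  // Record one death; returns true if respawning is still considered safe.
  recordDeath(now = Date.now()) {
    this.deaths.push(now);
    // Forget deaths that have fallen out of the window.
    this.deaths = this.deaths.filter((t) => now - t < this.windowMs);
    if (this.deaths.length > this.maxDeaths) {
      this.emit('criticalError', new Error('respawn loop detected'));
      return false;
    }
    return true;
  }
}

// Feed explicit timestamps so the behaviour is deterministic.
const guard = new RespawnGuard({ maxDeaths: 2, windowMs: 1000 });
let criticalCount = 0;
guard.on('criticalError', () => { criticalCount += 1; });

const results = [0, 100, 200, 5000].map((t) => guard.recordDeath(t));
console.log(results, criticalCount);
```

In a master process, something like `recordDeath()` would be called from the worker death handler, and a `false` return would mean "do not fork a replacement".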

These changes were made in other modules:

To support the silent option, child_process.fork() has been updated to take silent as an option in the options argument. This is a very small change.

These changes will be included in the near future:

Fix issue UDP4 socket fails to bind with clustering #2194 – but before that, UDP port sharing must be supported.
Allow the number of CPUs to change. This depends on issue Better multiply CPU support #2265

…ster, added options argument; HandleWorkerMessage: emit events, echo allwas queryID if required, added listening command; fork: give each worker a protected workerID; queryMaster: made public, renamed to worker.send; _getServer: send a listening command to master.

AndreasMadsen · 2011-11-07T17:26:30Z

I'm in the process of writing the documentation.

AndreasMadsen · 2011-11-07T19:31:21Z

I have finished documenting the changes I have made.
However, in the documentation I have:

renamed cluster.worker.send to cluster.worker.respond
added a message event cluster.worker.on('message')
changed the behaviour of cluster.fork().on('message')

These changes have not yet been made in the cluster.js native module.

AndreasMadsen · 2011-11-08T09:49:21Z

I have added @visionmedia's commit 58b558a to this pull request.

AndreasMadsen · 2011-11-08T11:03:37Z

I am considering making the Worker class useful to both master and worker. That way the API will be almost the same for the master and the worker.

In master
In the master, a Worker object could be obtained by cluster.workers[id].

Its message event will be emitted when it receives non-internal data from the worker.
Its kill method would set a suicide state, and kill the process.
Its send method would send a message, and call a callback when an echo is received from the worker.

This is already done.

In worker
In the worker, the Worker object could be obtained by cluster.worker.

Its message event will be emitted when it receives non-internal data from the master.
Its kill method would send a suicide state to the master; when the callback is called it will kill itself.
Its send (currently respond) method would send a message to the master, and call a callback when an echo is received from the master.
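
The send-with-echo behaviour described for both sides can be sketched in memory, without real processes. Everything here is an assumption made for illustration: the `Endpoint` class, the `rawSend` hook, and the `internal` / `queryID` / `echoOf` message fields are hypothetical and are not the wire format used by cluster.js. The sketch only shows the idea that echoes travel as internal messages, trigger the sender's callback, and never reach the message event.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical sketch of one end of the master/worker channel. `rawSend` is
// whatever delivers an object to the other side (process.send in practice).
class Endpoint extends EventEmitter {
  constructor(rawSend) {
    super();
    this.rawSend = rawSend;
    this.nextQueryID = 0;
    this.pending = new Map(); // queryID -> callback waiting for an echo
  }

  // Send user data; `callback` runs once the other side echoes the queryID.
  send(content, callback) {
    const queryID = this.nextQueryID++;
    if (callback) this.pending.set(queryID, callback);
    this.rawSend({ internal: false, queryID, content });
  }

  // Every incoming raw message is fed through here.
  receive(msg) {
    if (msg.internal) {
      // Echoes are internal: run the callback, never emit 'message'.
      const callback = this.pending.get(msg.echoOf);
      this.pending.delete(msg.echoOf);
      if (callback) callback();
      return;
    }
    this.emit('message', msg.content);
    this.rawSend({ internal: true, echoOf: msg.queryID });
  }
}

// Wire a "master" and a "worker" endpoint together in memory.
const master = new Endpoint((msg) => worker.receive(msg));
const worker = new Endpoint((msg) => master.receive(msg));

const log = [];
worker.on('message', (content) => log.push(`worker got ${content}`));
master.on('message', (content) => log.push(`master got ${content}`));
master.send('ping', () => log.push('master saw echo'));
worker.send('pong', () => log.push('worker saw echo'));
console.log(log);
```

Because both ends use the same class, the API is identical in master and worker, which is the point of sharing one Worker object.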

AndreasMadsen · 2011-11-08T11:15:40Z

I noticed that my commits don't pass make jslint. Should I fix that here, or do it in another pull request once this is pulled?

AndreasMadsen · 2011-11-08T20:35:12Z

I have updated the pull request so both the worker and master are using the Worker class.
This had the side effect that the code got a lot simpler.
At 5c1d481 I was worried that the code would become gibberish, since I had to make individual changes for both worker and master, but I do not think this is a problem any more.

I do not have any more plans for changes, but I would like to discuss:

Why do new workers need a new id when they can just reuse the one from the dead worker?
Preventing evil error wheels.
The finalized event (when all workers are listening).
Additional testcases; there was not a single one before my commit.
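
For the first discussion point, the difference between a reusable workerID and an ever-changing uniqueID (both mentioned in the change list) can be sketched as simple bookkeeping. This is an illustration under assumptions, not the pull request's implementation: `WorkerRegistry`, `spawn`, `death` and `lowestFreeSlot` are hypothetical names, and "reuse the lowest free number" is just one possible reuse policy.

```javascript
'use strict';

// Hypothetical bookkeeping, not the pull request's code: workerID is a slot
// number that a respawned worker inherits, uniqueID never repeats.
class WorkerRegistry {
  constructor() {
    this.slots = new Map(); // workerID -> uniqueID of the live worker
    this.nextUniqueID = 0;
  }

  // Spawn into the lowest free slot (or into `workerID` when respawning).
  spawn(workerID = this.lowestFreeSlot()) {
    if (this.slots.has(workerID)) throw new Error(`slot ${workerID} is busy`);
    const uniqueID = this.nextUniqueID++;
    this.slots.set(workerID, uniqueID);
    return { workerID, uniqueID };
  }

  // Mark a worker as dead so its slot can be reused.
  death(workerID) {
    this.slots.delete(workerID);
  }

  lowestFreeSlot() {
    let id = 0;
    while (this.slots.has(id)) id += 1;
    return id;
  }
}

const registry = new WorkerRegistry();
const a = registry.spawn();
const b = registry.spawn();
registry.death(a.workerID);
const c = registry.spawn(); // reuses a's slot with a fresh uniqueID
console.log(a, b, c);
```

With this split, per-slot state (for example a log file name) can key on workerID, while anything that must distinguish one process from its replacement keys on uniqueID.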

Update
I will try to fix the issues that are reported.

AndreasMadsen · 2011-11-14T14:11:28Z

As proposed in issue #2088, I have added a .disconnect() method in commit 25f5793. It is extremely difficult to test whether the worker exits when/if all connections stop, since there is no IPC.

tomyan · 2012-01-02T21:20:17Z

Really liking the look of these changes :-) Wondering if it will be possible to set the exec option per worker that's spawned. I have a server that behaves differently depending on command line options. I'd like to be able to exercise these different behaviours from a single set of tests. I could use child_process.fork to run the server multiple times, but maybe it would be more convenient to be able to run workers that do different jobs from the same master?

Thanks

Tom

AndreasMadsen · 2012-01-02T21:29:56Z

@tomyan sure this is possible :)
However I'm not sure what the purpose is (could you write a simple example or API change suggestion).

Please note: this module is not made for easy testing. And having different workers running does not make much sense, since there is no way to know how the OS will balance the load, so clients would be treated differently :/

Can you not use the env option in .fork(env), or use child_process.fork to spawn not a worker but a master/cluster?

I'm open to suggestions, but I do not see the purpose of this yet.

AndreasMadsen · 2012-01-05T18:49:16Z

There were some errors that occurred when there was a critical error in the worker and autoFork was on. When the error was detected it tried to disconnect all workers (good), but did so multiple times on the same worker, resulting in some random errors. The critical error handling is tricky, but I think the latest patch does a good job.
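
The "disconnect the same worker several times" problem can be illustrated with a small state check. This is a sketch under assumptions, not the patch itself: `FakeWorker` stands in for a real worker object, and the `state` values and `disconnectAll` function are hypothetical. The idea is only that a worker already on its way out is skipped, so repeated critical errors cannot trigger a second disconnect.

```javascript
'use strict';

// Hypothetical stand-in for a worker; only the state machine matters here.
class FakeWorker {
  constructor(id) {
    this.id = id;
    this.state = 'online';
    this.disconnectCalls = 0;
  }
  disconnect() {
    this.disconnectCalls += 1;
    this.state = 'disconnecting';
  }
}

// Disconnect every worker at most once, however often this is invoked.
function disconnectAll(workers) {
  let started = 0;
  for (const worker of workers) {
    if (worker.state !== 'online') continue; // already disconnecting or dead
    worker.disconnect();
    started += 1;
  }
  return started;
}

const workers = [new FakeWorker(1), new FakeWorker(2)];
const first = disconnectAll(workers);  // starts both disconnects
const second = disconnectAll(workers); // e.g. a second critical error fires
console.log(first, second, workers.map((w) => w.disconnectCalls));
```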

AndreasMadsen · 2012-01-24T08:01:38Z

That was a difficult merge!

isaacs · 2012-03-19T21:42:46Z

Closing this. Everything has been split up into separate reqs.

AndreasMadsen · 2012-03-20T06:42:44Z

@isaacs true, but could you also close #2060?

AndreasMadsen added 5 commits November 7, 2011 16:01

send tcp self to cluster._getServer

256ec65

renamed function from _startWorker to _setupWorker

0afde5b

added testcase for cluster events

50820e7

Documented cluster events

c97c232

AndreasMadsen added 6 commits November 7, 2011 18:49

Documented .fork and .autoFork

8405853

Documented cluster.worker.send and cluster.worker.on('message');

c96a8fb

Ups, removed event title

e3d4d9c

Documented worker object

5fea68b

Documented setupMaster

5a2b365

Now fit the documentation pattern

492a261

AndreasMadsen added 4 commits November 8, 2011 10:26

Created a worker class

a37e413

rearrange log

1681943

sync API with documentation

5c1d481

allow env object to be passed to cluster.fork()

f0ba47e

AndreasMadsen added 2 commits November 8, 2011 21:25

fit test after removing chain

4f9c938

Dramatically simplifyed code, and made Worker class usefull in both m…

c2a074e

…aster and worker

AndreasMadsen added 5 commits November 8, 2011 21:47

fixed bug, and set state to dead when worker die

d79ccef

updated documentation

29647e8

added disconnect method and event

25f5793

testcase for disconnect method, could be better

1f1153f

ups, reactivated uncaughtException

6192fb4

jslint

b766a21

AndreasMadsen added 6 commits December 30, 2011 19:37

Merge remote-tracking branch 'upstream/master'

ec18b6f

fix spaceing

aca3847

update worker.send doc

8a257c3

fix spaceing

9a81b59

sync code with step 4

7cbf805

sync testcases with step 4

57d16ff

AndreasMadsen added 2 commits January 5, 2012 11:33

merged 'master' intro AndreasMadsen/master

27137d4

sync with step 4

b38bdc8

AndreasMadsen mentioned this pull request Jan 5, 2012

Cluster 2.0 – step 4+ : fix spelling and _queryEcho overkill #2465

Closed

AndreasMadsen added 2 commits January 5, 2012 17:20

debugging

f57f1e4

refactor much of the cluster.disconnect handling to fix rare issues, …

f4d11fb

…especially in case of a cirtical error

AndreasMadsen mentioned this pull request Jan 5, 2012

Clean up in Child_process module #2463

Closed

remove outdated testcase

29ccb75

AndreasMadsen mentioned this pull request Jan 5, 2012

Cluster 2.0 – step 5 : Make setupMaster public API #2470

Closed

This was referenced Jan 17, 2012

Crash if exception was thrown in cluster "master". #2556

Closed

Add IPC disconnect method when using fork #2591

Closed

merged upstream/master intro this branch

eebab94

AndreasMadsen added 3 commits January 31, 2012 20:10

Merge remote-tracking branch 'upstream/master'

987f07c

cleanup after adding propper disconnect support in child_process

6d8231d

Merge remote-tracking branch 'upstream/master'

1e4b165

This was referenced Feb 12, 2012

Cluster 2.0 - step 6 : Add disconnect methods to graceful shutdown #2740

Closed

cluster2: cleanup #2661

Closed

AndreasMadsen mentioned this pull request Mar 11, 2012

cluster 2.0 - step 7: kill workers when master dies #2908

Closed

isaacs closed this Mar 19, 2012
