Auto launch of additional workers depending on number of cores at a host. #9202
Conversation
No. Usually the master process launches all workers. In this case, the first worker on a host launches the additional workers. Open to suggestions for a better name.
This sounds good (output from workers is a nice catch I missed in my version), however, I think the same could be accomplished without an additional keyword like :copy_worker. Starting a worker with a combination of -p N --worker (via :exeflags=…)
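A minimal sketch of that alternative, assuming two 8-core hosts. The exeflags keyword of addprocs exists today, but whether a worker started with both -p and --worker actually spawns the extra local processes is exactly what this PR is about, so the flags below are illustrative only:

```julia
# Illustrative only: pass "-p 7" through exeflags so each remote worker
# would itself start 7 local workers (assumes 8 cores per host).
# Supporting -p together with --worker is the behaviour under discussion.
addprocs(["host1", "host2"]; exeflags=`-p 7`)
```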
I'll give it a shot.
We should also probably make the first parameter to addprocs optional, since the actual number of workers started will depend on the number of cores detected dynamically.
What about the situation where one can detect the number of cores already in use (through LSF etc.), say U cores, and then use the rest of the cores for Julia workers via -p -U, to play nice with other existing users of a host? When used in conjunction with --worker it could subtract one always, or work like you propose.
OK, fine. Then scripts for, e.g., LSF should not only query the "used" core count but also the "total" core count, to calculate the "remaining" cores.
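A hedged sketch of that scheme; query_total_cores and query_used_cores are hypothetical stand-ins for whatever the batch system (LSF here) exposes, not real functions:

```julia
# Hypothetical: start only as many workers as a host has free cores.
function free_cores(host)
    total = query_total_cores(host)   # placeholder for an LSF query
    used  = query_used_cores(host)    # placeholder for an LSF query
    max(total - used, 0)
end

# Repeating a host n times in the vector launches n workers on it.
addprocs(fill("hostA", free_cores("hostA")))
```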
I have the same confusion as @ViralBShah --- there isn't really any cloning or copying, so a different name should be used. Maybe "multiplicity" or just "count". More importantly, it seems like this could be integrated more deeply. Instead of just adding a keyword argument, we could make arrays of …
For starters, I'll do away with the new keyword and go with @kourzanov's suggestion of reusing … Yes, we can integrate the …
Making (host, count) the core representation seems natural. Another consideration is how this will interact with threading in the future. Is that yet another level of hierarchy?
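A sketch of how the (host, count) representation could look on the user side; the tuple form is what the discussion points towards, while the :auto spelling for "one worker per core" is my assumption, not anything this PR implements:

```julia
# Each entry pairs a host spec with the number of workers to start there.
addprocs([("host1", 8), ("host2", 8)])

# An "auto" count could then mean "as many workers as detected cores";
# the :auto spelling here is an assumption for illustration only.
addprocs([("host1", :auto), ("host2", :auto)])
```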
Threading should probably be an additional hierarchy level. When I was working in bioinformatics, I found GATK's cluster framework very … (I'd like to offer something more concrete than a pointer, but I'm finding …)
Superseded by #9309
This PR is inspired by https://groups.google.com/d/msg/julia-dev/J8plDsw76dI/VQbedWJXi20J and https://groups.google.com/d/msg/julia-dev/Ui7G-99jBpI/VDmO-s0YFiMJ
`addprocs` has a new keyword argument `clone_worker`. The default value is 0. Any positive value `n` will result in `n` additional workers being created for every new worker created. `clone_worker="auto"` will launch `Base.CPU_CORES - 1` additional workers.

Users will use this by having ClusterManagers launch a single worker per host. With `clone_worker="auto"`, `addprocs` will automatically launch additional workers depending on the number of cores available on the host.

For example, on 2 hosts, each with 8 cores, `addprocs(["host1", "host2"]; clone_worker="auto")` will first launch a single worker each on "host1" and "host2". Next it will launch 7 additional workers on each host via the 2 newly created workers. Thus only a single ssh session is used to connect to a host.

One issue is that the process which launches the clones does not know the pids of the clones, and hence console output from the clones (which is channeled back to the master via the launching worker) is displayed as shown below:

But other than the above, I think it is a good option to launch the required number of workers per host.
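Putting the description together, the intended usage on the two 8-core hosts from the example would be roughly as follows; `clone_worker` was never merged (the PR was superseded by #9309), so this records the proposal rather than a working API:

```julia
# Proposed usage (never merged): one ssh session per host, after which
# each first worker launches Base.CPU_CORES - 1 = 7 additional workers.
addprocs(["host1", "host2"]; clone_worker="auto")
nworkers()   # expected 16: (1 + 7) workers on each of the two hosts
```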