-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rework of cluster manager. support additional launches via a worker #9309
Conversation
@@ -79,52 +79,28 @@ end | |||
|
|||
abstract ClusterManager | |||
|
|||
type Worker | |||
host::ByteString | |||
port::UInt16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
host/port information are maintained by the respective cluster managers and "config" dict
I would really like to go ahead and merge this. @JeffBezanson any comments? |
Any idea why both travis and appveyor are failing? |
It appears that the test failures are related to this PR. |
Sorry about that. Fixed. |
AppVeyor error seems unrelated. |
Looks like a fluke, |
end | ||
end | ||
w.bind_addr | ||
w.config[:bind_addr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The w.bind_addr
field still exists. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, an oversight. Should be removed. Will do so.
Why the switch from passing |
The change came because while implementing #9046 , the worker-to-worker connections too will be implemented by the cluster manager. And in the worker, a ClusterManager object may not be appropriate. If we support user defined transports in the future, custom ClusterManagers will implement their own
|
I don't get it; it seems awkward to have this hidden coupling where It also might make sense to use types to separate which fields are mandatory, or common to all workers, from manager-specific custom fields. |
No, we need the type to dispatch to the correct implementation of
My current thinking is to keep it this way and just document these fields in more detail. The For example, the |
Ok, we can keep the Dict. My thinking was that you could write
and then we would have normal dispatch on instances. It's very odd to have the object you're actually dispatching on hidden inside a dictionary. |
OK. I'll revert it to use instances instead. It is more logical. Also, rethinking now, your idea of replacing config with a type with Nullable fields is not a bad idea at all. One of the fields in this can be |
Have made changes and rebased as discussed. Will merge this in a few days if no objections are raised. One of the AppVeyor runs failed. Seems unrelated to these changes. |
rework of cluster manager. support additional launches via a worker
The improvements deserve a mention in NEWS.md. |
Looks good! Great to have steps in place towards using different transports. |
Hi -- I'm interested in using the updated SSHManager with the features added by this pull request. I'm currently on 0.3.6 and from what I can tell base/managers.jl is not present in this version (correct me if I am wrong). If i update my head node to use the current master branch, but leave my ssh nodes to be on 0.3.6, does anyone know if that is likely to work (basically I am looking for an efficient way to start many instances per remote box and take advantage of:
thanks! |
Unfortunately, you will need 0.4 on the ssh nodes too. Backporting this to 0.3 (which is on a bugfix/maintenance schedule) is not planned - as it involves changes to the cluster manager interface itself. |
Thanks for the info! |
This PR does the following:
need for unecessary ssh connections for each worker launch.
addprocs
can accept an array of hosts / tuples. Each tuple of the type(host, count)
.count
can be"auto"
or:auto
, in which case it will launch as many workers as the numberof cores on
host
. machinefile too supports synatxauto * [email protected]
for a machine definition.Consequently, it supersedes #9202 and #9046