Cannot start a simple local cluster using the config.yaml - workers are not found #42128
Labels: bug, core, core-clusters, P1, stability
What happened + What you expected to happen
I have multiple PCs that are connected and can easily be accessed through ssh. Manually logging into a PC (that is, a node) and defining it as the head or a worker works fine. The issue arises when I try to do the very same thing using the config.yaml.
First, the manual procedure:
Start the head node on one machine, then ssh into all the other machines that shall be the workers and run:
ray start --address=head-node-address:port
Using ray status or viewing the dashboard, it can be observed that all the desired nodes are online.
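The manual procedure above can be sketched as a small script. This is only an illustration under assumptions: the host names, the port 6379 and the `WORKERS` list are hypothetical placeholders, and it assumes `ray` is on the PATH of a non-interactive ssh session on every machine. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own hosts and port.
HEAD_ADDRESS="head-node-address"
HEAD_PORT="6379"                 # assumed port, not taken from the report
WORKERS="worker-1 worker-2"      # assumed worker host names

# Dry run by default; set DRY_RUN=0 to really execute the ssh commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the head first, then attach every worker to it.
start_cluster() {
    run ssh "$HEAD_ADDRESS" "ray start --head --port=$HEAD_PORT"
    for worker in $WORKERS; do
        run ssh "$worker" "ray start --address=$HEAD_ADDRESS:$HEAD_PORT"
    done
}

start_cluster
```

After running it with `DRY_RUN=0`, `ray status` on the head node should list the head plus one entry per worker, as in the manual case.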
Now this shall be replicated with a config.yaml. However, it only occasionally finds the workers; most of the time it does not.
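For readers without the original file, here is a minimal sketch of what a config for Ray's `local` (on-premise) provider might look like, written out from a shell script. It is not the reporter's config.yaml: the host names, user name, worker counts, port and the output file name `example-config.yaml` are all hypothetical, and the layout is modeled on Ray's published local-cluster example.

```shell
#!/bin/sh
# Hypothetical output path, chosen so an existing config.yaml is not overwritten.
CONFIG_PATH="${CONFIG_PATH:-./example-config.yaml}"

# Quoted heredoc: $RAY_HEAD_IP must reach the file literally, unexpanded.
cat > "$CONFIG_PATH" <<'EOF'
cluster_name: default
provider:
    type: local
    head_ip: head-node-address          # placeholder
    worker_ips: [worker-1, worker-2]    # placeholders
auth:
    ssh_user: YOUR_USERNAME             # placeholder
min_workers: 2                          # assumed: one per worker_ips entry
max_workers: 2
head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379
EOF

echo "Wrote $CONFIG_PATH; launch with: ray up $CONFIG_PATH"
```

With such a file, the start commands are run by the cluster launcher over ssh instead of by hand, so differences between the interactive and the non-interactive environment on the workers are one thing worth checking.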
Versions / Dependencies
Reproduction script
Please see the description above, i.e. the config.yaml.
Issue Severity
Medium: It is a significant difficulty but I can work around it.