Proposal for provision pipeline #400
Comments
Sounds good at a high level. The upside is that it solves the problems mentioned by giving us more control. The only downsides I can see are
We should have a meeting to talk about this!
I think this is a good idea. Also, if I understand correctly, this is proposing we bring back … Another upside to adding the …
Yes, good point! Basically, it is bringing back …
We currently have quite a few problems with `ray up`. It does not update file_mounts (mostly for multi-node clusters), and its logging is not informative enough (#384). File_mounts of large files involve a long wait. There is also the `workdir` problem in #389 and the SSH config being set too late (#385). Finally, when the user's setup fails, the cluster remains in the `INIT` state, so we cannot tell whether the instances were launched successfully or not. The following is my proposal for a new API:
For the backend, we add a new `setup()` function, and during execution we call `setup()` after the `sync_file_mounts()` function. The `provision()` function then becomes responsible only for launching the instances and setting up Ray, just like the first step of `gang_scheduling`. Also, before provisioning, our backend can try `ray exec handle.yaml -- ray status` to check whether the cluster is available, and skip `ray up` if it is ready. All user-customized file_mounts and setup are then handled by our own functions.
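A minimal Python sketch of the proposed ordering follows. Every name in it is hypothetical and is not the project's actual backend API: `SketchBackend`, `ClusterStatus`, `ray_cluster_ready`, and the `SETUP_FAILED` state are illustrative assumptions. In the sketch, `provision()` only launches instances and starts Ray. It first probes the cluster with `ray exec` and skips `ray up` when the probe succeeds. `sync_file_mounts()` and `setup()` then run as separate later steps, so a failed user setup gets its own state instead of leaving the cluster ambiguously in `INIT`. Passing `ray status` as a single command string is an assumption about how the issue's `ray exec handle.yaml -- ray status` maps onto an argument list.

```python
import subprocess
from enum import Enum
from typing import Callable, Dict, List


# Hypothetical cluster states; SETUP_FAILED is an illustrative assumption.
class ClusterStatus(Enum):
    INIT = "INIT"                  # instances not yet confirmed up
    UP = "UP"                      # instances launched and Ray running
    SETUP_FAILED = "SETUP_FAILED"  # instances up, but user setup failed


Runner = Callable[[List[str]], int]  # takes argv, returns an exit code


def default_run(cmd: List[str]) -> int:
    """Run a CLI command and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def ray_cluster_ready(yaml_path: str, run: Runner) -> bool:
    """Probe an existing cluster; a zero exit code means Ray answered."""
    # Assumed argv form of the proposed `ray exec handle.yaml -- ray status`.
    return run(["ray", "exec", yaml_path, "ray status"]) == 0


class SketchBackend:
    """Hypothetical backend showing the proposed step ordering."""

    def __init__(self, run: Runner = default_run):
        self.run = run
        self.steps: List[str] = []
        self.status = ClusterStatus.INIT

    def provision(self, yaml_path: str) -> None:
        # Only launch instances and start Ray; no user file_mounts or setup here.
        if ray_cluster_ready(yaml_path, self.run):
            self.steps.append("skip_ray_up")
        else:
            if self.run(["ray", "up", "-y", yaml_path]) != 0:
                # Launch failure: the cluster stays in INIT.
                raise RuntimeError("failed to launch instances")
            self.steps.append("ray_up")
        self.status = ClusterStatus.UP

    def sync_file_mounts(self, file_mounts: Dict[str, str]) -> None:
        # Placeholder: a real backend would copy each local path to its remote path.
        self.steps.append("sync_file_mounts")

    def setup(self, setup_fn: Callable[[], bool]) -> None:
        # User setup runs only after the file mounts are in place.
        self.steps.append("setup")
        if not setup_fn():
            self.status = ClusterStatus.SETUP_FAILED

    def launch(self, yaml_path: str, file_mounts: Dict[str, str],
               setup_fn: Callable[[], bool]) -> None:
        # Proposed order: provision -> sync_file_mounts -> setup.
        self.provision(yaml_path)
        self.sync_file_mounts(file_mounts)
        self.setup(setup_fn)
```

The command runner is injected so the ordering can be exercised without a real cluster. For example, `SketchBackend(run=lambda cmd: 0)` simulates a cluster that is already up, so `ray up` is skipped. Because launch failures raise inside `provision()` while setup failures only change `status`, the two cases stay distinguishable.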