[autoscaler][local node provider] Make sure docker state is completely cleaned up #17689

Open
2 tasks
DmitriGekhtman opened this issue Aug 10, 2021 · 6 comments
Labels
bug (Something that is supposed to be working; but isn't), infra (autoscaler, ray client, kuberay, related issues), P2 (Important issue, but not time-critical)
Milestone

Comments

@DmitriGekhtman
Contributor

DmitriGekhtman commented Aug 10, 2021

What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS):

Some users report trouble caused by incomplete Docker cleanup when using the local node provider.
See the discussion here:
https://discuss.ray.io/t/how-do-i-troubleshoot-nodes-that-remaining-uninitialized/2991/5
Might need some LocalNodeProvider-specific termination logic to fix that. Related to #17022.
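
To make the idea of provider-specific termination logic concrete, here is a minimal sketch of what such a cleanup step could look like. It is not the existing LocalNodeProvider code. The helper name `docker_cleanup_commands`, the container name `ray_container`, and the node addresses are assumptions for illustration. It only builds the `ssh` commands that would force-remove the container on each node; running them is left to the caller.

```python
import shlex
from typing import List


def docker_cleanup_commands(nodes: List[str], container_name: str) -> List[List[str]]:
    """Build one ssh command per node that force-removes the named container.

    Hypothetical helper: it assumes each node is reachable over ssh and that
    the Ray container was started under `container_name`.
    """
    # `docker rm -f` stops the container if it is running, then removes it.
    remote = f"docker rm -f {shlex.quote(container_name)}"
    return [["ssh", node, remote] for node in nodes]


if __name__ == "__main__":
    # Example addresses and container name are placeholders.
    for cmd in docker_cleanup_commands(["10.0.0.2", "10.0.0.3"], "ray_container"):
        print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the node list and container name match the actual cluster config.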

Reproduction (REQUIRED)

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
@DmitriGekhtman added the bug and P2 labels Aug 10, 2021
@DmitriGekhtman added this to the Serverless Autoscaling milestone Aug 10, 2021
@AmeerHajAli added the infra label Mar 26, 2022
@olly-writes-code

This still seems to be a problem :/

@DmitriGekhtman
Contributor Author

I believe local node provider has been deprecated for a while, so this issue can probably be closed.

@DmitriGekhtman
Contributor Author

Cc @anyscalesam

@DmitriGekhtman
Contributor Author

DmitriGekhtman commented Oct 18, 2024

Correction: local node provider is not yet officially deprecated. It just hasn't been maintained in a while. Fine to keep this issue open.

For anyone else who stumbles on this thread: for on-premises clusters, it's simpler to start Ray manually by SSH-ing into each node in the cluster and running ray start.
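
A minimal sketch of that manual approach, assuming passwordless ssh to every node and Ray already installed on each of them. The function names, the example addresses, and the choice of port 6379 are illustrative assumptions, not part of Ray. The head node runs `ray start --head`, and each worker joins with `ray start --address=<head>:<port>`.

```python
import shlex
import subprocess
from typing import List


def ray_start_commands(head: str, workers: List[str], port: int = 6379) -> List[List[str]]:
    """Build the ssh commands that start Ray: head node first, then workers."""
    head_cmd = f"ray start --head --port={port}"
    worker_cmd = f"ray start --address={head}:{port}"
    commands = [["ssh", head, head_cmd]]
    commands += [["ssh", worker, worker_cmd] for worker in workers]
    return commands


def start_cluster(head: str, workers: List[str], port: int = 6379, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one node at a time."""
    for cmd in ray_start_commands(head, workers, port):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops at the first node that fails to start.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder addresses; replace with the real head and worker hosts.
    start_cluster("10.0.0.1", ["10.0.0.2", "10.0.0.3"])
```

The dry run is the default so the commands can be checked before anything is executed on the nodes. Stopping the cluster would follow the same pattern with `ray stop` on each node.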

@olly-writes-code

That defeats the point of using Ray. There are much simpler alternatives if Ray can't handle orchestration.

@DmitriGekhtman
Contributor Author

Cluster management is not a feature of Ray itself.
To manage infrastructure for Ray, the best options are to use a cluster manager (Kubernetes and KubeRay) or the managed solution (Anyscale).

But yeah, don't use Ray if there's a simpler solution for your application.

3 participants