Conversation

@virtualluke
Contributor

What do these changes do?

Added a `type: ray` label to the deployments' pod metadata and keyed pod anti-affinity on hosts off that label. This prevents Kubernetes from scheduling more than one pod of type `ray` onto the same host.
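A minimal sketch of what this looks like in a deployment spec, assuming the `type: ray` label described above; the deployment name, topology key, and exact selector structure are illustrative rather than copied from this PR:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ray-worker          # hypothetical name, not necessarily the one in this PR
spec:
  template:
    metadata:
      labels:
        type: ray           # the label anti-affinity keys off of
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: the scheduler refuses to place two 'ray' pods on the same host.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: type
                    operator: In
                    values: ["ray"]
              topologyKey: kubernetes.io/hostname
```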

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12228/

Contributor

@ericl ericl left a comment

How about using the soft form, preferredDuringSchedulingIgnoredDuringExecution, instead?
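(For comparison, a sketch of the soft form being suggested here; the weight and selector details are assumptions, not taken from the PR:)

```yaml
affinity:
  podAntiAffinity:
    # Soft rule: the scheduler prefers to spread 'ray' pods across hosts,
    # but will still co-locate them if no other node fits.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: type
                operator: In
                values: ["ray"]
          topologyKey: kubernetes.io/hostname
```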

@virtualluke
Contributor Author

It seems like you would always want the anti-affinity during both scheduling and execution, not just during execution.

@ericl
Contributor

ericl commented Feb 25, 2019 via email

@robertnishihara robertnishihara self-assigned this Mar 6, 2019
@virtualluke
Contributor Author

I definitely prefer the harder form (requiredDuringSchedulingIgnoredDuringExecution) over the softer one (preferredDuringSchedulingIgnoredDuringExecution). I am looking at cluster stability and want a hard rule about scheduling of ray pods. If others want it as a cluster scheduling suggestion (which is the softer form), then I am ok with that; I will just use the hard version on our cluster.

Contributor

@ericl ericl left a comment

Ok, I think either is fine if you think there's a strong benefit, so this LGTM.

@AmplabJenkins

Can one of the admins verify this patch?

@robertnishihara
Collaborator

I just tried this out and it works for me.

One issue I ran into when running this out of the box (pre-existing, I think, and probably unrelated to this PR) is that `kubectl create -f ray/kubernetes/submit.yaml` didn't succeed because it required too much memory on the workers. We could increase the amount of memory requested by the pods, but then it would be harder to run out of the box (e.g., on minikube the pods don't get scheduled when more resources are requested).
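(For context, a hypothetical example of the per-container resource request being discussed; the values are illustrative and not the ones in ray/kubernetes/submit.yaml:)

```yaml
resources:
  requests:
    memory: "512Mi"   # lower request -> easier to schedule on minikube
    cpu: "1"
  limits:
    memory: "2Gi"
```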

@robertnishihara robertnishihara merged commit 08a4769 into ray-project:master Mar 11, 2019
@robertnishihara
Collaborator

Thanks @virtualluke!
