-
Notifications
You must be signed in to change notification settings - Fork 833
[k8s] Consolidate Pod Labels #7724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/smoke-test --kubernetes |
|
@Michaelvll @romilbhardwaj this PR will modify some labels in the helm charts, and honestly there is no straightforward way in maintaining backwards compatibility. Would like to confirm with you guys whether this is ok |
|
Thanks @kyuds! Two clarification questions:
|
|
Addressing the above:
|
|
We should not add a label for SkyPilot cluster name, as it could include the invalid chars for label values |
|
I could do a sanitization to optionally include a cluster name if its valid (ie: optionally add labels, as we expect most users to use english characters for cluster name). We will also include heavy documentation on it (in the tips section for kubernetes), saying that only cluster names with valid chars will be added to labels. Just a suggestion |
|
/smoke-test --kubernetes |
|
Let's don't include the cluster name at all. Otherwise, it could be quite confusing that some clusters have the label but others do not |
|
/quicktest-core --kubernetes -k test_server_downgrade_upgrade_compatibility |
|
/smoke-test --kubernetes |
|
/quicktest-core --kubernetes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kyuds! One suggestion on comments and this PR is good to go.
For pods, I'm not worried about deprecating the skypilot-cluster label on v0.11.0 (or even now) because the alternate skypilot-cluster-name has existed basically forever.
However, for services where skypilot-cluster-name is just being added now, I'm a little concerned that deprecation at v0.11.0 may be a bit too hasty. The reason I am worried is because it is entirely likely for some of our users to spin up / use a long-running cluster that continues to run across API server version upgrades, and while we do more or less enforce the client and server version be in sync, we do try to let clusters stay running across versions.
All that to say, I suggest we extend the deprecation timeline for codepaths that query services using the cluster name label - the deprecation timeline can stay as is for pod cases.
|
@SeungjinYang any suggestions for the extended deprecation timeline? |
I'm going to refer to https://docs.skypilot.co/en/latest/developers/CONTRIBUTING.html#backward-compatibility-guidelines which seems to suggest v0.12.0 to deprecate anything for v0.10.x |
|
will merge once the smoke tests pass |
|
/smoke-test --kubernetes |
|
/quicktest-core --kubernetes |
|
smoketest failure unrelated: failed on current master too: https://buildkite.com/skypilot-1/smoke-tests/builds/5198#_ pinging @zpoint |
|
/quicktest-core --kubernetes -k test_managed_jobs |
|
ok quicktest kubernetes test_managed_jobs is definitely weird... |
Related to #7256
Note I cannot provide a fix for the issue above without breaking backwards compatibility, so will be fully resolving post v0.11.0
Tested (run the relevant ones):
bash format.sh/smoke-test(CI) orpytest tests/test_smoke.py(local)/smoke-test -k test_name(CI) orpytest tests/test_smoke.py::test_name(local)/quicktest-core(CI) orpytest tests/smoke_tests/test_backward_compat.py(local)