@kyuds
Collaborator

@kyuds kyuds commented Oct 24, 2025

Related to #7256

Note: I cannot provide a complete fix for the issue above without breaking backward compatibility, so it will be fully resolved after v0.11.0.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@kyuds
Collaborator Author

kyuds commented Oct 24, 2025

/smoke-test --kubernetes

@kyuds kyuds changed the title [WIP][k8s] De-Duplicate Pod Labels [k8s] De-Duplicate Pod Labels Oct 24, 2025
@kyuds
Collaborator Author

kyuds commented Oct 24, 2025

@Michaelvll @romilbhardwaj this PR will modify some labels in the helm charts, and honestly there is no straightforward way to maintain backward compatibility. I would like to confirm with you whether this is OK.

@romilbhardwaj
Collaborator

romilbhardwaj commented Oct 27, 2025

Thanks @kyuds! Two clarification questions:

  1. For skypilot-cluster-name: will this label value be cluster_name or cluster_name_on_cloud? Looks like it will be cluster_name_on_cloud.
  2. ^ Should we also add a label for cluster_name? IIRC we've had user requests wanting to query k8s pod name based on cluster name.
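
For context on the second question, looking up pods by cluster amounts to an equality-based label selector on the label discussed above. A minimal sketch, assuming the label key skypilot-cluster-name; the helper name `build_label_selector` and the example value `my-cluster-ab12` are hypothetical and not taken from SkyPilot:

```python
from typing import Dict


def build_label_selector(labels: Dict[str, str]) -> str:
    """Join key/value pairs into an equality-based Kubernetes label selector."""
    # Sort by key so the resulting selector string is deterministic.
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


# Hypothetical cluster_name_on_cloud value, for illustration only.
selector = build_label_selector({"skypilot-cluster-name": "my-cluster-ab12"})
print(selector)
```

The resulting string is the kind of value accepted by `kubectl get pods -l <selector>`.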

@kyuds
Collaborator Author

kyuds commented Oct 27, 2025

Addressing the above:

  1. skypilot_cluster_name is the label name for cluster_name_on_cloud not only on Kubernetes but also on AWS, GCP, Azure, vSphere, etc. I think it's good to be consistent on this.
  2. I'm assuming we want a label for cluster_name for Kubernetes in particular? Adding a label for that should be easy; choosing a suitable label name is the hard part. For now, I'm thinking of skypilot_canonical_cluster_name, but maybe it's too verbose? Also, I'm assuming we only want the label on pods? What about the other resources, such as services (e.g. ingress)?
    wdyt @romilbhardwaj?

@Michaelvll
Collaborator

We should not add a label for the SkyPilot cluster name, as the name could include characters that are invalid in label values.

@kyuds
Collaborator Author

kyuds commented Oct 27, 2025

I could add a validity check and include the cluster name label only when the name is valid as a label value (i.e. the label is optional; we expect most users to use English characters for cluster names). We would also document this thoroughly (in the tips section for Kubernetes), stating that only cluster names made up of valid characters are added as labels. Just a suggestion.
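
A rough sketch of what this optional-label idea could look like. The validity rule encodes the Kubernetes constraint on label values (at most 63 characters; empty, or beginning and ending with an alphanumeric character, with dashes, underscores, and dots allowed in between). The function names are hypothetical, and the key skypilot_canonical_cluster_name is only the tentative name floated earlier in this thread:

```python
import re
from typing import Dict

# Kubernetes label values: at most 63 characters; either empty, or starting
# and ending with [A-Za-z0-9], with '-', '_' and '.' allowed in between.
_LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
_MAX_LABEL_VALUE_LEN = 63

# Tentative key from this discussion; an assumption, not an existing label.
_CANONICAL_NAME_KEY = "skypilot_canonical_cluster_name"


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` is acceptable as a Kubernetes label value."""
    if len(value) > _MAX_LABEL_VALUE_LEN:
        return False
    # fullmatch requires the whole string to match the pattern.
    return _LABEL_VALUE_RE.fullmatch(value) is not None


def optional_cluster_name_labels(cluster_name: str) -> Dict[str, str]:
    """Return the extra label only when the cluster name is a valid value."""
    if is_valid_label_value(cluster_name):
        return {_CANONICAL_NAME_KEY: cluster_name}
    # Invalid names are skipped rather than rewritten, so the label value
    # never differs from the name the user actually chose.
    return {}


print(optional_cluster_name_labels("dev-cluster"))
print(optional_cluster_name_labels("bad name!"))
```

Skipping invalid names instead of rewriting them keeps label values faithful to the user's cluster name, at the cost that some clusters carry the label and others do not.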

@kyuds kyuds requested review from kevinmingtarja and removed request for Michaelvll and romilbhardwaj October 28, 2025 00:06
@kyuds
Collaborator Author

kyuds commented Oct 29, 2025

/smoke-test --kubernetes
/quicktest-core --kubernetes

Collaborator

Let's not include the cluster name at all. Otherwise, it could be quite confusing that some clusters have the label while others do not.

@kyuds
Collaborator Author

kyuds commented Oct 31, 2025

/quicktest-core --kubernetes -k test_server_downgrade_upgrade_compatibility

@kyuds kyuds changed the title [k8s] De-Duplicate Pod Labels [k8s] Consolidate Pod Labels Nov 2, 2025
@kyuds
Collaborator Author

kyuds commented Nov 2, 2025

/smoke-test --kubernetes

@kyuds
Collaborator Author

kyuds commented Nov 2, 2025

/quicktest-core --kubernetes

@kyuds kyuds requested a review from SeungjinYang November 2, 2025 14:41
Collaborator

@SeungjinYang SeungjinYang left a comment

Thanks @kyuds! One suggestion on comments and this PR is good to go.

For pods, I'm not worried about deprecating the skypilot-cluster label on v0.11.0 (or even now) because the alternate skypilot-cluster-name has existed basically forever.
However, for services, where skypilot-cluster-name is only being added now, I'm a little concerned that deprecating at v0.11.0 may be too hasty. Some of our users are quite likely to spin up or use a long-running cluster that keeps running across API server version upgrades. While we do more or less enforce that the client and server versions stay in sync, we do try to let clusters stay running across versions.

All that to say, I suggest we extend the deprecation timeline for codepaths that query services using the cluster name label - the deprecation timeline can stay as is for pod cases.
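
To illustrate the concern, here is a minimal sketch of a lookup that keeps matching the legacy label during the deprecation window, so that services created before skypilot-cluster-name existed are still found. It operates on plain dictionaries shaped like Kubernetes object metadata; the function name is hypothetical, and it assumes both labels carry the same value, which this thread does not confirm:

```python
from typing import Any, Dict, List

NEW_LABEL = "skypilot-cluster-name"
LEGACY_LABEL = "skypilot-cluster"  # to be dropped once the deprecation window ends


def find_by_cluster(resources: List[Dict[str, Any]],
                    cluster_name_on_cloud: str) -> List[Dict[str, Any]]:
    """Return resources labeled for the cluster under the new or legacy key."""

    def has_label(resource: Dict[str, Any], key: str) -> bool:
        # `labels` may be missing or None on objects without any labels.
        labels = (resource.get("metadata") or {}).get("labels") or {}
        return labels.get(key) == cluster_name_on_cloud

    return [
        r for r in resources
        if has_label(r, NEW_LABEL) or has_label(r, LEGACY_LABEL)
    ]


# Hypothetical services: one created after the change, one before, one unrelated.
services = [
    {"metadata": {"name": "svc-new", "labels": {NEW_LABEL: "c-1"}}},
    {"metadata": {"name": "svc-old", "labels": {LEGACY_LABEL: "c-1"}}},
    {"metadata": {"name": "svc-other", "labels": {NEW_LABEL: "c-2"}}},
]
print([s["metadata"]["name"] for s in find_by_cluster(services, "c-1")])
```

Once the legacy label is fully deprecated, the second condition can be removed without touching callers.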

@kyuds
Collaborator Author

kyuds commented Nov 4, 2025

@SeungjinYang any suggestions for the extended deprecation timeline?

@SeungjinYang
Collaborator

> @SeungjinYang any suggestions for the extended deprecation timeline?

I'm going to refer to https://docs.skypilot.co/en/latest/developers/CONTRIBUTING.html#backward-compatibility-guidelines, which seems to suggest v0.12.0 as the release in which to deprecate anything from v0.10.x.

@SeungjinYang
Collaborator

Will merge once the smoke tests pass.

@kyuds
Collaborator Author

kyuds commented Nov 4, 2025

/smoke-test --kubernetes

@kyuds
Collaborator Author

kyuds commented Nov 4, 2025

/quicktest-core --kubernetes

@kyuds
Collaborator Author

kyuds commented Nov 4, 2025

The smoke test failure is unrelated; it also fails on current master: https://buildkite.com/skypilot-1/smoke-tests/builds/5198#_ Pinging @zpoint.

@kyuds
Collaborator Author

kyuds commented Nov 4, 2025

/quicktest-core --kubernetes -k test_managed_jobs

@kyuds
Collaborator Author

kyuds commented Nov 4, 2025

ok quicktest kubernetes test_managed_jobs is definitely weird...

@kyuds kyuds merged commit 16cfa24 into master Nov 4, 2025
23 checks passed
@kyuds kyuds deleted the kyuds/dup-k8s-labels branch November 4, 2025 13:05

5 participants