diff --git a/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md b/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md
index 89240c7c4c9..ee62056ebbb 100644
--- a/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md
+++ b/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md
@@ -90,7 +90,7 @@ tags, and then generate with `hack/update-toc.sh`.
     - [Possible misuse](#possible-misuse)
     - [The update to labels specified at matchLabelKeys isn't supported](#the-update-to-labels-specified-at-matchlabelkeys-isnt-supported)
 - [Design Details](#design-details)
-  - [[v1.33] design change and a safe upgrade path](#v133-design-change-and-a-safe-upgrade-path)
+  - [[v1.34] design change and a safe upgrade path](#v134-design-change-and-a-safe-upgrade-path)
   - [Test Plan](#test-plan)
       - [Prerequisite testing updates](#prerequisite-testing-updates)
       - [Unit tests](#unit-tests)
@@ -401,13 +401,13 @@ kube-apiserver modifies the `labelSelector` like the following:
 In addition, kube-scheduler will handle `matchLabelKeys` within the cluster-level default constraints in the scheduler configuration in the future (see https://github.com/kubernetes/kubernetes/issues/129198).
-Finally, the feature will be guarded by a new feature flag. If the feature is
+Finally, the feature will be guarded by a new feature flag `MatchLabelKeysInPodTopologySpread`. If the feature is
 disabled, the field `matchLabelKeys` and corresponding `labelSelector` are preserved if it was already set in the persisted Pod object, otherwise new Pod with the field creation will be rejected by kube-apiserver. Also kube-scheduler will ignore `matchLabelKeys` in the cluster-level default constraints configuration.
-### [v1.33] design change and a safe upgrade path
+### [v1.34] design change and a safe upgrade path
 Previously, kube-scheduler just internally handled `matchLabelKeys` before the calculation of scheduling results.
 But, we changed the implementation design to the current form to make the design align with PodAffinity's `matchLabelKeys`. (See the detailed discussion in [the alternative section](#implement-matchlabelkeys-in-only-either-the-scheduler-plugin-or-kube-apiserver))
@@ -415,11 +415,15 @@ But, we changed the implementation design to the current form to make the design
 However, this implementation change could break `matchLabelKeys` of unscheduled pods created before the upgrade because kube-apiserver only handles `matchLabelKeys` at pods creation, that is, it doesn't handle `matchLabelKeys` at existing unscheduled pods.
-So, for a safe upgrade path from v1.32 to v1.33, kube-scheduler would handle not only `matchLabelKeys`
-from the default constraints, but also all incoming pods during v1.33.
-We're going to change kube-scheduler to only concern `matchLabelKeys` from the default constraints at v1.34 for efficiency,
+So, for a safe upgrade path from v1.33 to v1.34, kube-scheduler would handle not only `matchLabelKeys`
+from the default constraints, but also all incoming pods during v1.34.
+We're going to change kube-scheduler to only concern `matchLabelKeys` from the default constraints at v1.35 for efficiency,
 assuming kube-apiserver handles `matchLabelKeys` of all incoming pods.
+Also, in case of bugs in this new design, users can disable this feature through a new feature flag,
+`MatchLabelKeysInPodTopologySpreadSelectorMerge` (enabled by default).
+(See more details in [Feature Enablement and Rollback](#feature-enablement-and-rollback))
+
 ### Test Plan
+- `MatchLabelKeysInPodTopologySpread` feature flag enables the `MatchLabelKeys` feature in `TopologySpreadConstraint`.
+- `MatchLabelKeysInPodTopologySpreadSelectorMerge` feature flag enables the new design described at
+  [[v1.34] design change and a safe upgrade path](#v134-design-change-and-a-safe-upgrade-path).
+  - If `MatchLabelKeysInPodTopologySpreadSelectorMerge` is disabled while `MatchLabelKeysInPodTopologySpread` is enabled,
+    Kubernetes handles `MatchLabelKeys` with the classic design, kube-scheduler handles it.
+    However, that's basically not recommended unless you encounter a bug in a new design behavior.
+  - This flag cannot be enabled on its own, and has to be enabled together with `MatchLabelKeysInPodTopologySpread`.
+    Enabling `MatchLabelKeysInPodTopologySpreadSelectorMerge` alone has no effect, and `matchLabelKeys` will be ignored.
+
+The `MatchLabelKeysInPodTopologySpreadSelectorMerge` feature flag has been added in v1.34 and enabled by default.
+This flag can be disabled to revert [the implementation design change in v1.34](#v134-design-change-and-a-safe-upgrade-path)
+and go back to the previous behavior in case of bug.
+
 ###### How can this feature be enabled / disabled in a live cluster?
-Yes. It's helpful if we have the metrics to see which plugins affect to scheduler's decisions in Filter/Score phase.
-There is the related issue: https://github.com/kubernetes/kubernetes/issues/110643 . It's very big and still on the way.
+Yes, [there were](https://github.com/kubernetes/kubernetes/issues/110643), and it's been implemented in
+[#115082](https://github.com/kubernetes/kubernetes/pull/115082) and [#118025](https://github.com/kubernetes/kubernetes/pull/118025).
 ### Dependencies
@@ -1061,6 +1081,7 @@ Major milestones might include:
  - 2022-06-08: KEP merged
  - 2023-01-16: Graduate to Beta
  - 2025-01-23: Change the implementation design to be aligned with PodAffinity's `matchLabelKeys`
+ - 2025-04-07: Add a new feature flag `MatchLabelKeysInPodTopologySpreadSelectorMerge` and update milestone
 ## Drawbacks
@@ -1086,7 +1107,7 @@ and scheduler plugin shouldn't have special treatment for any labels/fields.
 Technically, we can implement this feature within the PodTopologySpread plugin only; merging the key-value labels corresponding to `MatchLabelKeys` into `LabelSelector` internally within the plugin before calculating the scheduling results.
-This is the actual implementation up to 1.32.
+This is the actual implementation up to 1.33.
 But, it may confuse users because this behavior would be different from PodAffinity's `MatchLabelKeys`.
 Also, we cannot implement this feature only within kube-apiserver because it'd make it
diff --git a/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/kep.yaml b/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/kep.yaml
index cff792ac5c0..733e6c15d8f 100644
--- a/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/kep.yaml
+++ b/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/kep.yaml
@@ -7,8 +7,14 @@ status: implementable
 creation-date: 2022-03-17
 reviewers:
   - "@ahg-g"
+  - "@sanposhiho"
+  - "@macsko"
+  - "@dom4ha"
 approvers:
   - "@ahg-g"
+  - "@sanposhiho"
+  - "@macsko"
+  - "@dom4ha"
 see-also:
   - "/keps/sig-scheduling/895-pod-topology-spread"
@@ -22,13 +28,13 @@ stage: beta
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.33"
+latest-milestone: "v1.34"
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
   beta: "v1.27"
-  stable: "v1.35"
+  stable: "v1.36"
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -37,6 +43,9 @@ feature-gates:
     components:
       - kube-apiserver
       - kube-scheduler
+  - name: MatchLabelKeysInPodTopologySpreadSelectorMerge
+    components:
+      - kube-apiserver
 disable-supported: true