Pod Topology Spread blog post (KEP-3022 KEP-3094 KEP-3243) #39777
@@ -0,0 +1,146 @@
---
layout: blog
title: "Kubernetes 1.27: More fine-grained pod topology spread policies reached beta"
date: 2023-04-11
slug: fine-grained-pod-topology-spread-features-beta
evergreen: true

Comment on the line `evergreen: true`:
This represents a commitment by the SIG to keep the content current.
Ah, OK. I'll delete then.
Suggested change:
- It is the feature to control how Pods are spread to each failure-domain (regions, zones, nodes etc).
+ It is the feature to control how Pods are spread in the cluster topology or failure domains (regions, zones, nodes etc).
Suggested change:
- As time passed, we received feedback from users,
+ As time passed, we - SIG Scheduling - received feedback from users,
Suggested change:
- Then, we introduced the `minDomains` parameter in the Pod Topology Spread.
+ Kubernetes v1.24 introduced the `minDomains` parameter for pod topology spread constraints,
+ as an alpha feature.
Suggested change:
- and a newly created replicaset has the following `topologySpreadConstraints` in template.
+ and a newly created ReplicaSet has the following `topologySpreadConstraints` in its Pod template.
Suggested change:
- topologySpreadConstraints:
+ ...
+ topologySpreadConstraints:
Suggested change:
- minDomains: 5 # requires 5 Nodes at least.
+ minDomains: 5 # requires 5 Nodes at least (because each Node has a unique hostname)
Suggested change:
- The cluster autoscaler provisions new Nodes based on these unschedulable Pods,
- and as a result, the replicas are finally spread over 5 Nodes.
+ You can imagine that the cluster autoscaler provisions new Nodes based on these unschedulable Pods,
+ and as a result, the replicas are finally spread over 5 Nodes.
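For context, here is a sketch of the kind of ReplicaSet manifest this scenario seems to describe; the object name, the `app: foo` labels, the replica count, and the container image are illustrative assumptions, not taken from the blog post itself:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-replicaset    # hypothetical name for illustration
spec:
  replicas: 15                # assumed replica count
  selector:
    matchLabels:
      app: foo                # assumed label
  template:
    metadata:
      labels:
        app: foo
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        minDomains: 5         # requires 5 Nodes at least (because each Node has a unique hostname)
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule   # minDomains only takes effect with DoNotSchedule
        labelSelector:
          matchLabels:
            app: foo
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
```

With fewer than 5 schedulable Nodes, the scheduler leaves the excess Pods unschedulable, which is what allows the cluster autoscaler to provision new Nodes as described in the suggestion above.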
Suggested change:
- ## KEP-3094: Take taints/tolerations into consideration when calculating PodTopologySpread skew
+ ## KEP-3094: Take taints/tolerations into consideration when calculating podTopologySpread skew
Suggested change:
- take all inclined nodes(satisfied with pod nodeAffinity and nodeSelector) into consideration
+ take the Nodes that satisfy the Pod's nodeAffinity and nodeSelector into consideration
Suggested change:
- This may lead to a node with untolerated taint best fit the pod in podTopologySpread plugin, and as a result,
+ This may lead to a node with untolerated taint as the only candidate for spreading, and as a result,
Suggested change:
- the pod will stuck in pending for it violates the nodeTaint plugin.
+ the pod will stuck in Pending if it doesn't tolerate the taint.
Is this right?
Suggested change:
- To allow more fine-gained decisions about which Nodes to account for when calculating spreading skew, we introduced
- two new fields in `TopologySpreadConstraint` to define node inclusion policies including nodeAffinity and nodeTaint.
+ To allow more fine-gained decisions about which Nodes to account for when calculating spreading skew,
+ Kubernetes 1.25 introduced two new fields within `topologySpreadConstraints` to define node inclusion policies:
+ `nodeAffinityPolicy` and `nodeTaintPolicy`.
I think so, what's your concern here?
Suggested change:
- **nodeAffinityPolicy** indicates how we'll treat Pod's nodeAffinity/nodeSelector in pod topology spreading.
+ The `nodeAffinityPolicy` field indicates how Kubernetes treats a Pod's `nodeAffinity` or `nodeSelector` for
+ pod topology spreading.
Suggested change:
- If `Honor`, kube-scheduler will filter out nodes not matching nodeAffinity/nodeSelector in the calculation of spreading skew.
- If `Ignore`, all nodes will be included, regardless of whether they match the Pod's nodeAffinity/nodeSelector or not.
+ If `Honor`, kube-scheduler filters out nodes not matching `nodeAffinity`/`nodeSelector` in the calculation of
+ spreading skew.
+ If `Ignore`, all nodes will be included, regardless of whether they match the Pod's `nodeAffinity`/`nodeSelector`
+ or not.
Suggested change:
- For backwards-compatibility, nodeAffinityPolicy defaults to `Honor`.
+ For backwards compatibility, `nodeAffinityPolicy` defaults to `Honor`.
Suggested change:
- **nodeTaintsPolicy** indicates how we'll treat node taints in pod topology spreading.
+ The `nodeTaintsPolicy` field defines how Kubernetes considers node taints for pod topology spreading.
Suggested change:
- For backwards-compatibility, nodeTaintsPolicy defaults to the `Ignore`.
+ For backwards compatibility, `nodeTaintsPolicy` defaults to `Ignore`.
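To make the two policies concrete, here is a minimal sketch of a Pod whose spread constraint honors both its node affinity and node taints when skew is calculated; the Pod name, the `app: foo` label, and the container image are assumptions for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod           # hypothetical name for illustration
  labels:
    app: foo                  # assumed label
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    nodeAffinityPolicy: Honor   # the default: count only Nodes matching this Pod's nodeAffinity/nodeSelector
    nodeTaintsPolicy: Honor     # not the default: exclude Nodes whose taints this Pod does not tolerate
    labelSelector:
      matchLabels:
        app: foo
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```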
Suggested change:
- The feature was introduced in v1.25 as alpha level. By default, it was disabled, so if you want to use this feature in v1.25,
- you have to enable the feature gate `NodeInclusionPolicyInPodTopologySpread` actively. In the following v1.26, we graduated
- this feature to beta and it was enabled by default since.
+ The feature was introduced in v1.25 as alpha. By default, it was disabled, so if you want to use this feature in v1.25,
+ you had to explictly enable the feature gate `NodeInclusionPolicyInPodTopologySpread`. In the following v1.26
+ release, that associated feature graduated to beta and is enabled by default.
Suggested change:
- ## KEP-3243: Respect PodTopologySpread after rolling upgrades
+ ## KEP-3243: Respect Pod topology spread after rolling upgrades
This isn't the exact title, but it's close enough
Suggested change:
- podTemplate and the `labelSelector` in the `topologySpreadConstraints`).
+ Pod template and the `labelSelector` in the `topologySpreadConstraints`).
Suggested change:
- To solve this problem once and for all, and to make more accurate decisions in scheduling, we added a new named
+ To solve this problem with a simpler API, we added a new field named
It's not explained why you would do that. When describing the original problem (two paragraphs above) you can say that an alternative would be to add a different label value every time the deployment changes.
OK, yeah. I'll add this in the paragraphs above.
Suggested change:
- The controller/operator just needs to set different values to the same label key for different revisions.
+ The controller or operator managing rollouts just needs to set different values to the same label key for different revisions.
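The new field KEP-3243 adds is `matchLabelKeys`. As a sketch of the "different values to the same label key" idea, a Deployment can rely on the `pod-template-hash` label that the Deployment controller already sets to a different value for each revision; the Deployment name, `app: foo` label, replica count, and image below are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment    # hypothetical name for illustration
spec:
  replicas: 10                # assumed replica count
  selector:
    matchLabels:
      app: foo                # assumed label
  template:
    metadata:
      labels:
        app: foo
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: foo
        matchLabelKeys:
        - pod-template-hash   # differs per revision, so skew is computed only among Pods of the current rollout
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
```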
Suggested change:
- These features are managed by the [SIG/Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling).
+ These features are managed by Kubernetes [SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling).
Please update the filename to match.