Kubernetes version skew check for machine pools #12732

@AndiDog

Description

What would you like to be added (User Story)?

The version skew policy should be respected when rolling out nodes of a machine pool. For example, if the control plane nodes aren't up to date yet, workers with a newer version must not be created.

Detailed Description

The implementation in infra providers can vary quite a lot, so this issue is to try and find a common ground. For instance, contract/API fields, a helper function or documented implementation recommendation may help here.

Example:

  • Before: KCP on Kubernetes version v1.29.0, MachinePool on v1.29.0
  • User applies v1.30.0 on both objects at once
  • If the machine pool controller doesn't check for version skew, or its check still sees the old MachinePool object's spec.version (= v1.29.0), it might roll out v1.30.0 workers while the control plane is still on v1.29.0. This can lead to an outage, because the policy gives no guarantee that such a combination is compatible: "kubelet must not be newer than kube-apiserver".
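To make the rule in the example concrete, here is a minimal sketch of a "not newer than the control plane" check. The helper name `workerRolloutAllowed` is hypothetical and not part of Cluster API. The sketch assumes the caller passes the control plane version that is actually observed and the worker version that is about to be rolled out. It compares major and minor versions only and does not check how far kubelet may lag behind kube-apiserver.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorVersion holds a parsed "major.minor" pair. The patch version is
// ignored in this sketch (assumption: only minor-version skew matters).
type minorVersion struct {
	major, minor int
}

// parseMinorVersion parses strings such as "v1.29.0" or "1.30".
func parseMinorVersion(s string) (minorVersion, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) < 2 {
		return minorVersion{}, fmt.Errorf("invalid version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return minorVersion{}, fmt.Errorf("invalid major version in %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return minorVersion{}, fmt.Errorf("invalid minor version in %q: %w", s, err)
	}
	return minorVersion{major: major, minor: minor}, nil
}

// workerRolloutAllowed is a hypothetical helper. It reports whether workers
// at desiredWorkerVersion may be created while the control plane is observed
// at controlPlaneVersion, i.e. whether kubelet would not be newer than
// kube-apiserver.
func workerRolloutAllowed(controlPlaneVersion, desiredWorkerVersion string) (bool, error) {
	cp, err := parseMinorVersion(controlPlaneVersion)
	if err != nil {
		return false, err
	}
	w, err := parseMinorVersion(desiredWorkerVersion)
	if err != nil {
		return false, err
	}
	if w.major != cp.major {
		return w.major < cp.major, nil
	}
	return w.minor <= cp.minor, nil
}

func main() {
	// Control plane still on v1.29.0, MachinePool already asks for v1.30.0.
	allowed, err := workerRolloutAllowed("v1.29.0", "v1.30.0")
	fmt.Println(allowed, err) // prints "false <nil>"
}
```

A controller using something like this would hold back the rollout and requeue until the control plane reports the newer version.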

At the time of writing (2025-09-04), there is no known MachinePool controller implementation that contains a version skew check. Initial attempts have been made in infra providers: in this CAPA PR, we found that the CAPA controller might see an old MachinePool object. This happens because kubectl apply or helm {install,upgrade}, for example, deploy objects in mostly random order and without timing guarantees. There is no good way to find out whether MachinePool.spec.version and the Kubernetes version implicitly defined by AWSMachinePool.spec.awsLaunchTemplate.ami.id (the VM image) are in sync. Only a new field/label on the <Infra>MachinePool would help in that case.
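As a rough illustration of the "new field/label" idea, the sketch below compares MachinePool.spec.version with a version declared on the infra machine pool. The label key `example.com/kubernetes-version` and the function `inSync` are made up for this example and are not part of any existing contract. An actual design might use a spec field instead.

```go
package main

import "fmt"

// kubernetesVersionLabel is a hypothetical label key on the <Infra>MachinePool
// that states which Kubernetes version its VM image contains.
const kubernetesVersionLabel = "example.com/kubernetes-version"

// inSync reports whether the infra machine pool declares the same Kubernetes
// version as MachinePool.spec.version. A missing label or an empty
// spec.version is treated as "not in sync", so the caller waits rather than
// rolling out.
func inSync(machinePoolSpecVersion string, infraMachinePoolLabels map[string]string) bool {
	declared, ok := infraMachinePoolLabels[kubernetesVersionLabel]
	if !ok || machinePoolSpecVersion == "" {
		return false
	}
	return declared == machinePoolSpecVersion
}

func main() {
	// The infra object was already updated, but the controller still sees
	// the old MachinePool object.
	labels := map[string]string{kubernetesVersionLabel: "v1.30.0"}
	fmt.Println(inSync("v1.29.0", labels)) // prints "false"
}
```

With such a marker, a controller could detect that it is looking at a stale MachinePool object and delay the skew check until both objects agree.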

If the infra provider pauses reconciliation of the machine pool cloud resources while the control plane is outdated/skewed, it may, depending on implementation, still need to ensure that the bootstrap token for newly-created workers gets refreshed (e.g. AWS: EC2 user data in launch template). This can be tricky to implement and test.

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
/area machinepool

Labels

area/machinepool, kind/feature, needs-triage, priority/important-longterm
