🐛 KCP: remove etcd member in pre-terminate hook #11137

Merged (5 commits) on Sep 5, 2024

Conversation

sbueringer
Member

@sbueringer sbueringer commented Sep 5, 2024

What this PR does / why we need it:

Note: this fix required introducing a pre-terminate hook that the KCP controller automatically adds and manages for KCP control plane Machines. If your control plane Machines are using Kubernetes 1.31, KCP will make sure that its pre-terminate hook is run last. This ensures that the terminating Node has a working kubelet / Node while other pre-terminate hooks are executed.

More details about the issue can be found in "Drain not being performed for KCP machines with K8s v1.31.x".
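
The ordering rule in the description can be sketched in Go as a small decision function. This is only an illustration, not the controller's code: `kcpHookMustWait` is a hypothetical helper, and applying the rule to 1.31 and every later version is an assumption (the description only names Kubernetes 1.31).

```go
package main

import (
	"fmt"
	"strings"
)

// kcpHookMustWait reports whether the KCP pre-terminate hook should be
// deferred because other pre-terminate hooks are still present.
// Hypothetical helper for illustration only.
// Assumption: the "run last" rule applies to Kubernetes 1.31 and later.
func kcpHookMustWait(k8sVersion string, otherHooksPresent bool) (bool, error) {
	var major, minor int
	// Accept both "v1.31.0" and "1.31.0"; only major and minor are needed.
	trimmed := strings.TrimPrefix(k8sVersion, "v")
	if _, err := fmt.Sscanf(trimmed, "%d.%d", &major, &minor); err != nil {
		return false, fmt.Errorf("parsing version %q: %w", k8sVersion, err)
	}
	atLeast131 := major > 1 || (major == 1 && minor >= 31)
	return atLeast131 && otherHooksPresent, nil
}

func main() {
	for _, v := range []string{"v1.30.4", "v1.31.0"} {
		wait, err := kcpHookMustWait(v, true)
		fmt.Println(v, wait, err)
	}
}
```

Deferring only when other hooks are present keeps the kubelet on the terminating Node usable for those hooks, which is the motivation given in the description.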

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #11138

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label labels Sep 5, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 5, 2024
@sbueringer
Member Author

/test ?

@k8s-ci-robot
Contributor

@sbueringer: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-latestk8s-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-31-1-32-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@k8s-ci-robot k8s-ci-robot removed the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 5, 2024
@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 5, 2024
@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@fabriziopandini fabriziopandini added the area/provider/control-plane-kubeadm Issues or PRs related to KCP label Sep 5, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Sep 5, 2024
util/collections/machine_collection.go (outdated, resolved)
// Return early if there are other pre-terminate hooks for the Machine.
// The KCP pre-terminate hook should be the one executed last, so that kubelet
// is still working while other pre-terminate hooks are run.
if machineHasOtherPreTerminateHooks(deletingMachine) {
Member

@fabriziopandini fabriziopandini Sep 5, 2024

I think we should document somewhere that KCP-controlled Machines add a hook that expects to always run last (to ensure kubelet is still running).

This should go in the PR description first, but probably in a few other places too (the latter could be part of this PR or a quick follow-up, up to you). Probably this should go:

  • At the end of the Go comment for PreTerminateDeleteHookAnnotation
  • In the 1.9 upgrade instructions
  • In the release notes for the patch release

(and we should bring this up at the office hours)

Member Author

Sounds like a follow-up to me

Member Author

Maybe you can take that one. Probably easier because you know better where you want to add it :)

Member Author

Opened a follow-up PR (#11153) + added it to the agenda
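
For readers following this thread, here is a minimal Go sketch of what a check like `machineHasOtherPreTerminateHooks` in the reviewed snippet might do. The function body, the annotation prefix value, and the `kcp-cleanup` hook name are assumptions for illustration and are not taken from this PR's diff; the sketch also works on a plain annotation map rather than a Machine object.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Assumed annotation prefix that marks a pre-terminate hook on a Machine.
	preTerminateHookPrefix = "pre-terminate.delete.hook.machine.cluster.x-k8s.io"
	// Assumed name of the hook that KCP adds and manages itself.
	kcpCleanupHook = preTerminateHookPrefix + "/kcp-cleanup"
)

// machineHasOtherPreTerminateHooks reports whether the annotations contain
// any pre-terminate hook besides the KCP one. Illustrative body only.
func machineHasOtherPreTerminateHooks(annotations map[string]string) bool {
	for key := range annotations {
		if strings.HasPrefix(key, preTerminateHookPrefix) && key != kcpCleanupHook {
			return true
		}
	}
	return false
}

func main() {
	annotations := map[string]string{
		kcpCleanupHook:                      "",
		preTerminateHookPrefix + "/my-hook": "", // hypothetical third-party hook
	}
	if machineHasOtherPreTerminateHooks(annotations) {
		// Mirrors the early return in the snippet: let the other hooks finish
		// first so that kubelet is still working while they run.
		fmt.Println("other pre-terminate hooks present, KCP hook waits")
	}
}
```

Matching on the prefix rather than on a fixed list means hooks added by any owner count, which is what an "always run last" guarantee needs.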

@chrischdi
Member

/test pull-cluster-api-e2e-main

Failed on old mgmt version (flake not affected by this PR)

@chrischdi
Member

/test pull-cluster-api-e2e-mink8s-main
Unrelated self-hosted flake where the machine pool did not upgrade; the new machine pool machine failed because CAPD was not able to preload an image:

DockerMachine":{"name":"worker-8w8ht2","namespace":"self-hosted-9670wf"},"namespace":"self-hosted-9670wf","name":"worker-8w8ht2","reconcileID":"a593a27e-a1cd-4819-83ff-8acbc1af8d7d","err":"failed to pre-load images into the DockerMachine: failed to load image \"gcr.io/k8s-staging-cluster-api/cluster-api-controller-amd64:dev\": error creating container exec: Error response from daemon: container 7254f307428922c75bd758f33c04b8092ea8e593bbf4582f4bd72b2947b4d2aa is not running

@chrischdi
Member

/test pull-cluster-api-e2e-latestk8s-main

@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 5, 2024
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 5, 2024
@sbueringer sbueringer changed the title [WIP] 🐛 KCP: remove etcd member in pre-terminate hook 🐛 KCP: remove etcd member in pre-terminate hook Sep 5, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 5, 2024
@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 5, 2024
@sbueringer
Member Author

/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main

@fabriziopandini
Member

Great work!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 5, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7846e7efa4c01e76b9d2ac602660759c05efbe46

@chrischdi
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 5, 2024
@chrischdi
Member

/hold

at your convenience

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 5, 2024
@sbueringer
Member Author

/hold cancel

All tests green

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 5, 2024
@k8s-ci-robot k8s-ci-robot merged commit 518fce7 into kubernetes-sigs:main Sep 5, 2024
27 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Sep 5, 2024
@sbueringer sbueringer deleted the pr-kcp-pre-terminate branch September 5, 2024 15:16
@Sunnatillo Sunnatillo mentioned this pull request Nov 19, 2024
54 tasks
Labels
  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • area/provider/control-plane-kubeadm (Issues or PRs related to KCP)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)
  • tide/merge-method-squash (Denotes a PR that should be squashed by tide when it merges.)
Development

Successfully merging this pull request may close these issues.

Drain not being performed for KCP machines with K8s v1.31.x
5 participants