@cici37
Contributor

@cici37 cici37 commented Feb 7, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implement the ordered namespace deletion.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added an alpha feature gate `OrderedNamespaceDeletion`. When enabled, pod resources are deleted before all other resources during namespace deletion to ensure workload security.
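
As a rough illustration of the ordering described in this release note, here is a minimal sketch of a single sync pass over a terminating namespace. It is not the controller code from this PR: `syncOnce`, the `resources` slice, and the `remaining` map are hypothetical stand-ins chosen for the example.

```go
package main

import "fmt"

// syncOnce models one pass over a terminating namespace (hypothetical helper).
// resources lists the resource types present in the namespace; remaining maps a
// resource type to the number of objects still present after deletes were issued
// (for example pods that are terminating gracefully or held by finalizers).
// It returns the resource types for which deletes were issued, in order.
func syncOnce(resources []string, remaining map[string]int, orderedDeletion bool) []string {
	var issued []string
	if orderedDeletion {
		// With the gate on, pods are always handled first.
		issued = append(issued, "pods")
		if remaining["pods"] > 0 {
			// Pods are still around: stop here and retry on a later pass.
			return issued
		}
	}
	for _, r := range resources {
		if orderedDeletion && r == "pods" {
			continue // already handled above
		}
		issued = append(issued, r)
	}
	return issued
}

func main() {
	resources := []string{"secrets", "pods", "networkpolicies"}
	// Gate on, pods remain: only pods are touched in this pass.
	fmt.Println(syncOnce(resources, map[string]int{"pods": 2}, true))
	// Gate on, pods gone: the remaining resources follow.
	fmt.Println(syncOnce(resources, map[string]int{}, true))
	// Gate off: resources are processed in whatever order they are listed.
	fmt.Println(syncOnce(resources, map[string]int{"pods": 2}, false))
}
```

The point of the sketch is only the early return: nothing other than pods is deleted until a pass observes zero remaining pods.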

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/5080

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 7, 2025
@k8s-ci-robot k8s-ci-robot added area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 7, 2025
@cici37
Contributor Author

cici37 commented Feb 7, 2025

/sig api-machinery

@cici37 cici37 force-pushed the nsDeletion branch 2 times, most recently from 747a6c9 to d5058ba Compare February 7, 2025 14:24
@aojea
Member

aojea commented Feb 15, 2025

overall looks good, there are some comments to address #130035 (comment) and the e2e test commits may be squashed together

@cici37
Contributor Author

cici37 commented Feb 18, 2025

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 18, 2025
@cici37
Contributor Author

cici37 commented Feb 21, 2025

All comments have been addressed. Thank you!

/assign @aojea @thockin

@cici37
Contributor Author

cici37 commented Feb 21, 2025

/test pull-kubernetes-e2e-gce-cos-alpha-features

@cici37
Contributor Author

cici37 commented Feb 21, 2025

Note: The test failure is not related to the current PR. Ref: #130339

@k8s-ci-robot
Contributor

@cici37: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-cos-alpha-features | e1b3c8f | link | false | `/test pull-kubernetes-e2e-gce-cos-alpha-features` |

Member

@thockin thockin left a comment

/lgtm
/approve

logger.V(5).Info("Namespace controller - OrderedNamespaceDeletion feature gate is enabled", "namespace", namespace)
// Ensure all pods in the namespace are deleted first
podsGVR := schema.GroupVersionResource{Group: "", Version: "v1", Resource: "pods"}
gvrDeletionMetadata, err := d.deleteAllContentForGroupVersionResource(ctx, podsGVR, namespace, namespaceDeletedAt)
Member

Somewhere we discussed whether we need to wait for actual delete or just that all pods are "deleting" and have been stopped. That would require actually looking into the Pod object itself and knowing about the kubelet state.

I think this (waiting for deleted) is fine for now, but do you think we should consider the above?

Contributor Author

With the current behavior, a pod that is being deleted and has been stopped can no longer serve traffic. I am not sure the use case of "must actually delete the pod before deleting other resources" is necessary. Is there a concern that a pod in the "deleting" state exposes a security risk?

Member

Users can't permanently stop a pod from being stopped, but they can stop it from being deleted (via finalizers).

It may not matter, since the NS deletion is "stuck" either way.

And we don't have a general way to know that an arbitrary object is "stopped" (it's pod-specific), much less "stopped and guaranteed not to restart".

OK, you talked me out of it.
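
To make the "deleting" versus "deleted" distinction in this thread concrete, here is a small sketch under simplifying assumptions. The `object` type and both helpers are hypothetical, and the finalizer name `example.com/hold` is a placeholder; the sketch only models the general rule that an object marked for deletion stays in the API until its finalizers are cleared.

```go
package main

import "fmt"

// object is a hypothetical, simplified view of API object metadata.
type object struct {
	Name       string
	Deleting   bool     // stands in for a set metadata.deletionTimestamp
	Finalizers []string // stands in for metadata.finalizers
}

// removedFromAPI reports whether the object is actually gone: it must have
// been marked for deletion and every finalizer must have been cleared.
func removedFromAPI(o object) bool {
	return o.Deleting && len(o.Finalizers) == 0
}

// podsBlockingNamespace returns the names of pods that are still present and
// would therefore keep a "wait for deleted" namespace deletion waiting.
func podsBlockingNamespace(pods []object) []string {
	var blocking []string
	for _, p := range pods {
		if !removedFromAPI(p) {
			blocking = append(blocking, p.Name)
		}
	}
	return blocking
}

func main() {
	pods := []object{
		{Name: "web-0", Deleting: true},
		{Name: "web-1", Deleting: true, Finalizers: []string{"example.com/hold"}},
	}
	// Only the pod that still carries a finalizer is reported as blocking.
	fmt.Println(podsBlockingNamespace(pods))
}
```

Either way the namespace stays in its terminating state until the finalizer is removed, which matches the "stuck either way" observation above.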

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5bb36a49c0381faa7824e43a03104f4dc7a7410c

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cici37, thockin


@thockin
Member

thockin commented Feb 26, 2025

/retest

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2025
@k8s-ci-robot k8s-ci-robot merged commit b38bf6c into kubernetes:master Feb 26, 2025
15 of 16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Feb 26, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Apps Feb 26, 2025
@cici37 cici37 deleted the nsDeletion branch February 26, 2025 23:17
// Check if any pods remain before proceeding to delete other resources
if numRemainingTotals.gvrToNumRemaining[podsGVR] > 0 {
	logger.V(5).Info("Namespace controller - pods still remain, delaying deletion of other resources", "namespace", namespace)
	return estimate, utilerrors.NewAggregate(errs)
Member

@liggitt liggitt Mar 6, 2025

Just a note that returning early here means we're not reaching the conditionUpdater.Update / nsClient.UpdateStatus calls below, so any errors deleting pods, or any delay in waiting for pods to complete graceful deletion / get finalizers removed / etc, will be silent (not reflected in namespace conditions)

That doesn't impact the namespace deletion, but it does regress usability improvements made in #73405 and #82189 for #82084 / #64002, #60807, #66735 for issues with pod cleanup specifically

It could be enough to duplicate the condition / ns status update into this short-circuit block so we report that pods remain or pod deletion errors occurred if we're blocked on that:

	if hasChanged := conditionUpdater.Update(ns); hasChanged {
		if _, err = d.nsClient.UpdateStatus(ctx, ns, metav1.UpdateOptions{}); err != nil {
			utilruntime.HandleError(fmt.Errorf("couldn't update status condition for namespace %q: %v", namespace, err))
		}
	}
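
For readers outside the controller code, the suggestion above can be sketched in a self-contained form. Everything here is an assumption made for illustration: the `namespace` and `statusWriter` types, the `setCondition` and `blockedOnPods` helpers, and the condition names `PodsRemaining` and `PodDeletionFailed` are placeholders, not the real API types or the change that was eventually made.

```go
package main

import (
	"errors"
	"fmt"
)

// namespace is a hypothetical, simplified stand-in for the namespace object;
// Conditions maps a condition type to its message.
type namespace struct {
	Name       string
	Conditions map[string]string
}

// statusWriter is a hypothetical stand-in for the client that persists status.
type statusWriter struct{ writes int }

func (w *statusWriter) UpdateStatus(ns *namespace) error {
	w.writes++
	return nil
}

// setCondition records a condition and reports whether anything changed,
// mirroring the "write only on change" shape of the snippet above.
func setCondition(ns *namespace, condType, msg string) bool {
	if ns.Conditions[condType] == msg {
		return false
	}
	ns.Conditions[condType] = msg
	return true
}

// blockedOnPods models the short-circuit: when pods remain or pod deletion
// failed, it surfaces the reason in the namespace status before signalling
// that the caller should return early.
func blockedOnPods(ns *namespace, w *statusWriter, podsRemaining int, podErr error) bool {
	if podsRemaining == 0 && podErr == nil {
		return false
	}
	changed := false
	if podsRemaining > 0 {
		changed = setCondition(ns, "PodsRemaining", fmt.Sprintf("%d pods remaining", podsRemaining)) || changed
	}
	if podErr != nil {
		changed = setCondition(ns, "PodDeletionFailed", podErr.Error()) || changed
	}
	if changed {
		if err := w.UpdateStatus(ns); err != nil {
			fmt.Printf("couldn't update status condition for namespace %q: %v\n", ns.Name, err)
		}
	}
	return true
}

func main() {
	ns := &namespace{Name: "demo", Conditions: map[string]string{}}
	w := &statusWriter{}
	// First pass: blocked, and the reason is written to status.
	fmt.Println(blockedOnPods(ns, w, 2, errors.New("delete timed out")), w.writes)
	// Second pass with the same state: still blocked, but no extra write.
	fmt.Println(blockedOnPods(ns, w, 2, errors.New("delete timed out")), w.writes)
	// Pods gone: not blocked, so the caller would continue with other resources.
	fmt.Println(blockedOnPods(ns, w, 0, nil), w.writes)
}
```

The sketch leaves out clearing stale conditions once pods are gone; it is only meant to show that the early return no longer has to be silent.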

Contributor Author

Thanks for catching! Here is the PR: #130617

k8s-ci-robot added a commit that referenced this pull request Mar 8, 2025
…35-upstream-release-1.30

Automated cherry pick of #130035: Add the feature gate `OrderedNamespaceDeletion` for
k8s-ci-robot added a commit that referenced this pull request Mar 8, 2025
…35-upstream-release-1.31

Automated cherry pick of #130035: [KEP-5080]Ordered Namespace Deletion
k8s-ci-robot added a commit that referenced this pull request Mar 8, 2025
…35-upstream-release-1.32

Automated cherry pick of #130035: [KEP-5080]Ordered Namespace Deletion
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

6 participants