Velero namespace stuck in Terminating state #8409

Open
ansultan1 opened this issue Nov 15, 2024 · 9 comments
@ansultan1

With the velero uninstall --force command, the resources in the namespace get deleted, but the namespace gets stuck in the Terminating state. Tested on both IBM ROKS and IKS clusters; the same issue occurs on both.

@blackpiglet blackpiglet added the Area/CLI related to the command-line interface label Nov 15, 2024
@blackpiglet
Contributor

Could you give more information about the hung namespace?
It would help to see the output of this command:

kubectl get ns velero -o yaml

@ansultan1
Author

ansultan1 commented Nov 15, 2024

apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2024-11-15T07:27:19Z"
  deletionTimestamp: "2024-11-15T07:29:01Z"
  labels:
    component: velero
    kubernetes.io/metadata.name: velero
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/audit-version: latest
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/warn-version: latest
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:component: {}
          f:kubernetes.io/metadata.name: {}
          f:pod-security.kubernetes.io/audit: {}
          f:pod-security.kubernetes.io/audit-version: {}
          f:pod-security.kubernetes.io/enforce: {}
          f:pod-security.kubernetes.io/enforce-version: {}
          f:pod-security.kubernetes.io/warn: {}
          f:pod-security.kubernetes.io/warn-version: {}
    manager: velero
    operation: Update
    time: "2024-11-15T07:27:19Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"NamespaceContentRemaining"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionContentFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionDiscoveryFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionGroupVersionParsingFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceFinalizersRemaining"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-controller-manager
    operation: Update
    subresource: status
    time: "2024-11-15T07:29:08Z"
  name: velero
  resourceVersion: "2878"
  uid: 6a7a010c-e5c6-4a1d-94ac-0c1585ef3ada
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2024-11-15T07:29:06Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.eks.amazonaws.com/v1: stale GroupVersion discovery: metrics.eks.amazonaws.com/v1'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2024-11-15T07:29:08Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2024-11-15T07:29:08Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2024-11-15T07:29:08Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2024-11-15T07:29:08Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating
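
In the output above, only the NamespaceDeletionDiscoveryFailure condition has status "True", and it is easy to miss among the managedFields entries. A minimal sketch for printing just the conditions that are currently "True" is shown below. The function name true_namespace_conditions is a hypothetical helper introduced here for illustration, and the sketch assumes kubectl is configured for the affected cluster.

```shell
# Hypothetical helper: print "type: message" for every namespace condition
# whose status is "True". The namespace name is passed as the first argument.
true_namespace_conditions() {
  ns="$1"
  kubectl get namespace "$ns" \
    -o jsonpath='{range .status.conditions[?(@.status=="True")]}{.type}{": "}{.message}{"\n"}{end}'
}

# Usage (requires cluster access):
#   true_namespace_conditions velero
```

Conditions with status "False" (for example NamespaceContentRemaining here) report that the corresponding check found nothing blocking, so filtering them out leaves only the likely cause.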

@blackpiglet
Contributor

  - lastTransitionTime: "2024-11-15T07:29:06Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.eks.amazonaws.com/v1: stale GroupVersion discovery: metrics.eks.amazonaws.com/v1'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure

Based on this condition, I think this is caused by the metrics server not working in the cluster.
Because the PodMetrics resource is namespace-scoped, the velero namespace also contains resources of that kind.
If the metrics server stops working, the k8s cluster can no longer serve the PodMetrics resource, and the resources left in the velero namespace then prevent the namespace from being deleted.

That is not a Velero issue. This is how k8s works.
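
A DiscoveryFailed condition like the one quoted above usually points at an aggregated API group that the API server cannot reach. One way to check this, sketched here under the assumption that kubectl is configured for the cluster, is to filter the table printed by kubectl get apiservices for rows whose AVAILABLE column is not "True". The function name unavailable_apiservices is a hypothetical helper, not part of kubectl or Velero.

```shell
# Hypothetical helper: read the table printed by `kubectl get apiservices`
# on stdin and print the NAME of every row whose AVAILABLE column
# (third field) is not "True". The header row is skipped.
unavailable_apiservices() {
  awk 'NR > 1 && $3 != "True" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get apiservices | unavailable_apiservices
```

If the group named in the condition message shows up in this list, the namespace deletion is likely waiting on that APIService rather than on anything Velero created.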

@blackpiglet blackpiglet self-assigned this Nov 15, 2024
@ansultan1
Author

ansultan1 commented Nov 15, 2024

The metrics server is working fine in the cluster. The namespace is still in Terminating and the command is stuck.

image
image

@blackpiglet
Contributor

OK, but there should still be some resources left in the velero namespace. Could you check which resources are left?

  - lastTransitionTime: "2024-11-15T07:29:08Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2024-11-15T07:29:08Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining

@ansultan1
Author

No resources in the velero namespace.

image

@ansultan1
Author

ansultan1 commented Nov 15, 2024

One pod in the kube-system namespace is in Pending. Maybe the issue is caused by this?

image

image

@blackpiglet
Contributor

kubectl get all doesn't literally retrieve all resources.

Try this command:

kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n velero
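
For repeated use, the same check can be wrapped in a small function that takes the namespace as an argument. This is only a sketch: the name list_namespaced_resources is hypothetical, and a while-read loop stands in for xargs so each resource type is queried in turn.

```shell
# Hypothetical helper: list every object of every listable, namespaced
# resource type in the given namespace. Unlike `kubectl get all`, this
# walks the full output of `kubectl api-resources`.
list_namespaced_resources() {
  ns="$1"
  kubectl api-resources --verbs=list --namespaced -o name |
    while read -r resource; do
      kubectl get "$resource" --show-kind --ignore-not-found -n "$ns"
    done
}

# Usage (requires cluster access):
#   list_namespaced_resources velero
```

Resource types served by an unavailable aggregated API will likely return errors rather than an empty result, which is itself a hint about what is blocking the deletion.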

@ansultan1
Author

image
