Namespace deletion hanging #135

Open
mcandio opened this issue Feb 15, 2022 · 7 comments

Comments

@mcandio

mcandio commented Feb 15, 2022

Hello all.

I'm experiencing a namespace termination hang when using the helmchart.helm.cattle.io CRD.
I'm applying the following HelmChart resource:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: rabbitmq
  namespace: candio-helm-controller-issue
spec:
  chart: https://charts.bitnami.com/bitnami/rabbitmq-8.18.0.tgz
  # https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq/#installing-the-chart
  valuesContent: |-
    replicaCount: 1
    auth:
      username: rabbit
      password: password
    persistence:
      enabled: true
      accessMode: ReadWriteOnce
      ## If you change this value, you might have
      ## to adjust `rabbitmq.diskFreeLimit` as well.
      size: 8Gi
    service:
      managerPortEnabled: false
    metrics:
      enabled: true
    volumePermissions:
      enabled: false
    clustering:
      forceBoot: true

This file creates the resource and works perfectly fine, but the problem arises when trying to delete the namespace.

root@8:/home/broker# kubectl get pods -n candio-helm-controller-issue
NAME                             READY   STATUS      RESTARTS   AGE
helm-install-rabbitmq--1-sb88h   0/1     Completed   0          38s
rabbitmq-0                       1/1     Running     0          36s
root@dev-office-inference-8:/home/agot/broker# kubectl delete ns candio-helm-controller-issue
namespace "candio-helm-controller-issue" deleted
^C
root@8:/home/broker# kubectl get ns
NAME                           STATUS        AGE
broker                         Active        3d23h
candio-helm-controller-issue   Terminating   3m8s
root@8:/home/broker# kubectl api-resources --verbs=list --namespaced -o name \
>   | xargs -n 1 kubectl get --show-kind --ignore-not-found -n candio-helm-controller-issue
NAME                                AGE
helmchart.helm.cattle.io/rabbitmq   108s

K3s version

root@8:/home/broker# k3s --version
k3s version v1.22.2+k3s2 (3f5774b4)
go version go1.16.8

What should be the exact behavior?
Is there any documentation related to the CRD and the deletion process?

Any help here will be appreciated.

@cubinet-code

The HelmChart resource should be in the kube-system namespace, as otherwise the finalizers hang.
You then specify the namespace you want the deployment to happen in with spec.targetNamespace:

---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: rabbitmq
  namespace: kube-system
spec:
  chart: https://charts.bitnami.com/bitnami/rabbitmq-8.18.0.tgz
  targetNamespace: candio-helm-controller-issue
  valuesContent: |-

To release the broken HelmChart resource in the wrong namespace, you can manually patch the finalizers on the HelmChart resource. This will release the namespace. Change:

  finalizers:
    - wrangler.cattle.io/helm-controller

to:

  finalizers: []
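
A minimal sketch of that manual patch using kubectl patch with a merge patch, assuming kubectl is already configured for the cluster. The function name is hypothetical; the namespace and chart name in the example are the ones from this issue. Note that clearing the finalizer skips the helm uninstall step, so anything the chart installed outside the namespace is left behind.

```shell
#!/bin/sh
# Sketch only: clear the finalizers on a single HelmChart so that a
# namespace stuck in Terminating can finish deleting.
# clear_helmchart_finalizers is a hypothetical helper name.

# Usage: clear_helmchart_finalizers NAMESPACE NAME
clear_helmchart_finalizers() {
  kubectl patch helmchart.helm.cattle.io "$2" \
    --namespace "$1" \
    --type merge \
    --patch '{"metadata":{"finalizers":[]}}'
}

# Example (not executed here):
#   clear_helmchart_finalizers candio-helm-controller-issue rabbitmq
```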

@mcandio
Author

mcandio commented Mar 11, 2022

@cubinet-code Thank you so much, I really appreciate your answer, this solves a lot of things in our company.

@toms-place

What needs to be done so that we can deploy a HelmChart in namespaces other than kube-system?
Not every user is able to access this namespace.

@piotrminkina
Contributor

piotrminkina commented Dec 1, 2023

I have the same problem. We deploy the HelmChart resource to the same namespace in which we deploy the Helm charts. This makes it easier to deploy using the GitOps method with Anthos Config Manager.

Does it say somewhere in the documentation that you have to apply HelmChart to kube-system? I don't remember that.

Here is a script for quick patching:

namespace=hanging-namespace
kubectl --namespace="$namespace" get helmchart --output=name \
	| xargs -rI{} kubectl patch --namespace="$namespace" {} --type=merge --patch='{"metadata": {"finalizers": []}}'

@brandond
Contributor

brandond commented Dec 1, 2023

There is no requirement that you deploy charts to any particular namespace. I'm not sure why the finalizer would be getting stuck when deploying to different namespaces.

@piotrminkina
Contributor

piotrminkina commented Dec 4, 2023

@brandond I checked what the issue might be. I created a namespace hec-local in which Helm Controller version 0.15.4 is installed. There are no problems in the logs:

[helm-controller] time="2023-12-04T09:08:36Z" level=info msg="Applying CRD helmcharts.helm.cattle.io"
[helm-controller] time="2023-12-04T09:08:36Z" level=info msg="Applying CRD helmchartconfigs.helm.cattle.io"
[helm-controller] I1204 09:08:36.463283       1 controllers.go:93] Starting helm controller with 3 threads
[helm-controller] I1204 09:08:36.463336       1 controllers.go:98] Starting helm controller in namespace hec-stubs
[helm-controller] I1204 09:08:36.463412       1 leaderelection.go:248] attempting to acquire leader lease hec-stubs/helm-controller-lock...
[helm-controller] I1204 09:08:36.534016       1 leaderelection.go:258] successfully acquired lease hec-stubs/helm-controller-lock
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting /v1, Kind=Secret controller"
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting helm.cattle.io/v1, Kind=HelmChartConfig controller"
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting batch/v1, Kind=Job controller"
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting helm.cattle.io/v1, Kind=HelmChart controller"
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"
[helm-controller] I1204 09:08:37.030812       1 controllers.go:105] All controllers have been started
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"

Then I created namespace hec-stubs, in which I deployed the HelmChart podinfo. The logs are clean here as well.

[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"fe95d22a-343b-4068-b0ee-d0eefb71d177\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"10898\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job hec-stubs/helm-install-podinfo"
[helm-controller] time="2023-12-04T09:08:37Z" level=error msg="error syncing 'hec-stubs/podinfo': handler dep-hec-helm-controller-chart-registration: helmcharts.helm.cattle.io \"podinfo\" not found, requeuing"
[helm-controller] time="2023-12-04T09:08:37Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"fe95d22a-343b-4068-b0ee-d0eefb71d177\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"10898\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job hec-stubs/helm-install-podinfo"
[helm-controller] time="2023-12-04T09:08:38Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"fe95d22a-343b-4068-b0ee-d0eefb71d177\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"10909\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job hec-stubs/helm-install-podinfo"
[helm-controller] time="2023-12-04T09:08:38Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"fe95d22a-343b-4068-b0ee-d0eefb71d177\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"10909\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job hec-stubs/helm-install-podinfo"
[helm-controller] time="2023-12-04T09:08:42Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"fe95d22a-343b-4068-b0ee-d0eefb71d177\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"10909\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job hec-stubs/helm-install-podinfo"
[helm-controller] time="2023-12-04T09:08:42Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"fe95d22a-343b-4068-b0ee-d0eefb71d177\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"10909\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job hec-stubs/helm-install-podinfo"

I then tried to delete the entire namespace hec-stubs with the kubectl delete ns hec-stubs command, but as you can see, there is a problem with that. Quite a few errors appeared in the logs:

[helm-controller] time="2023-12-04T09:09:56Z" level=error msg="error syncing 'hec-stubs/podinfo': handler on-helm-chart-remove: failed to create hec-stubs/chart-values-podinfo /v1, Kind=Secret for helm-chart-registration hec-stubs/podinfo: secrets \"chart-values-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/chart-content-podinfo /v1, Kind=ConfigMap for helm-chart-registration hec-stubs/podinfo: configmaps \"chart-content-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-podinfo /v1, Kind=ServiceAccount for helm-chart-registration hec-stubs/podinfo: serviceaccounts \"helm-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-delete-podinfo batch/v1, Kind=Job for helm-chart-registration hec-stubs/podinfo: jobs.batch \"helm-delete-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, requeuing"
[... above 4 times ...]
[helm-controller] E1204 09:09:57.139671       1 leaderelection.go:334] error initially creating leader election record: configmaps "helm-controller-lock" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated
[helm-controller] time="2023-12-04T09:09:57Z" level=error msg="error syncing 'hec-stubs/podinfo': handler on-helm-chart-remove: failed to create hec-stubs/chart-values-podinfo /v1, Kind=Secret for helm-chart-registration hec-stubs/podinfo: secrets \"chart-values-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/chart-content-podinfo /v1, Kind=ConfigMap for helm-chart-registration hec-stubs/podinfo: configmaps \"chart-content-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-podinfo /v1, Kind=ServiceAccount for helm-chart-registration hec-stubs/podinfo: serviceaccounts \"helm-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-delete-podinfo batch/v1, Kind=Job for helm-chart-registration hec-stubs/podinfo: jobs.batch \"helm-delete-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, requeuing"
[... above 4 times ...]
[helm-controller] E1204 09:09:59.166354       1 leaderelection.go:334] error initially creating leader election record: configmaps "helm-controller-lock" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated
[... above 3 times ...]
[helm-controller] time="2023-12-04T09:10:03Z" level=error msg="error syncing 'hec-stubs/podinfo': handler on-helm-chart-remove: failed to create hec-stubs/chart-values-podinfo /v1, Kind=Secret for helm-chart-registration hec-stubs/podinfo: secrets \"chart-values-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/chart-content-podinfo /v1, Kind=ConfigMap for helm-chart-registration hec-stubs/podinfo: configmaps \"chart-content-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-podinfo /v1, Kind=ServiceAccount for helm-chart-registration hec-stubs/podinfo: serviceaccounts \"helm-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-delete-podinfo batch/v1, Kind=Job for helm-chart-registration hec-stubs/podinfo: jobs.batch \"helm-delete-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, requeuing"
[helm-controller] E1204 09:10:05.167275       1 leaderelection.go:334] error initially creating leader election record: configmaps "helm-controller-lock" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated
[... above 5 times ...]
[helm-controller] time="2023-12-04T09:10:14Z" level=error msg="error syncing 'hec-stubs/podinfo': handler on-helm-chart-remove: failed to create hec-stubs/chart-values-podinfo /v1, Kind=Secret for helm-chart-registration hec-stubs/podinfo: secrets \"chart-values-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/chart-content-podinfo /v1, Kind=ConfigMap for helm-chart-registration hec-stubs/podinfo: configmaps \"chart-content-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-podinfo /v1, Kind=ServiceAccount for helm-chart-registration hec-stubs/podinfo: serviceaccounts \"helm-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, failed to create hec-stubs/helm-delete-podinfo batch/v1, Kind=Job for helm-chart-registration hec-stubs/podinfo: jobs.batch \"helm-delete-podinfo\" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated, requeuing"
[helm-controller] E1204 09:10:15.162280       1 leaderelection.go:334] error initially creating leader election record: configmaps "helm-controller-lock" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated
[... above 6 times ...]
[helm-controller] I1204 09:10:27.118320       1 leaderelection.go:283] failed to renew lease hec-stubs/helm-controller-lock: timed out waiting for the condition
[helm-controller] E1204 09:10:27.118405       1 leaderelection.go:306] Failed to release lock: resource name may not be empty
[helm-controller] time="2023-12-04T09:10:27Z" level=fatal msg="leaderelection lost for helm-controller-lock"
INFO[0118] Streaming logs from pod: dep-hec-helm-controller-9998876f6-nxx4c container: helm-controller  subtask=-1 task=DevLoop
[helm-controller] time="2023-12-04T09:10:28Z" level=info msg="Applying CRD helmcharts.helm.cattle.io"
[helm-controller] time="2023-12-04T09:10:28Z" level=info msg="Applying CRD helmchartconfigs.helm.cattle.io"
[helm-controller] I1204 09:10:28.258992       1 controllers.go:93] Starting helm controller with 3 threads
[helm-controller] I1204 09:10:28.259005       1 controllers.go:98] Starting helm controller in namespace hec-stubs
[helm-controller] I1204 09:10:28.259020       1 leaderelection.go:248] attempting to acquire leader lease hec-stubs/helm-controller-lock...
[helm-controller] E1204 09:10:28.262808       1 leaderelection.go:334] error initially creating leader election record: configmaps "helm-controller-lock" is forbidden: unable to create new content in namespace hec-stubs because it is being terminated
[... further just repeated the above error message, and so on endlessly ...]

I applied the patch from my previous comment, after which the namespace was removed.

For further testing I recreated namespace hec-stubs and tried again to remove it, but this time, just before removing the namespace, I first removed the HelmChart resource with kubectl delete helmchart podinfo, which produced the following logs:

[helm-controller] time="2023-12-04T09:18:45Z" level=error msg="error syncing 'hec-stubs/podinfo': handler on-helm-chart-remove: waiting for delete of helm chart for hec-stubs/podinfo by helm-delete-podinfo, requeuing"
[helm-controller] time="2023-12-04T09:18:49Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"c6735340-a323-4626-baff-ebeede84edc6\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"11690\", FieldPath:\"\"}): type: 'Normal' reason: 'RemoveJob' Uninstalled HelmChart using Job hec-stubs/helm-delete-podinfo, removing resources"
[helm-controller] time="2023-12-04T09:18:52Z" level=error msg="error syncing 'hec-stubs/podinfo': handler on-helm-chart-remove: waiting for delete of helm chart for hec-stubs/podinfo by helm-delete-podinfo, requeuing"
[helm-controller] time="2023-12-04T09:18:55Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"hec-stubs\", Name:\"podinfo\", UID:\"c6735340-a323-4626-baff-ebeede84edc6\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"11723\", FieldPath:\"\"}): type: 'Normal' reason: 'RemoveJob' Uninstalled HelmChart using Job hec-stubs/helm-delete-podinfo, removing resources"

And finally, I removed the namespace with the kubectl delete ns hec-stubs command. The namespace was removed, but the Helm Controller logged the following:

[helm-controller] E1204 09:19:50.749821       1 leaderelection.go:334] error initially creating leader election record: namespaces "hec-stubs" not found
[... further just repeated the above error message ...]
[helm-controller] I1204 09:20:20.689513       1 leaderelection.go:283] failed to renew lease hec-stubs/helm-controller-lock: timed out waiting for the condition
[helm-controller] E1204 09:20:20.689995       1 leaderelection.go:306] Failed to release lock: resource name may not be empty
[helm-controller] time="2023-12-04T09:20:20Z" level=fatal msg="leaderelection lost for helm-controller-lock"
INFO[0290] Streaming logs from pod: dep-hec-helm-controller-857679455b-nbzfk container: helm-controller  subtask=-1 task=DevLoop
[helm-controller] time="2023-12-04T09:20:21Z" level=info msg="Applying CRD helmcharts.helm.cattle.io"
[helm-controller] time="2023-12-04T09:20:21Z" level=info msg="Applying CRD helmchartconfigs.helm.cattle.io"
[helm-controller] I1204 09:20:21.658102       1 controllers.go:93] Starting helm controller with 3 threads
[helm-controller] I1204 09:20:21.658143       1 controllers.go:98] Starting helm controller in namespace hec-stubs
[helm-controller] I1204 09:20:21.658199       1 leaderelection.go:248] attempting to acquire leader lease hec-stubs/helm-controller-lock...
[helm-controller] E1204 09:20:21.778432       1 leaderelection.go:334] error initially creating leader election record: namespaces "hec-stubs" not found
[... further just repeated the above error message, and so on endlessly ...]

It seems that you must delete the HelmChart resources before deleting the namespace, although even in this order Helm Controller still doesn't quite get it right.
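
The ordering described above could be scripted roughly as follows. This is a sketch under assumptions, not a documented procedure: the function name is hypothetical, the 5 minute timeout is an arbitrary choice, and it assumes the Helm Controller is running and able to process the removal while the namespace is still Active.

```shell
#!/bin/sh
# Sketch only: delete HelmChart resources first, then the namespace.
# delete_namespace_cleanly is a hypothetical helper name.

# Usage: delete_namespace_cleanly NAMESPACE
delete_namespace_cleanly() {
  # Remove the HelmCharts while the namespace still accepts new objects,
  # so the controller can create its helm-delete Job. --wait blocks until
  # the finalizers have been processed or the timeout expires.
  kubectl delete helmchart --all --namespace "$1" --wait=true --timeout=5m || return 1

  # Only after the charts are gone is the namespace itself deleted.
  kubectl delete namespace "$1"
}

# Example (not executed here):
#   delete_namespace_cleanly hec-stubs
```

If the first step times out, the namespace is left untouched, which keeps the option of investigating the stuck HelmChart before anything enters the Terminating state.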

@jan-hudec

jan-hudec commented Feb 23, 2024

> There is no requirement that you deploy charts to any particular namespace. I'm not sure why the finalizer would be getting stuck when deploying to different namespaces.

The problem is not having the HelmChart in another namespace; the problem is deleting HelmChart resources implicitly by deleting the namespace that contains them. In that case the controller tries to uninstall the helm release, but it can't create the job (or something else it needs; I didn't check the exact sequence of events) to do it, because the namespace is in the Terminating state and nothing can be created in it.

In the common case, not uninstalling with helm won't matter, because the resources will also be deleted with the namespace. But if helm installed some non-namespaced resources, or installed resources explicitly in other namespaces, they will remain, so this is not ideal.

Unfortunately helm does not force annotating the resources it creates in any way, so they can't be deleted without running helm. So simply ignoring the error if the namespace is in Terminating state is the only thing the controller could do to avoid this issue.
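
Until the controller handles this itself, the same condition can be checked from the outside: a Namespace object reports Terminating in status.phase once deletion has started. The following is a rough shell sketch of that idea, not the controller's logic; both function names are hypothetical, and it drops the finalizers only when the namespace is already Terminating.

```shell
#!/bin/sh
# Sketch only: release HelmChart finalizers, but only in a namespace that
# is already Terminating. Both function names are hypothetical.

# Usage: namespace_is_terminating NAMESPACE
# Returns 0 if the phase is Terminating, 1 if not, 2 if the lookup failed.
namespace_is_terminating() {
  phase=$(kubectl get namespace "$1" --output jsonpath='{.status.phase}') || return 2
  [ "$phase" = "Terminating" ]
}

# Usage: release_if_terminating NAMESPACE
release_if_terminating() {
  if namespace_is_terminating "$1"; then
    # Clear the finalizers on every HelmChart left in the namespace.
    kubectl get helmchart --namespace "$1" --output name |
      while read -r chart; do
        kubectl patch "$chart" --namespace "$1" --type merge \
          --patch '{"metadata":{"finalizers":[]}}'
      done
  else
    echo "namespace $1 is not Terminating; leaving finalizers in place" >&2
    return 1
  fi
}

# Example (not executed here):
#   release_if_terminating hec-stubs
```

As noted above, anything the chart installed outside the namespace remains after this, because helm itself is never run.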
