Description
Checklist:
- I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I've included steps to reproduce the bug.
- I've pasted the output of argocd version.
Describe the bug
The ArgoCD application controller enters an infinite sync failure loop when managed namespaces are deleted without first removing the argocd.argoproj.io/managed-by label. The controller fails during cluster cache synchronization every 10 minutes with 403 Forbidden errors when it attempts to list resources in the deleted namespaces, and it only recovers after a manual controller restart.
This occurs because the GitOps Engine's cluster cache sync process (pkg/cache/cluster.go) iterates through the configured namespace list but has no mechanism to detect when those namespaces have been deleted externally. The sync fails at the Kubernetes API level when trying to list resources in non-existent namespaces.
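To illustrate the failure mode described above, here is a minimal Python model of a fail-fast sync loop. It is not the GitOps Engine code: `Forbidden`, `list_resources`, and `sync_fail_fast` are hypothetical names, and the API call is replaced by a stub that raises for any namespace that no longer exists. The point is only that a single deleted namespace aborts the sync for every namespace in the list.

```python
class Forbidden(Exception):
    """Hypothetical stand-in for a 403 response from the Kubernetes API."""


def list_resources(namespace, existing):
    # Stub for a namespaced LIST call (assumption, not a real client call).
    # A namespace that is gone is modeled as a 403, matching the logs in this issue.
    if namespace not in existing:
        raise Forbidden(
            f'cannot list resource "deployments" in the namespace "{namespace}"'
        )
    return [f"{namespace}/deployment-a"]


def sync_fail_fast(configured, existing):
    """Populate the cache for every configured namespace, aborting on the first error."""
    cache = {}
    for namespace in configured:
        # No existence check happens here, so one stale entry fails the whole sync.
        cache[namespace] = list_resources(namespace, existing)
    return cache


if __name__ == "__main__":
    existing = {"team-a", "team-b"}
    try:
        sync_fail_fast(["team-a", "test-namespace", "team-b"], existing)
    except Forbidden as err:
        print(f"cluster sync failed: {err}")
```

In this model "team-b" is never cached even though it is healthy, which mirrors how unrelated applications are affected by the stale namespace.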
To Reproduce
- Deploy ArgoCD with namespaced configuration managing specific namespaces
- Create a test namespace and add the managed-by label:
kubectl create namespace test-namespace
kubectl label namespace test-namespace argocd.argoproj.io/managed-by=argocd
- Deploy an application to the namespace (optional, but makes issue more visible)
- Delete the namespace WITHOUT removing the label first:
kubectl delete namespace test-namespace
- Wait for the next cluster cache sync cycle (default: 10 minutes)
- Observe application controller logs showing sync failures
- Note that applications fail to sync, enter the Unknown state, and ArgoCD becomes unresponsive
Expected behavior
ArgoCD should gracefully handle deleted namespaces by:
- Detecting when managed namespaces no longer exist
- Automatically removing deleted namespaces from its configuration
- Continuing normal operation with remaining valid namespaces
- Self-healing without requiring manual intervention
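The expected behavior listed above could be sketched roughly as follows. This is an illustration in Python, not a proposed patch: `partition_namespaces` and the `namespace_exists` callback are hypothetical, and how the controller would actually check for existence (it may not have permission to read Namespace objects in a namespaced install) is left open.

```python
from typing import Callable, Iterable, List, Tuple


def partition_namespaces(
    configured: Iterable[str],
    namespace_exists: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split the configured namespaces into those still present and those deleted.

    `namespace_exists` is a caller-supplied check (assumption); order is preserved.
    """
    valid: List[str] = []
    deleted: List[str] = []
    for namespace in configured:
        if namespace_exists(namespace):
            valid.append(namespace)
        else:
            deleted.append(namespace)
    return valid, deleted


if __name__ == "__main__":
    live = {"team-a", "team-b"}
    valid, deleted = partition_namespaces(
        ["team-a", "test-namespace", "team-b"], live.__contains__
    )
    print("sync these:", valid)
    print("drop from configuration:", deleted)
```

The sync would then continue with the valid list only, while the deleted list is removed from the cluster configuration (or at least logged) instead of failing the whole cycle.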
Screenshots
Version
All ArgoCD versions are affected, but I specifically encountered the error on v2.14.7.
Logs
time="2025-01-23T09:15:04Z" level=error msg="error synchronizing cache state : failed to sync cluster https://kubernetes.default.svc:443: failed to load initial state of resource apps.Deployment: deployments.apps is forbidden: User \"system:serviceaccount:argocd:argocd-application-controller\" cannot list resource \"deployments\" in API group \"apps\" in the namespace \"test-namespace\"" application=example-app
time="2025-01-23T09:25:04Z" level=error msg="error synchronizing cache state : failed to sync cluster https://kubernetes.default.svc:443: failed to load initial state of resource core.Pod: pods is forbidden: User \"system:serviceaccount:argocd:argocd-application-controller\" cannot list resource \"pods\" in API group \"\" in the namespace \"test-namespace\"" application=example-app
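For anyone triaging the same problem, the namespace behind each failure can be pulled out of log lines like the ones above. The following Python sketch assumes the message format shown in this issue (including the backslash-escaped quotes); `FORBIDDEN_RE`, `parse_forbidden`, and `affected_namespaces` are hypothetical helper names.

```python
import re

# Matches the "cannot list resource ... in the namespace ..." part of the message.
# Quotes may or may not be backslash-escaped, depending on how the log was captured.
FORBIDDEN_RE = re.compile(
    r'cannot list resource \\?"([^"\\]+)\\?" '
    r'in API group \\?"([^"\\]*)\\?" '
    r'in the namespace \\?"([^"\\]+)\\?"'
)


def parse_forbidden(line):
    """Return resource, API group, and namespace for a forbidden-list error, else None."""
    match = FORBIDDEN_RE.search(line)
    if match is None:
        return None
    resource, group, namespace = match.groups()
    return {"resource": resource, "group": group, "namespace": namespace}


def affected_namespaces(lines):
    """Collect the distinct namespaces named in forbidden-list errors, sorted."""
    found = set()
    for line in lines:
        parsed = parse_forbidden(line)
        if parsed is not None:
            found.add(parsed["namespace"])
    return sorted(found)
```

Any namespace this reports that no longer exists in the cluster is a candidate for the stale managed-by configuration described in this issue.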