Cluster Cache Sync Fails When Managed Namespaces are Deleted Without Label Removal #24709

@tricktron

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

ArgoCD application controller enters an infinite sync failure loop when managed namespaces are deleted without first removing the argocd.argoproj.io/managed-by label. The controller crashes during cluster cache synchronization every 10 minutes with 403 Forbidden errors when attempting to list resources in deleted namespaces, requiring manual controller restart to recover.

This occurs because the GitOps Engine's cluster cache sync process (pkg/cache/cluster.go) iterates through the configured namespace list but has no mechanism to detect when those namespaces have been deleted externally. The sync fails at the Kubernetes API level when trying to list resources in non-existent namespaces.

To Reproduce

  1. Deploy ArgoCD with namespaced configuration managing specific namespaces
  2. Create a test namespace and add the managed-by label:
    kubectl create namespace test-namespace
    kubectl label namespace test-namespace argocd.argoproj.io/managed-by=argocd
  3. Deploy an application to the namespace (optional, but makes issue more visible)
  4. Delete the namespace WITHOUT removing the label first: kubectl delete namespace test-namespace
  5. Wait for the next cluster cache sync cycle (default: 10 minutes)
  6. Observe application controller logs showing sync failures
  7. Note that applications fail to sync and enter the Unknown state, and ArgoCD becomes unresponsive

Expected behavior

ArgoCD should gracefully handle deleted namespaces by:

  • Detecting when managed namespaces no longer exist
  • Automatically removing deleted namespaces from its configuration
  • Continuing normal operation with remaining valid namespaces
  • Self-healing without requiring manual intervention

Screenshots

Version

All ArgoCD versions are affected, but I specifically encountered the error on:

v2.14.7

Logs

time="2025-01-23T09:15:04Z" level=error msg="error synchronizing cache state : failed to sync cluster https://kubernetes.default.svc:443: failed to load initial state of resource apps.Deployment: deployments.apps is forbidden: User \"system:serviceaccount:argocd:argocd-application-controller\" cannot list resource \"deployments\" in API group \"apps\" in the namespace \"test-namespace\"" application=example-app

time="2025-01-23T09:25:04Z" level=error msg="error synchronizing cache state : failed to sync cluster https://kubernetes.default.svc:443: failed to load initial state of resource core.Pod: pods is forbidden: User \"system:serviceaccount:argocd:argocd-application-controller\" cannot list resource \"pods\" in API group \"\" in the namespace \"test-namespace\"" application=example-app

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
    triage/pending: This issue needs further triage to be correctly classified
