
Optional REDIS_PASSWORD env variable causes race condition and Redis NOAUTH failures #2836

Open
michaelvl opened this issue Jul 19, 2024 · 12 comments · Fixed by #2839
Labels
argo-cd bug Something isn't working

Comments

@michaelvl
Contributor

Describe the bug

In PR #2800 the REDIS_PASSWORD environment variable was made optional, i.e. several Deployment resources now have something along these lines:

...
        env:
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-redis
                optional: true
                key: auth

The problem is that if the secret does not exist when these Pods start, they will be unable to authenticate to Redis. The Redis Pod itself does not mark this env variable as optional, i.e. Redis will not start until the secret is available.

The chart can generate the secret through a Job, or the secret can be created by external means. Either way, the secret may not be available when e.g. the repo-server or application-controller Pods start.
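One way to get the same ordering guarantee as the redis Pod would be to drop `optional: true` from the reference. A sketch, using the standard Kubernetes `secretKeyRef` schema shown above (not a change the chart currently makes):

```yaml
# Sketch: with "optional" omitted it defaults to false, so the kubelet keeps
# the container in CreateContainerConfigError until the argocd-redis Secret
# and its "auth" key exist, matching the behavior of the redis Pod itself.
env:
  - name: REDIS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: argocd-redis
        key: auth
```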

Related helm chart

argo-cd

Helm chart version

Any version after PR #2800, e.g. 7.3.7

To Reproduce

Install Argo CD with redisSecretInit.enabled = false and let the Pods start up: e.g. repo-server and application-controller will start successfully, whereas redis will restart repeatedly with 'config error' due to the missing secret. Finally, create the secret to let redis start. After this, one can observe 'NOAUTH Authentication required' logs in the repo-server and application-controller Pods.
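For reference, the repro setting above corresponds to this values.yaml fragment (redisSecretInit.enabled is the key named in the repro steps; everything else is left at chart defaults):

```yaml
# Disable the chart's secret-init Job, so the argocd-redis Secret must be
# created externally. This reproduces the window in which dependent Pods
# start before the Secret exists.
redisSecretInit:
  enabled: false
```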

Expected behavior

Pods that must authenticate to Redis, and thus rely on the Redis secret, should not start until the secret is available, just like the redis Pod itself does now.
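If the variable must remain optional for some deployments, an alternative would be an initContainer that blocks until the Secret exists. This is a hypothetical sketch, not part of the chart: the container name, image, and script are illustrative, and the Pod's ServiceAccount would need RBAC permission to get Secrets:

```yaml
initContainers:
  - name: wait-for-redis-secret        # hypothetical name, not chart-provided
    image: bitnami/kubectl:latest      # illustrative image choice
    command:
      - sh
      - -c
      # Poll until the Secret exists; requires RBAC "get" on secrets
      - until kubectl get secret argocd-redis; do echo waiting; sleep 2; done
```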


@michaelvl michaelvl added the bug Something isn't working label Jul 19, 2024
@mkilchhofer
Member

mkilchhofer commented Jul 19, 2024

@shlomitubul
Could you please provide some details on why you made this PR? (It lacks a motivation or a linked issue with context.)

It would also make sense to involve @mbevc1 and @yu-croco, as you approved the implementation.
When I reviewed this PR after coming back from my vacation (the PR had already been merged), I suspected that exactly what the issue reporter describes would happen.

@yu-croco
Collaborator

yu-croco commented Jul 19, 2024

as you approved the implementation.

Oh... I just thought the PR was fixing a small mistake (the mistake turned out to be mine after all 🙃), since I didn't realize the case described in this issue. 🫠

@yu-croco
Collaborator

While waiting for @shlomitubul's reply, I prepared a draft PR for this issue. 🙏

@purajit

purajit commented Jul 24, 2024

I just came across this after upgrading to v2.11.6 and trying to figure out if I have the same issue you do.

Were you able to verify that this was indeed the issue? I checked the environment of each of the pods, and they had REDIS_PASSWORD. I even recycled the pods and I still have the issue. Does recycling the pods work for you?

My gut tells me this is related to argoproj/argo-cd#15912 - something makes configuration inconsistent between components.

Trying to get this set up in my local cluster so I can iterate more.

EDIT: narrowed it to v2.11.1: argoproj/argo-cd@v2.11.0...v2.11.1
argoproj/argo-cd@f1a449e

@purajit

purajit commented Aug 14, 2024

Can this be re-opened? #2839 did not fix this, as suspected.

argocd: v2.12.0+ec30a48
  BuildDate: 2024-08-05T13:46:43Z
  GitCommit: ec30a48bce7a60046836e481cd2160e28c59231d
  GitTreeState: clean
  GoVersion: go1.22.5
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.12.0+ec30a48
  BuildDate: 2024-08-05T13:46:43Z
  GitCommit: ec30a48bce7a60046836e481cd2160e28c59231d
  GitTreeState: clean
  GoVersion: go1.22.5
  Compiler: gc
  Platform: darwin/arm64

@mkilchhofer
Member

@purajit What is your issue? Your Argo CD components cannot connect to Redis?

@purajit

purajit commented Aug 18, 2024

@mkilchhofer Yup, any path like argo app diff that relies on the cache fails with the NOAUTH error above. I've tried recycling the pods and checked their environments, and I don't see anything obviously wrong.

In particular, while basic Argo sync still works, argocd app diff is one piece of functionality that depends on the cache and fails.

@mkilchhofer
Member

Ah, so we are only talking about CLI functionality? You didn't see any suspicious log lines inside the running Argo CD pods?

@purajit

purajit commented Aug 20, 2024

Sorry for the misunderstanding. No: since this is an issue with the overall setup, there are logs internally in Argo as well. I was just giving an example of functionality that obviously surfaces the issue and completely stops working; argocd app diff is completely nonfunctional right now without turning off Redis auth.

Here are examples of internal logs:

1:S 14 Aug 2024 17:49:49.910 # MASTER aborted replication with an error: NOAUTH Authentication required.
1:S 14 Aug 2024 17:49:49.910 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
1:S 14 Aug 2024 17:49:49.906 # MASTER aborted replication with an error: NOAUTH Authentication required.
1:S 14 Aug 2024 17:49:49.906 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.

@mkilchhofer
Member

Hi @purajit

You shared logs of the redis-ha container or sentinel container, right?

I think the issue is related to, but slightly different from, the one described in the initial post of this issue.

Nevertheless, could you share the values.yaml you use for your Argo CD setup, so we can try to reproduce your issue?

@mkilchhofer mkilchhofer reopened this Aug 20, 2024
@purajit

purajit commented Aug 22, 2024

I haven't included applicationSet, notifications, repoServer, etc.; the rest is here
(values are explicitly redacted rather than removed, so you can see all the contents):

server:
  extraArgs:
  - --insecure
  - --enable-gzip
  replicas: <...>
  resources: <...>
  serviceAccount: <...>
redis-ha:
  enabled: true
  exporter:
    enabled: false
  extraLabels:
    app.kubernetes.io/name: argocd-redis-ha-haproxy
  haproxy:
    image:
      repository: public.ecr.aws/docker/library/haproxy
  image:
    repository: public.ecr.aws/docker/library/redis
configs:
  clusterCredentials: <...>
  cm: 
    dex.config: <...>
    statusbadge.enabled: <...>
    url: <...>
  credentialTemplates: <...>
  params:
    redis.compression: none
  rbac: <...>
  repositories: <...>
  secret:
    githubSecret: <...>

@purajit
Copy link

purajit commented Sep 11, 2024

argoproj/argo-cd#19599 looks promising
