One failing HelmRelease seems to block all other releases from being installed/upgraded #351

Closed
ilya-git opened this issue Nov 3, 2021 · 7 comments · Fixed by #409


@ilya-git

ilya-git commented Nov 3, 2021

  1. Have a HelmRelease "A" that is successfully installed in k8s via Flux.
  2. Add a new HelmRelease "B" that contains an error (in my case, a deployment with an image tag that does not exist).
  3. Flux tries to install HelmRelease "B" but never succeeds, because the pod is stuck in ContainerCreating (the image does not exist). HelmRelease "B" has the event: reconciliation failed: upgrade retries exhausted
  4. If we now change HelmRelease "A" (e.g. a new image is discovered by an image policy), it is never upgraded, and this event appears: Helm upgrade failed: another operation (install/upgrade/rollback) is in progress

If I understand it correctly, HelmRelease "B" is now effectively blocking every HelmRelease install/upgrade, since it can never finish.

I don't know whether this is a bug, but it would be nice if one failing Helm release did not block the others.

@hiddeco
Member

hiddeco commented Nov 3, 2021

Can you try downgrading to v0.11.2 to see if the same behavior happens?

We think the Helm v3.7.x release range may contain some troublesome code originating from the Kubernetes core (through kubectl), causing memory (#345 #349) and locking issues due to, among other things, the use of various globals.

@nickperkins

I think this bug might be in helm. The operation stalls for whatever reason and remains in a pending state. Manual requests for helm to do an install/upgrade/rollback will fail as well.

The workaround I have for the moment is to delete the helm release secret that is stuck in pending. This then allows you to reattempt the operation.
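
For anyone applying this workaround, here is a rough sketch of how the stuck secret could be located. It assumes Helm 3's default Secrets storage driver, where each release revision is a Secret labelled `owner=helm` with a `status` label such as `pending-install`, `pending-upgrade` or `pending-rollback`. The function name and the sample data are made up for illustration; the input is expected to be the parsed JSON from something like `kubectl get secrets --all-namespaces -l owner=helm -o json`.

```python
PENDING_PREFIX = "pending-"


def find_pending_release_secrets(secret_list):
    """Return (namespace, secret name, status) for Helm release Secrets
    whose status label is one of the pending-* states.

    `secret_list` is a dict shaped like the JSON output of
    `kubectl get secrets -l owner=helm -o json` (an object with "items").
    """
    stuck = []
    for item in secret_list.get("items", []):
        meta = item.get("metadata", {})
        labels = meta.get("labels") or {}
        # Only consider Secrets written by Helm's storage driver.
        if labels.get("owner") != "helm":
            continue
        status = labels.get("status", "")
        if status.startswith(PENDING_PREFIX):
            stuck.append((meta.get("namespace", ""), meta.get("name", ""), status))
    return stuck


# Hypothetical sample data: release "a" is healthy, release "b" is stuck.
SAMPLE = {
    "items": [
        {"metadata": {"namespace": "apps", "name": "sh.helm.release.v1.a.v3",
                      "labels": {"owner": "helm", "name": "a", "status": "deployed"}}},
        {"metadata": {"namespace": "apps", "name": "sh.helm.release.v1.b.v1",
                      "labels": {"owner": "helm", "name": "b", "status": "pending-install"}}},
        {"metadata": {"namespace": "apps", "name": "unrelated-secret"}},
    ]
}

stuck_secrets = find_pending_release_secrets(SAMPLE)
```

Each entry returned is a candidate for `kubectl delete secret <name> -n <namespace>`. Deleting the Secret also discards that revision's record from the release history, so it is worth checking the list before removing anything.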

@ilya-git
Author

I have now updated to the latest Flux, which should use the latest Helm, where these problems seem to be resolved. We will run it for some time and see if the issue arises again.

@ilya-git
Author

The same bug has already happened again, so unfortunately Flux 0.25.2 did not fix the issue.

@artem-nefedov

artem-nefedov commented Jan 20, 2022

I'm seeing the same problem on flux v0.25.3/helm-controller v0.15.0.
I don't know if it's related to other HelmReleases, but the end result is that we have a HelmRelease permanently stuck in the "Helm upgrade failed: another operation (install/upgrade/rollback) is in progress" state. After that, it can only be fixed by deleting and re-creating the HelmRelease. Even having unlimited install retries (-1) doesn't help.
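
A toy model of why retries alone may not help, assuming the error comes from Helm refusing to start an operation while the latest stored revision of the release is in a pending state. This is a simplified illustration, not Helm's or helm-controller's actual code; `try_upgrade` and `retry_upgrade` are hypothetical names.

```python
PENDING_STATUSES = {"pending-install", "pending-upgrade", "pending-rollback"}
ERR_PENDING = "another operation (install/upgrade/rollback) is in progress"


def try_upgrade(history):
    """Attempt one upgrade against a release history.

    `history` is a list of revision statuses, oldest first. The attempt is
    rejected if the latest revision is still pending; otherwise the previous
    deployed revision is marked superseded and a new deployed one is added.
    """
    if history and history[-1] in PENDING_STATUSES:
        return False, ERR_PENDING
    history[:] = ["superseded" if s == "deployed" else s for s in history]
    history.append("deployed")
    return True, "upgraded"


def retry_upgrade(history, attempts):
    """Retry the upgrade a fixed number of times and collect the results."""
    return [try_upgrade(history) for _ in range(attempts)]
```

In this model, a history ending in `pending-upgrade` rejects every attempt no matter how many retries are configured, because nothing ever clears the pending record. Once that record is removed (the equivalent of deleting the stuck release secret, or deleting and re-creating the HelmRelease), the next attempt goes through.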

@gladiatr72

@ilya-git #149 is worth keeping an eye on.

@stefanprodan
Member

This may be related to helm/helm#10486

6 participants