Helm Release does not reset itself after any error - shows "reconciliation failed: upgrade retries exhausted revision" #506

Closed
1 task done
scubakiz opened this issue Jul 6, 2022 · 4 comments

Comments

@scubakiz

scubakiz commented Jul 6, 2022

Describe the bug

When a HelmRelease has a problem, the problem stays forever, even after it's been fixed at the source.

If a HelmRelease runs into any issue at all, the only way to recover it is to delete it and let the next reconciliation recreate it.

Once the new one is created, it retries the upgrade and sometimes succeeds.

Steps to reproduce

Create a HelmRelease that fails for any reason. Fix the problem at the source, then reconcile the release. The controller won't try the upgrade again until the HelmRelease is deleted.

Expected behavior

Every reconcile of a HelmRelease should be independent and should work if there are no issues with the chart.

Screenshots and recordings

{"level":"info","ts":"2022-07-06T06:25:39.086Z","logger":"controller.helmrelease","msg":"reconcilation finished in 39.416357ms, next run in 4m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system"}
{"level":"error","ts":"2022-07-06T06:25:39.086Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"upgrade retries exhausted"}
{"level":"info","ts":"2022-07-06T06:25:45.166Z","logger":"controller.helmrelease","msg":"reconcilation finished in 78.830679ms, next run in 4m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system"}
{"level":"error","ts":"2022-07-06T06:25:45.166Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"upgrade retries exhausted"}
{"level":"info","ts":"2022-07-06T06:25:57.279Z","logger":"controller.helmrelease","msg":"reconcilation finished in 111.76402ms, next run in 4m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system"}
{"level":"error","ts":"2022-07-06T06:25:57.279Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"upgrade retries exhausted"}
{"level":"info","ts":"2022-07-06T06:26:21.324Z","logger":"controller.helmrelease","msg":"reconcilation finished in 44.912545ms, next run in 4m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system"}
{"level":"error","ts":"2022-07-06T06:26:21.325Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"upgrade retries exhausted"}
{"level":"info","ts":"2022-07-06T06:27:09.371Z","logger":"controller.helmrelease","msg":"reconcilation finished in 44.789963ms, next run in 4m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system"}
{"level":"error","ts":"2022-07-06T06:27:09.371Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"upgrade retries exhausted"}

OS / Distro

N/A

Flux version

v0.31.1

Flux check

► checking prerequisites
✗ flux 0.31.1 <0.31.3 (new version is available, please upgrade)
✔ Kubernetes 1.21.9 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.22.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.23.2
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.19.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.26.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.24.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.25.5
✔ all checks passed

Git provider

GitHub

Container Registry provider

Azure Container Registry

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@MarkLFT

MarkLFT commented Jul 12, 2022

This is a problem for me as well. It would be good to have a simple command that resets the HelmRelease so it starts retrying again, rather than failing with no way to recover other than removing the HelmRelease.

@scubakiz
Author

Upon further testing, I see this problem happen all the time, not just when errors occur. Basically, after the initial release, the HelmRelease goes into a permanent sleep. If you update the source repo a few days later, the HelmRelease never picks it up and applies the changes (even though the GitRepository gets the changes, as documented by the alert it fires).

If you delete the HelmRelease, its replacement works fine.

In short: HelmRelease goes dormant after its initial run or two and never wakes up.
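
To tell a HelmRelease that has gone dormant apart from one that has given up after failed upgrades, it can help to look at its status. The following is a minimal diagnostic sketch, not something taken from this thread: it assumes `kubectl` access to the cluster and that the HelmRelease status exposes `lastAttemptedRevision`, `upgradeFailures`, and a `Ready` condition, as the `v2beta1` API is assumed to do here. The function name `helmrelease_status` and the `KUBECTL` override are hypothetical.

```shell
# Hypothetical helper: print the last attempted chart revision, the upgrade
# failure counter, and the Ready condition message of a HelmRelease.
# Assumption: these status fields exist in your helm-controller version.
helmrelease_status() {
  hr_name="$1"
  hr_ns="${2:-flux-system}"   # defaults to the namespace seen in the logs above
  ${KUBECTL:-kubectl} get helmrelease "$hr_name" --namespace "$hr_ns" \
    --output 'jsonpath={.status.lastAttemptedRevision} {.status.upgradeFailures} {.status.conditions[?(@.type=="Ready")].message}'
}
```

Called as `helmrelease_status atlas-helm-release flux-system`, a non-zero failure counter together with an "upgrade retries exhausted" message would point to exhausted retries rather than a controller that never woke up.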

@stefanprodan stefanprodan transferred this issue from fluxcd/flux2 Jul 13, 2022
@hedwig2013

It is a huge problem for me. This issue seems to be a duplicate of #454.
Instead of deleting the HelmRelease, running flux suspend followed by flux resume on the HelmRelease works around the problem for me.
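
The suspend and resume workaround can be sketched as a small shell function. This is an illustration only: the function name `restart_helmrelease` and the `FLUX` override are hypothetical, and the example release name and namespace are the ones that appear in the logs earlier in this issue.

```shell
# Hypothetical helper wrapping the workaround described above:
# suspend the HelmRelease, then resume it so the controller reconciles it again.
restart_helmrelease() {
  hr_name="$1"
  hr_ns="${2:-flux-system}"   # assumed default namespace
  ${FLUX:-flux} suspend helmrelease "$hr_name" --namespace "$hr_ns" &&
    ${FLUX:-flux} resume helmrelease "$hr_name" --namespace "$hr_ns"
}
```

For the release in this issue, the call would be `restart_helmrelease atlas-helm-release flux-system`. Unlike deleting the HelmRelease, this leaves the object in place.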

@hiddeco
Member

hiddeco commented Dec 12, 2023

In the v0.37.0 release of the helm-controller, two new annotations were introduced: one resets the failure counters so that the controller retries according to the configured remediation strategy, and the other allows a one-off forced Helm install or upgrade.

You can read more about this in this blog post.
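
The comment above does not name the annotations. The sketch below assumes they are `reconcile.fluxcd.io/resetAt` and `reconcile.fluxcd.io/forceAt`, and that each has to be set to the same value as `reconcile.fluxcd.io/requestedAt` to take effect; treat those names as an assumption and confirm them in the helm-controller documentation for your version. The function name `request_helmrelease` and the `KUBECTL` override are hypothetical.

```shell
# Hypothetical helper: annotate a HelmRelease to request a retry-counter reset
# ("reset") or a one-off forced install/upgrade ("force").
# Assumption: the annotation keys below match your helm-controller version.
request_helmrelease() {
  hr_mode="$1"                # "reset" or "force"
  hr_name="$2"
  hr_ns="${3:-flux-system}"   # assumed default namespace
  case "$hr_mode" in
    reset|force) ;;
    *) echo "usage: request_helmrelease reset|force NAME [NAMESPACE]" >&2; return 2 ;;
  esac
  # Any new value works as a token; the current Unix time is a convenient one.
  hr_token="$(date +%s)"
  ${KUBECTL:-kubectl} annotate --overwrite helmrelease "$hr_name" --namespace "$hr_ns" \
    "reconcile.fluxcd.io/requestedAt=$hr_token" \
    "reconcile.fluxcd.io/${hr_mode}At=$hr_token"
}
```

For the release in this issue, `request_helmrelease reset atlas-helm-release flux-system` would ask the controller to start retrying again without deleting the HelmRelease.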

@hiddeco hiddeco closed this as completed Dec 12, 2023