If a module's helm release ends up in pending-upgrade status (for example, because the Kubernetes API server was temporarily unreachable during maintenance), subsequent ModuleRun tasks for the module may fail to apply a new helm release and its changes, leaving the module in an inconsistent state.
Expected behavior (what you expected to happen):
Some sort of helm release recovery could come in handy.
Actual behavior (what actually happened):
Steps to reproduce:
Environment:
Addon-operator version: v3.12
Helm version: v3.13.2
Kubernetes version: v1.26
Installation type (kubectl apply, helm chart, etc.):
Anything else we should know?:
Additional information for debugging (if necessary):
According to the logs, the operator somehow managed to mark the ModuleRun as successful, but no new helm release was applied.
Perhaps we should start there.
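To illustrate the recovery idea from the "Expected behavior" section above, here is a minimal sketch using the Helm v3 Go SDK. It is only an illustration: the `recoverPendingRelease` helper is hypothetical (nothing like it exists in addon-operator as far as I know), and the hard-coded `prometheus`/`d8-system` values simply mirror the logs below.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/cli"
	"helm.sh/helm/v3/pkg/release"
)

// recoverPendingRelease is a hypothetical helper: before running a new
// upgrade, check whether the latest revision of the release is stuck in a
// pending state and, if so, roll back to the previous revision so that the
// next `helm upgrade` starts from a consistent state.
func recoverPendingRelease(cfg *action.Configuration, name string) error {
	// Fetch the latest revision of the release from helm storage.
	last, err := cfg.Releases.Last(name)
	if err != nil {
		return err
	}
	switch last.Info.Status {
	case release.StatusPendingInstall, release.StatusPendingUpgrade, release.StatusPendingRollback:
		// Stuck in a pending state; fall through to the rollback below.
	default:
		// Nothing to do: the latest revision is not stuck.
		return nil
	}
	if last.Version <= 1 {
		// A pending first install has no revision to roll back to; a real
		// implementation would likely have to uninstall and reinstall.
		return fmt.Errorf("release %q is pending but has no previous revision", name)
	}
	// Rolling back marks the stuck revision as superseded and restores the
	// previous state, so the subsequent upgrade can proceed.
	rollback := action.NewRollback(cfg)
	rollback.Version = last.Version - 1
	return rollback.Run(name)
}

func main() {
	settings := cli.New()
	cfg := new(action.Configuration)
	// "d8-system" matches the namespace from the logs below; HELM_DRIVER
	// falls back to the default (secret) storage backend when unset.
	if err := cfg.Init(settings.RESTClientGetter(), "d8-system", os.Getenv("HELM_DRIVER"), log.Printf); err != nil {
		log.Fatal(err)
	}
	if err := recoverPendingRelease(cfg, "prometheus"); err != nil {
		log.Fatal(err)
	}
	log.Println("release 'prometheus' is in a consistent state")
}
```

The rollback path does the same thing as `helm rollback` from the CLI, so it should leave the release storage in the same state a manual recovery would.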
Hook script
Logs
```
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Render helm templates for chart '/deckhouse/modules/300-prometheus' was successful","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:34Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Running helm upgrade for release 'prometheus' with chart '/deckhouse/modules/300-prometheus' in namespace 'd8-system' ...","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:34Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Running helm upgrade for release 'prometheus' with chart '/deckhouse/modules/300-prometheus' in namespace 'd8-system' ...","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:42Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"error","module":"prometheus","msg":"ModuleRun failed in phase 'CanRunHelm'. Requeue task to retry after delay. Failed count is 1. Error: helm upgrade failed: Kubernetes cluster unreachable: Get "https://10.222.0.1:443/version\": dial tcp 10.222.0.1:443: connect: connection refused\n","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:42Z"}
{"level":"error","msg":"Error occurred during the module "prometheus" status update: Get "https://10.222.0.1:443/apis/deckhouse.io/v1alpha1/modules/prometheus\": dial tcp 10.222.0.1:443: connect: connection refused","time":"2024-06-18T16:38:42Z"}
{"bindin
{"binding":"beforeHelm","event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","hook":"300-prometheus/hooks/custom_logo.go","hook.type":"module","level":"info","module":"prometheus","msg":"Module hook start prometheus/300-prometheus/hooks/custom_logo.go","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:01Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"error","module":"prometheus","msg":"ModuleRun failed in phase 'CanRunHelm'. Requeue task to retry after delay. Failed count is 4. Error: 1 error occurred:\n\t* Delete object v1/ConfigMap/d8-monitoring/whitelabel-custom-logo: Delete "https://10.222.0.1:443/api/v1/namespaces/d8-monitoring/configmaps/whitelabel-custom-logo?timeout=10s\": dial tcp 10.222.0.1:443: connect: connection refused\n\n","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:01Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"error","module":"prometheus","msg":"ModuleRun failed in phase 'CanRunHelm'. Requeue task to retry after delay. Failed count is 5. Error: 1 error occurred:\n\t* Delete object v1/ConfigMap/d8-monitoring/whitelabel-custom-logo: Delete "https://10.222.0.1:443/api/v1/namespaces/d8-monitoring/configmaps/whitelabel-custom-logo?timeout=10s\": dial tcp 10.222.0.1:443: connect: connection refused\n\n","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:10Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Render helm templates for chart '/deckhouse/modules/300-prometheus' was successful","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:29Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"ModuleRun success, module is ready","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:42Z"}
root@candi-9568643786-1-con-1-25-master-0:~# ./linux-amd64/helm history prometheus --namespace d8-system
REVISION  UPDATED                   STATUS           CHART             APP VERSION  DESCRIPTION
1         Tue Jun 18 16:36:50 2024  deployed         prometheus-0.1.0               Install complete
2         Tue Jun 18 16:38:35 2024  pending-upgrade  prometheus-0.1.0               Preparing upgrade
root@candi-9568643786-1-con-1-25-master-0:~#
```
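For reference, a release stuck in pending-upgrade can usually be recovered by hand with a rollback to the last deployed revision (revision 1 in the history above), after which the next upgrade starts from a consistent state:

```
helm rollback prometheus 1 --namespace d8-system
```

It would be nice if the operator did something equivalent automatically before retrying the upgrade.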