helm client: modules may stuck in pending-upgrade periodically #481

Open
miklezzzz opened this issue Jun 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@miklezzzz
Contributor

If a module ends up in pending-upgrade status (for example, because the Kubernetes API server was unreachable during maintenance), subsequent ModuleRun tasks for the module may fail to apply a new Helm release and its changes, leaving the module in an inconsistent state.
Expected behavior (what you expected to happen):
Some form of recovery for Helm releases stuck in a pending state would be helpful.
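
To make the request more concrete, here is a minimal sketch of one possible recovery decision: if the latest revision is stuck in a pending state, pick the most recent earlier revision that was successfully deployed as a rollback candidate. This is not addon-operator's existing handling. The `Revision` type and the `rollbackTarget` function are hypothetical names, and the sketch assumes the history is sorted by ascending revision number, as `helm history` prints it.

```go
package main

import "fmt"

// Revision is a simplified, hypothetical view of one row of `helm history`.
type Revision struct {
	Number int
	Status string // e.g. "deployed", "superseded", "failed", "pending-upgrade"
}

// isPending reports whether a status is one of Helm's transitional states.
func isPending(status string) bool {
	switch status {
	case "pending-install", "pending-upgrade", "pending-rollback":
		return true
	}
	return false
}

// rollbackTarget returns the revision to roll back to when the latest
// revision is stuck in a pending state. ok is false when the release is
// not stuck or when no earlier successful revision exists.
// Assumption: history is sorted by ascending revision number.
func rollbackTarget(history []Revision) (target int, ok bool) {
	if len(history) == 0 {
		return 0, false
	}
	latest := history[len(history)-1]
	if !isPending(latest.Status) {
		return 0, false
	}
	// Walk backwards to find the newest revision that was applied successfully.
	for i := len(history) - 2; i >= 0; i-- {
		s := history[i].Status
		if s == "deployed" || s == "superseded" {
			return history[i].Number, true
		}
	}
	return 0, false
}

func main() {
	// Mirrors the history reported in this issue.
	history := []Revision{
		{Number: 1, Status: "deployed"},
		{Number: 2, Status: "pending-upgrade"},
	}
	if target, ok := rollbackTarget(history); ok {
		fmt.Printf("release is stuck; candidate rollback target: revision %d\n", target)
	}
}
```

With a target in hand, the operator could roll the release back (the CLI equivalent is `helm rollback <release> <revision>`) before retrying the upgrade. Whether a rollback or simply discarding the pending revision is the safer choice is left open here.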

Actual behavior (what actually happened):

Steps to reproduce:

Environment:

  • Addon-operator version: v3.12
  • Helm version: v3.13.2
  • Kubernetes version: v1.26
  • Installation type (kubectl apply, helm chart, etc.):

Anything else we should know?:

Additional information for debugging (if necessary):
According to the logs, the operator somehow completed ModuleRun successfully, but no new Helm release was applied.
Perhaps we should start there.
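
As a starting point for that investigation, the following sketch scans JSON log lines and flags every "ModuleRun success" entry that was not preceded, since the same task's last failure, by a "Running helm upgrade" entry. The function name `successWithoutUpgrade` is hypothetical, and the message prefixes are taken from the log excerpt in this issue. A flagged entry is only a hint: a success without an upgrade may be legitimate if the operator decides that nothing changed.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// entry holds the only log fields this check needs.
type entry struct {
	TaskID string `json:"task.id"`
	Msg    string `json:"msg"`
	Time   string `json:"time"`
}

// successWithoutUpgrade returns the timestamps of "ModuleRun success"
// entries for which no "Running helm upgrade" entry was logged since the
// last "ModuleRun failed" entry of the same task.
func successWithoutUpgrade(logs string) []string {
	upgraded := map[string]bool{} // task.id -> upgrade seen since last failure
	var suspicious []string

	sc := bufio.NewScanner(strings.NewReader(logs))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		var e entry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil || e.TaskID == "" {
			continue // skip blank, malformed, or task-less lines
		}
		switch {
		case strings.HasPrefix(e.Msg, "Running helm upgrade"):
			upgraded[e.TaskID] = true
		case strings.HasPrefix(e.Msg, "ModuleRun failed"):
			upgraded[e.TaskID] = false
		case strings.HasPrefix(e.Msg, "ModuleRun success"):
			if !upgraded[e.TaskID] {
				suspicious = append(suspicious, e.Time)
			}
		}
	}
	return suspicious
}

func main() {
	// Shortened sample lines modeled on the excerpt in this issue.
	logs := `{"msg":"Running helm upgrade for release 'prometheus' ...","task.id":"t1","time":"2024-06-18T16:38:34Z"}
{"msg":"ModuleRun failed in phase 'CanRunHelm'.","task.id":"t1","time":"2024-06-18T16:38:42Z"}
{"msg":"Render helm templates for chart was successful","task.id":"t1","time":"2024-06-18T16:39:29Z"}
{"msg":"ModuleRun success, module is ready","task.id":"t1","time":"2024-06-18T16:39:42Z"}`

	for _, ts := range successWithoutUpgrade(logs) {
		fmt.Println("ModuleRun success without a preceding helm upgrade at", ts)
	}
}
```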

Hook script

Logs
```
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Render helm templates for chart '/deckhouse/modules/300-prometheus' was successful","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:34Z"}

{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Running helm upgrade for release 'prometheus' with chart '/deckhouse/modules/300-prometheus' in namespace 'd8-system' ...","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:34Z"}

{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Running helm upgrade for release 'prometheus' with chart '/deckhouse/modules/300-prometheus' in namespace 'd8-system' ...","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:42Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"error","module":"prometheus","msg":"ModuleRun failed in phase 'CanRunHelm'. Requeue task to retry after delay. Failed count is 1. Error: helm upgrade failed: Kubernetes cluster unreachable: Get \"https://10.222.0.1:443/version\": dial tcp 10.222.0.1:443: connect: connection refused\n","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:38:42Z"}
{"level":"error","msg":"Error occurred during the module \"prometheus\" status update: Get \"https://10.222.0.1:443/apis/deckhouse.io/v1alpha1/modules/prometheus\": dial tcp 10.222.0.1:443: connect: connection refused","time":"2024-06-18T16:38:42Z"}

{"level":"error","msg":"Error occurred during the module \"deckhouse\" status update: Get \"https://10.222.0.1:443/apis/deckhouse.io/v1alpha1/modules/deckhouse\": dial tcp 10.222.0.1:443: connect: connection refused","time":"2024-06-18T16:38:45Z"}

{"binding":"beforeHelm","event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","hook":"300-prometheus/hooks/custom_logo.go","hook.type":"module","level":"info","module":"prometheus","msg":"Module hook start prometheus/300-prometheus/hooks/custom_logo.go","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:01Z"}
{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"error","module":"prometheus","msg":"ModuleRun failed in phase 'CanRunHelm'. Requeue task to retry after delay. Failed count is 4. Error: 1 error occurred:\n\t* Delete object v1/ConfigMap/d8-monitoring/whitelabel-custom-logo: Delete \"https://10.222.0.1:443/api/v1/namespaces/d8-monitoring/configmaps/whitelabel-custom-logo?timeout=10s\": dial tcp 10.222.0.1:443: connect: connection refused\n\n","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:01Z"}

{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"error","module":"prometheus","msg":"ModuleRun failed in phase 'CanRunHelm'. Requeue task to retry after delay. Failed count is 5. Error: 1 error occurred:\n\t* Delete object v1/ConfigMap/d8-monitoring/whitelabel-custom-logo: Delete \"https://10.222.0.1:443/api/v1/namespaces/d8-monitoring/configmaps/whitelabel-custom-logo?timeout=10s\": dial tcp 10.222.0.1:443: connect: connection refused\n\n","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:10Z"}

{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"Render helm templates for chart '/deckhouse/modules/300-prometheus' was successful","operator.component":"helm3lib","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:29Z"}

{"event.id":"3f54400f-a72b-40fe-b885-c8c65af9d2d9","level":"info","module":"prometheus","msg":"ModuleRun success, module is ready","queue":"main","task.id":"28551cad-d9b6-41ce-b983-815dfb03d66b","time":"2024-06-18T16:39:42Z"}


root@candi-9568643786-1-con-1-25-master-0:~# ./linux-amd64/helm history prometheus --namespace d8-system  
REVISION	UPDATED                 	STATUS         	CHART           	APP VERSION	DESCRIPTION      
1       	Tue Jun 18 16:36:50 2024	deployed       	prometheus-0.1.0	           	Install complete 
2       	Tue Jun 18 16:38:35 2024	pending-upgrade	prometheus-0.1.0	           	Preparing upgrade
root@candi-9568643786-1-con-1-25-master-0:~# 


```
miklezzzz added the bug label on Jun 19, 2024
@yalosev
Contributor

yalosev commented Jun 20, 2024

We already have some handling for this case, but it probably has bugs.
