keda operator restarted at the time of start.(error retrieving resource lock keda/operator.keda.sh) #2836

crisp2u · 2022-03-28T10:13:21Z

Discussed in #2722

Originally posted by vkamlesh March 7, 2022
The KEDA operator fails to elect a leader after the keda-operator pod restarts. The restarts are not frequent, but they recur every few days (roughly every 6 days).

KEDA Version: 2.6.1
Git Commit: efca71d

Kubernetes version: v1.20.9
Kubernetes Cluster : AKS


bash-3.2$ k get po -n keda
NAME                                      READY   STATUS    RESTARTS   AGE
keda-metrics-apiserver-649f4ddbbd-v4pjp   1/1     Running   0          12d
keda-operator-68ddbdcc8f-6h767            1/1     Running   3          12d
bash-3.2$ 


bash-3.2$ kubectl get --raw "/apis/coordination.k8s.io/v1/namespaces/keda/leases/operator.keda.sh"
{"kind":"Lease","apiVersion":"coordination.k8s.io/v1","metadata":{"name":"operator.keda.sh","namespace":"keda","uid":"edb18fd7-b95e-463f-81cf-6a1010073409","resourceVersion":"135421212","creationTimestamp":"2022-02-23T14:33:32Z","managedFields":[{"manager":"keda","operation":"Update","apiVersion":"coordination.k8s.io/v1","time":"2022-02-23T14:33:32Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:acquireTime":{},"f:holderIdentity":{},"f:leaseDurationSeconds":{},"f:leaseTransitions":{},"f:renewTime":{}}}}]},"spec":{"holderIdentity":"keda-operator-68ddbdcc8f-6h767_53e840b6-5466-484f-a6f6-16978b7ee12c","leaseDurationSeconds":15,"acquireTime":"2022-03-07T12:07:54.000000Z","renewTime":"2022-03-07T17:36:42.260717Z","leaseTransitions":142}}




bash-3.2$ k logs keda-operator-68ddbdcc8f-6h767 -n keda -f -p


1.6466548397264059e+09	INFO	controller.scaledobject	Reconciling ScaledObject	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "observationsprocessor-func", "namespace": "platform-api"}
E0307 12:07:33.812275       1 leaderelection.go:330] error retrieving resource lock keda/operator.keda.sh: Get "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/keda/leases/operator.keda.sh": context deadline exceeded
I0307 12:07:33.812329       1 leaderelection.go:283] failed to renew lease keda/operator.keda.sh: timed out waiting for the condition
1.6466548538123553e+09	ERROR	setup	problem running manager	{"error": "leader election lost"}


zroubalik · 2022-03-29T08:11:50Z

This is most likely a problem in sigs.k8s.io/controller-runtime as it is responsible for leader election. We should investigate.

crisp2u · 2022-03-30T11:48:53Z

I've found this. What puzzles me is that I saw the same error message ("failed to renew lease") on other controllers in the cluster that probably use controller-runtime, but they managed to recover. Maybe the default options are too optimistic in KEDA?

zroubalik · 2022-04-01T12:09:49Z

Hard to say. Could you please try to tweak those settings on your setup?

vkamlesh · 2022-04-19T11:06:53Z

@crisp2u @zroubalik Where exactly do we need to tweak values?

wsugarman · 2022-06-08T21:18:08Z

I'm also seeing this issue, and it's leading to noisy pod restart alerts in our AKS cluster. We are only running 1 replica of the KEDA operator, but as of now we're seeing container restarts ~3-8 times a day thanks to "leader election lost":

leaderelection.go:367] Failed to update lock: Put ".../api/v1/namespaces/keda/configmaps/operator.keda.sh": context deadline exceeded
leaderelection.go:283] failed to renew lease keda/operator.keda.sh: timed out waiting for the condition
ERROR setup problem running manager {"error": "leader election lost"}

@zroubalik - Presumably you were talking previously about tweaking the lease-related settings? Perhaps there should be a hook in the helm chart for configuring the leasing options:

keda/main.go

Lines 87 to 95 in dcb9c1e

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme:                 scheme,
        MetricsBindAddress:     metricsAddr,
        Port:                   9443,
        HealthProbeBindAddress: probeAddr,
        LeaderElection:         enableLeaderElection,
        LeaderElectionID:       "operator.keda.sh",
        Namespace:              namespace,
    })

stale · 2022-08-07T23:12:17Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2022-08-14T23:40:37Z

This issue has been automatically closed due to inactivity.

tomkerkhove moved this to Proposed in Roadmap - KEDA Core Mar 28, 2022

tomkerkhove added this to Roadmap - KEDA Core Mar 28, 2022

ukclivecox mentioned this issue Jul 6, 2022

Seldon core operator is restarting due to failed renewal of lease SeldonIO/seldon-core#4147

Closed

This was referenced Jul 27, 2022

Add Leader Election Lease Options #3430

Merged

Add Leader Election Lease Options kedacore/charts#291

Closed

Document New Environment Variables for Configuring Leader Election kedacore/keda-docs#842

Merged

stale bot added the stale label Aug 7, 2022

stale bot closed this as completed Aug 14, 2022

Repository owner moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Aug 14, 2022

JorTurFer moved this from Ready To Ship to Done in Roadmap - KEDA Core Mar 13, 2023
