Once pod stops leading it cannot become the leader again #4506

Open
kate-osborn opened this issue Oct 11, 2023 · 2 comments
Labels: backlog (Pull requests/issues that are backlog items), bug (An issue reporting a potential bug)

@kate-osborn

Describe the bug
If the Ingress Controller pod stops being the leader, it can never become the leader again. This is a problem when only one pod is running: a pod that is not the leader does not report statuses, so with a single replica no statuses are reported at all until the pod is restarted.

To Reproduce
This problem was originally observed with NGINX Gateway Fabric (nginxinc/nginx-gateway-fabric#1100) when the pod lost connectivity with the API server, but it can be reproduced on NGINX Ingress Controller (NIC) with the following steps:

  1. Deploy Ingress Controller with log level 3 and 1 replica

  2. Remove permissions to leases by editing the Ingress Controller clusterrole and removing the following section:

    - apiGroups:
      - coordination.k8s.io
      resources:
      - leases
      verbs:
      - get
      - list
      - watch
      - update
      - create
    

    This forces an API server error when Ingress Controller tries to renew its lease.

  3. Check the Ingress Controller logs and grep for "leader"

    I1010 22:34:52.856615       1 leaderelection.go:250] attempting to acquire leader lease nginx-ingress/my-release-nginx-ingress-leader-election...
    I1010 22:34:52.863761       1 leaderelection.go:260] successfully acquired lease nginx-ingress/my-release-nginx-ingress-leader-election
    I1010 22:34:52.864055       1 leader.go:59] started leading
    I1010 22:34:52.864083       1 leader.go:63] Updating status for 0 Ingresses
    I1010 22:34:52.864088       1 leader.go:72] updating VirtualServer and VirtualServerRoutes status
    E1010 22:37:08.053994       1 leaderelection.go:332] error retrieving resource lock nginx-ingress/my-release-nginx-ingress-leader-election: leases.coordination.k8s.io "my-release-nginx-ingress-leader-election" is forbidden: User "system:serviceaccount:nginx-ingress:my-release-nginx-ingress" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "nginx-ingress"
    E1010 22:37:15.555731       1 leaderelection.go:332] error retrieving resource lock nginx-ingress/my-release-nginx-ingress-leader-election: leases.coordination.k8s.io "my-release-nginx-ingress-leader-election" is forbidden: User "system:serviceaccount:nginx-ingress:my-release-nginx-ingress" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "nginx-ingress"
    I1010 22:37:23.031760       1 leaderelection.go:285] failed to renew lease nginx-ingress/my-release-nginx-ingress-leader-election: timed out waiting for the condition
    I1010 22:37:23.031886       1 leader.go:96] stopped leading
    
  4. Once you see the "stopped leading" message, restore the permissions to leases by editing the Ingress Controller's clusterrole again and adding back the snippet from step 2.

  5. Check the logs again and see that the Pod never starts leading again.
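
The log check in steps 3 and 5 can be automated with a small helper. This is only a sketch: `leadingAtEnd` is a hypothetical name, and it assumes the "started leading" and "stopped leading" messages keep the wording shown in the logs above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// leadingAtEnd scans controller log text and reports whether the pod is
// leading after the last leadership transition found in the logs.
// It keys on the "started leading" / "stopped leading" messages only.
func leadingAtEnd(logs string) bool {
	leading := false
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "started leading"):
			leading = true
		case strings.Contains(line, "stopped leading"):
			leading = false
		}
	}
	return leading
}

func main() {
	// Two lines taken from the log excerpt in step 3.
	sample := "I1010 22:34:52.864055       1 leader.go:59] started leading\n" +
		"I1010 22:37:23.031886       1 leader.go:96] stopped leading\n"
	fmt.Println("leading at end:", leadingAtEnd(sample)) // prints "leading at end: false"
}
```

With the bug present, the result stays `false` no matter how long you wait after restoring the permissions.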

Expected behavior
The Ingress Controller Pod can become the leader again after the leader lease is lost.
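
One possible shape for the expected behavior, as a hedged sketch rather than the controller's actual code: to my knowledge, client-go's `LeaderElector.Run` returns once leadership is lost, so the caller has to invoke it again to rejoin the election. The `elector` interface, `runElectionLoop`, `fakeElector`, and the retry delay below are all hypothetical stand-ins so the example runs with the standard library only.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// elector is a hypothetical stand-in for client-go's
// leaderelection.LeaderElector: Run blocks while campaigning or leading
// and returns when leadership is lost or the context is cancelled.
type elector interface {
	Run(ctx context.Context)
}

// runElectionLoop re-enters the election each time Run returns, so a pod
// that lost its lease campaigns again instead of staying a follower.
// The retry delay is an illustrative choice, not a value from the controller.
func runElectionLoop(ctx context.Context, e elector, retry time.Duration) {
	for {
		e.Run(ctx)
		if ctx.Err() != nil {
			return // shutting down, do not campaign again
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(retry):
		}
	}
}

// fakeElector simulates a lease that is lost immediately on every campaign
// and cancels the context after max campaigns so the demo terminates.
type fakeElector struct {
	campaigns int
	max       int
	cancel    context.CancelFunc
}

func (f *fakeElector) Run(ctx context.Context) {
	f.campaigns++
	if f.campaigns >= f.max {
		f.cancel()
	}
}

// countCampaigns runs the loop against the fake and returns how many
// times the election was entered.
func countCampaigns(max int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	f := &fakeElector{max: max, cancel: cancel}
	runElectionLoop(ctx, f, time.Millisecond)
	return f.campaigns
}

func main() {
	fmt.Println("campaigns:", countCampaigns(3)) // prints "campaigns: 3"
}
```

In a real controller the `elector` would be the client-go elector and the context would be the process's shutdown context; the point of the sketch is only that losing the lease leads back into the election instead of ending it.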

Your environment

  • Version of the Ingress Controller: NGINX Ingress Controller Version=3.3.0 Commit=f255b03122e9a1e556c227172086b854cff6e4c3 Date=2023-09-26T18:32:05Z DirtyState=false Arch=linux/amd64 Go=go1.21.1
  • Version of Kubernetes: 1.28.0
  • Kubernetes platform (e.g. Mini-kube or GCP): kind
  • Using NGINX or NGINX Plus: NGINX nginx/1.25.2
@github-actions

Hi @kate-osborn thanks for reporting!

Be sure to check out the docs and the Contributing Guidelines while you wait for a human to take a look at this 🙂

Cheers!

@vepatel

vepatel commented Oct 23, 2023

@vepatel added the bug and backlog labels Oct 23, 2023
@shaun-nx self-assigned this Nov 9, 2023
@shaun-nx added this to the v3.9.0 milestone Sep 2, 2024
Projects
Status: Prioritized backlog