Implement delayed termination to achieve zero downtime upgrades #1155

Closed
pleshakov opened this issue Oct 17, 2023 · 0 comments · Fixed by #1159
Assignees: pleshakov
Labels: enhancement (New feature or request), refined (Requirements are refined and the issue is ready to be implemented.), size/small (Estimated to be completed within ~2 days)
Milestone: v1.0.0

Comments

@pleshakov
Contributor

pleshakov commented Oct 17, 2023

To address the downtime that external clients can experience during an NGF upgrade, a common approach is to delay pod termination long enough for an external load balancer to drain its upstream servers.

Acceptance criteria:

  • Introduce configurable delayed termination.
  • Rerun zero downtime non-functional tests and update results for 1.0

More info:

@pleshakov pleshakov added enhancement New feature or request and removed enhancement New feature or request labels Oct 17, 2023
@mpstefan mpstefan added the refined Requirements are refined and the issue is ready to be implemented. label Oct 18, 2023
@mpstefan mpstefan added this to the v1.0.0 milestone Oct 18, 2023
@mpstefan mpstefan added size/small Estimated to be completed within ~2 days enhancement New feature or request labels Oct 18, 2023
@pleshakov pleshakov self-assigned this Oct 18, 2023
pleshakov added a commit to pleshakov/nginx-gateway-fabric that referenced this issue Oct 19, 2023
Problem:
During an upgrade of NGF, external clients can experience downtime.

Solution:
- Introduce configurable delayed termination.
  - Add sleep subcommand to gateway binary
  - Add lifecycle parameters to helm for both nginx-gateway and nginx
    containers.
  - Add terminationGracePeriodSeconds parameter to helm.
  - Add affinity parameter to helm (primarily needed for testing to
    prevent pods running on the same node).
- Rerun zero downtime non-functional tests.

Testing:
- Manual testing

SOLVES nginx#1155
pleshakov added a commit that referenced this issue Oct 20, 2023
Problem:
During an upgrade of NGF, external clients can experience downtime.

Solution:
- Introduce configurable delayed termination.
  - Add sleep subcommand to gateway binary
  - Add lifecycle parameters to helm for both nginx-gateway and nginx
    containers.
  - Add terminationGracePeriodSeconds parameter to helm.
  - Add affinity parameter to helm (primarily needed for testing to
    prevent pods running on the same node).
- Rerun zero downtime non-functional tests.

Testing:
- Manual testing

SOLVES #1155

Co-authored-by: Saylor Berman <[email protected]>
miledxz added a commit to miledxz/nginx-gateway-fabric that referenced this issue Jan 14, 2025