Skip to content

Kubernetes API server rolling restarts experience client-side disruption #5475

@nrb

Description

@nrb

/kind bug

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

An automated test that forces kube API server restarts observed that roughly 30 seconds after the kube api-server returns HTTP 500s from /readyz pre-shutdown, clients are getting ECONNRESET when trying to read responses.

What did you expect to happen:

Clients would not have their connections interrupted.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

The target_health_state.unhealthy.draining_interval_seconds attribute should probably be set to something higher than the default (which is 0) in order to make client interactions more resilient to kube-api-server restarts.

This draining interval is separate from the health check interval seconds we currently set - that one just tells AWS how often to invoke the health check.

See the TargetGroupAttribute docs for specifics.

This was observed during OpenShift testing, but is applicable generally to anyone restarting their KAS.

/area networking
/triage accepted

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/networkingIssues or PRs related to networkingkind/bugCategorizes issue or PR as related to a bug.needs-prioritytriage/acceptedIndicates an issue or PR is ready to be actively worked on.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions