Kubernetes: Handle GOAWAY requests #61142

Merged
rosstimothy merged 1 commit into master from tross/kube_goaway on Nov 11, 2025

@rosstimothy rosstimothy commented Nov 7, 2025

This is an attempt to address #57766.

When a request is terminated because the upstream Kubernetes API Server sent an HTTP/2 GOAWAY (as can happen with a non-zero --goaway-chance), Teleport informs the client to retry by replying with a 429 status code and a Retry-After header.

This deviates from the approaches taken in #57881 and #60695 to favor simplicity and avoid buffering request data in a Teleport process. The downside to this approach is that it requires clients to handle retries properly. Since we cannot guarantee that every Kubernetes client used by customers will properly retry a request, I've opted not to close the linked issue as a result of this change. Instead, we'll wait for feedback from customers who have been experiencing this issue to see whether this truly resolves the problem for them, at which point I'll circle back and close the issue. If this doesn't remediate the problems, we can pursue more expensive solutions similar to those taken in the linked PRs.

changelog: GOAWAY errors received from Kubernetes API Servers configured with a non-zero --goaway-chance are now forwarded to clients to be retried.

Manual Testing

Testing was adapted from #57881 and #60695 on a local Kubernetes cluster and an EKS cluster.

--goaway-chance=0.0 prior to this change

  • manually smoke tested various kubectl commands
  • ran the test scripts from the linked PRs without errors

--goaway-chance=0.0 with this change

  • manually smoke tested various kubectl commands
  • ran the test scripts from the linked PRs without errors

--goaway-chance=0.2 prior to this change

  • manually smoke tested various kubectl commands
  • ran the test scripts from the linked PRs with errors

--goaway-chance=0.2 with this change

  • manually smoke tested various kubectl commands
  • ran the test scripts from the linked PRs without errors

	rw.Header().Set("Retry-After", "1")
	rw.WriteHeader(http.StatusTooManyRequests)
	return
}

@smallinsky smallinsky Nov 10, 2025

nit:

Suggested change
}
if isHTTP2GoawayError(respErr) {
	// When Kubernetes API servers are configured with --goaway-chance they may send
	// HTTP/2 GOAWAY frames to distribute load across replicas.
	// If a request cannot be automatically retried because the request body was already sent,
	// we return HTTP 429 with a Retry-After header to instruct clients to retry.
	rw.Header().Set("Retry-After", "1")
	rw.WriteHeader(http.StatusTooManyRequests)
	return
}

@rosstimothy rosstimothy added this pull request to the merge queue Nov 11, 2025
Merged via the queue into master with commit 2a1ea7b Nov 11, 2025
43 checks passed
@rosstimothy rosstimothy deleted the tross/kube_goaway branch November 11, 2025 21:06
@backport-bot-workflows

@rosstimothy See the table below for backport results.

Branch        Result
branch/v17    Create PR
branch/v18    Create PR

rosstimothy added a commit that referenced this pull request Nov 11, 2025
Follow up to #61142 which
sets the response body so that clients which only look at the reason and
not the headers will behave appropriately.
rosstimothy added a commit that referenced this pull request Nov 14, 2025
Follow up to #61142 which
sets the response body so that clients which only look at the reason and
not the headers will behave appropriately.
github-merge-queue bot pushed a commit that referenced this pull request Nov 14, 2025
Follow up to #61142 which
sets the response body so that clients which only look at the reason and
not the headers will behave appropriately.
rosstimothy added a commit that referenced this pull request Nov 17, 2025
Follow up to #61142 which
sets the response body so that clients which only look at the reason and
not the headers will behave appropriately.
github-merge-queue bot pushed a commit that referenced this pull request Nov 17, 2025
* Kubernetes: Handle GOAWAY requests

This is an attempt to address #57766.

When a request is terminated because the upstream Kubernetes API
Server GOAWAY chance is exceeded, clients are informed to retry
by replying with a 429 status code and a Retry-After header.

This deviates from the approaches taken in
#57881 and
#60695 to favor
simplicity and avoid buffering request data in a teleport process.
The downside to this approach is that it requires clients to properly
handle retry requests.

* Populate GOAWAY response body (#61264)

Follow up to #61142 which
sets the response body so that clients which only look at the reason and
not the headers will behave appropriately.