kube: handle net/http rewind body GOAWAY error#66605
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 079279cb2d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| isHTTP2RetryErr := strings.Contains(errString, `http2: Transport: cannot retry err`) && | ||
| strings.HasSuffix(errString, `after Request.Body was written; define Request.GetBody to avoid this error`) | ||
| isHTTP1RewindErr := strings.Contains(errString, `net/http: cannot rewind body after connection loss`) | ||
| if isHTTP2RetryErr || isHTTP1RewindErr { |
There was a problem hiding this comment.
We can fix this problem at least for create operations because we read the object payload into memory to ensure the user can actually see the created object.
teleport/lib/kube/proxy/url.go
Lines 261 to 303 in 5962850
For these, we can handle the GetBody.
For patch/update, we can do the same but people didn't like
There was a problem hiding this comment.
I've added handling for create requests in 1bf7429
Summary
Follow-up to #61142, which translated GOAWAY-related errors from the upstream Kubernetes API server into
429 Too Many Requestsresponses so kube clients retry. That fix matched the error string emitted bygolang.org/x/net/http2. After it shipped, #65611 reported that v17.7.21 still leaks a different error string from the same root cause:This PR extends the detection in
formatStatusResponseErrorto also match that string and produce the same 429 response.Why two strings, same cause
When an HTTP/2 GOAWAY interrupts an in-flight request whose body has already been read, the Go transport may attempt to retry on a new connection. Without
Request.GetBody, the retry can't replay the body. Two layers can hit this dead end:golang.org/x/net/http2returnshttp2: Transport: cannot retry err [...] after Request.Body was written; define Request.GetBody to avoid this error— already handled.net/httpreturnsnet/http: cannot rewind body after connection lossfromerrCannotRewindwhen the http1 retry path fires after the http2 conn pool is drained. This happens in concurrent-request scenarios (e.g. helmwave) and is what this PR adds.The kube agent intentionally does not buffer the request body, so
GetBodyis unset; previous PRs (#57881, #60695) explored buffering and rejected it. Detection by literal-string match continues to be the only available mechanism — both stdlib sentinels are unexported.Tests
Three tests now cover GOAWAY handling, complementary in scope:
TestGOAWAYHandling(existing): real*http2.Transportagainst the fakegoawayServer. Single request, error A end-to-end.TestKubeForwarder_GOAWAYErrors(new): stubhttp.RoundTripperplugged intostaticKubeCreds.transport, deterministic table over both error strings, exercises the full reverse-proxy error pipeline (formatForwardResponseError->formatStatusResponseError-> response writer).TestGOAWAYHandling_Concurrent(new): productionnewH2Transport(net/http.Transportupgraded withhttp2.ConfigureTransport) against the fakegoawayServer. Each invocation fires 50 concurrent requests with a 64KB body to provoke the GOAWAY race. The assertion is narrow: the rewind-body string must never reach a client — other unrelated 500s (broken pipe, force-closed conn, etc.) are tolerated. Without this PR's change the test fails reliably; with it, the test stays green across 100 sequential runs under-count=100 -race -shuffle on(matches the flaky-test detector's CI configuration).Test plan
go test ./lib/kube/proxy/ -run 'TestKubeForwarder_GOAWAYErrors|TestGOAWAYHandling$|TestGOAWAYHandling_Concurrent' -count=100 -race -shuffle ongo test ./lib/kube/proxy/...(full package)Notes on scope
Concurrent reproduction surfaced other GOAWAY-adjacent transport errors (
broken pipe,connection reset by peer,client connection force closed,ReverseProxy does an invalid Read on closed Body) that this PR does not translate — they're separate bugs the existing fix doesn't address either. TheTestGOAWAYHandling_Concurrentassertion is intentionally narrow: only the rewind-body string is required to stay off the wire.Closes #65611