-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: TestRequestLimit/h2 becomes significantly more expensive and slower after x/[email protected] #66668
Comments
The root cause of this is an unfortunate side-effect of the fix for #65051. It's arguably a bug in that fix. When the HTTP/2 server rejects a connection due to a bad request, it sends a GOAWAY frame and closes the connection. The GOAWAY includes the ID of the most recent request processed by the server. The problem is that the GOAWAY sent by the server in this case includes the ID of the previous request, not the one which caused us to close the connection. When the HTTP/2 client gets a GOAWAY with an error for a connection where it has sent only one request, it should assume that there's something wrong with that request and not retry it, even if the GOAWAY doesn't indicate that this request was seen by the server. It does not. This is #60636. The interaction of these two behaviors means that in One half of the fix will be to make the GOAWAY sent when rejecting a request correctly indicate that the server saw that request. The other will be to finally fix #60636, if I can figure out a good way to do so. (It's tricky, because the retry behavior is spread between |
Change https://go.dev/cl/576756 mentions this issue: |
Thank you for the RCA and the potential fix @neil, I really appreciate the detailed write-up! |
Hey folks, thank you for tracking this. I'm trying to understand the process better here, and apologies if I've missed reading about this somewhere, would the bug fix in https://go.dev/cl/576756 warrant another patch release for it? Or would this then be included whenever the next one happens to be planned in the future? Thanks! |
When closing a connection because a stream contained a request we didn't like (for example, because the request headers exceed the maximum we will accept), set the LastStreamID in the GOAWAY frame to include the offending stream. This informs the client that retrying the request is unlikely to succeed, and avoids retry loops. This change requires passing the stream ID of the offending stream from Framer.ReadFrame up to the caller. The most sensible way to do this would probably be in the error. However, ReadFrame currently returns a defined error type for connection-ending errors (ConnectionError), and that type is a uint32 with no place to put the stream ID. Rather than changing the returned errors, ReadFrame now returns an error along with a non-nil Frame containing the stream ID, when a stream is responsible for a connection-ending error. For golang/go#66668 Change-Id: Iba07ccbd70ab4939aa56903605474d01703ac6e4 Reviewed-on: https://go-review.googlesource.com/c/net/+/576756 Reviewed-by: Jonathan Amsterdam <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Auto-Submit: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Change https://go.dev/cl/576895 mentions this issue: |
@gopherbot please open backport issues. This is a bug/deficiency in a previous cherrypick, so we should backport the fix as well. |
Backport issue(s) opened: #66697 (for 1.21), #66698 (for 1.22). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
@MadhavJivrajani I don't believe this warrants an off-schedule minor release, but I do think we should backport https://go.dev/cl/576756 into the next minor releases. |
Change https://go.dev/cl/576679 mentions this issue: |
When a server sends a GOAWAY frame, it indicates the ID of the last stream it processed. We use this to mark any requests after that stream as being safe to retry on a new connection. Change this to not retry the first request on a connection if we get a GOAWAY with an error, even if the GOAWAY has a stream ID of 0 indicating that it didn't process that request. If we're getting an error as the first result on a new connection, then there's either something wrong with the server or something wrong with our request; either way, retrying isn't likely to be productive and may be unsafe. This matches the behavior of the HTTP/1 client, which also avoids retrying the first request on a new connection. For golang/go#66668 Fixes golang/go#60636 Change-Id: I90ea7cfce2974dd413f7cd8b78541678850376a5 Reviewed-on: https://go-review.googlesource.com/c/net/+/576895 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Jonathan Amsterdam <[email protected]>
This comment was marked as resolved.
This comment was marked as resolved.
Pull in CL 576895: ec05fdcd http2: don't retry the first request on a connection on GOAWAY error For #66668. Fixes #60636. Change-Id: I9903607e3d432a5db0325da82eb7f4b378fbddde Reviewed-on: https://go-review.googlesource.com/c/go/+/576976 Auto-Submit: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Change https://go.dev/cl/578336 mentions this issue: |
Change https://go.dev/cl/578338 mentions this issue: |
…tream-caused GOAWAY When closing a connection because a stream contained a request we didn't like (for example, because the request headers exceed the maximum we will accept), set the LastStreamID in the GOAWAY frame to include the offending stream. This informs the client that retrying the request is unlikely to succeed, and avoids retry loops. This change requires passing the stream ID of the offending stream from Framer.ReadFrame up to the caller. The most sensible way to do this would probably be in the error. However, ReadFrame currently returns a defined error type for connection-ending errors (ConnectionError), and that type is a uint32 with no place to put the stream ID. Rather than changing the returned errors, ReadFrame now returns an error along with a non-nil Frame containing the stream ID, when a stream is responsible for a connection-ending error. Merge conflicts were avoided by cherry-picking CL 576235 (test deflake) prior to this, and then by squashing CL 576175 (typo fix) into this CL. For golang/go#66668. For golang/go#66697. Change-Id: Iba07ccbd70ab4939aa56903605474d01703ac6e4 Reviewed-on: https://go-review.googlesource.com/c/net/+/576756 Reviewed-by: Jonathan Amsterdam <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Auto-Submit: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-on: https://go-review.googlesource.com/c/net/+/578336 Reviewed-by: Damien Neil <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Auto-Submit: Dmitri Shuralyov <[email protected]>
…tream-caused GOAWAY When closing a connection because a stream contained a request we didn't like (for example, because the request headers exceed the maximum we will accept), set the LastStreamID in the GOAWAY frame to include the offending stream. This informs the client that retrying the request is unlikely to succeed, and avoids retry loops. This change requires passing the stream ID of the offending stream from Framer.ReadFrame up to the caller. The most sensible way to do this would probably be in the error. However, ReadFrame currently returns a defined error type for connection-ending errors (ConnectionError), and that type is a uint32 with no place to put the stream ID. Rather than changing the returned errors, ReadFrame now returns an error along with a non-nil Frame containing the stream ID, when a stream is responsible for a connection-ending error. Merge conflicts were avoided by cherry-picking CL 576235 (test deflake) prior to this, and then by squashing CL 576175 (typo fix) into this CL. For golang/go#66668. For golang/go#66698. Change-Id: Iba07ccbd70ab4939aa56903605474d01703ac6e4 Reviewed-on: https://go-review.googlesource.com/c/net/+/576756 Reviewed-by: Jonathan Amsterdam <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Auto-Submit: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-on: https://go-review.googlesource.com/c/net/+/578338 Reviewed-by: Than McIntosh <[email protected]> Auto-Submit: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]>
Change https://go.dev/cl/578357 mentions this issue: |
Change https://go.dev/cl/578358 mentions this issue: |
Pull in CL 578338: db050b07 http2: send correct LastStreamID in stream-caused GOAWAY For #66668. Fixes #66698. Change-Id: Ie7cbc44cd559eb8bc34f6c4ad4ead678ec2f55ef Reviewed-on: https://go-review.googlesource.com/c/go/+/578358 Reviewed-by: Carlos Amedee <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Pull in CL 578336: ef58d90f http2: send correct LastStreamID in stream-caused GOAWAY For #66668. Fixes #66697. Change-Id: I91fc8a67f21fadcb1801ff29d5e2b0453db89617 Reviewed-on: https://go-review.googlesource.com/c/go/+/578357 Reviewed-by: Carlos Amedee <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Found new dashboard test flakes for:
2024-04-03 15:10 go1.21-linux-386 release-branch.go1.21@ae591334 net/http.TestRequestLimit/h2 [ABORT] (log)
2024-04-03 15:10 go1.21-linux-amd64-boringcrypto release-branch.go1.21@ae591334 net/http.TestRequestLimit/h2 [ABORT] (log)
2024-04-03 15:10 go1.21-linux-amd64-clang15 release-branch.go1.21@ae591334 net/http.TestRequestLimit/h1 [ABORT] (log)
2024-04-03 15:10 go1.21-linux-amd64-noopt release-branch.go1.21@ae591334 net/http.TestRequestLimit/h2 [ABORT] (log)
2024-04-03 15:10 go1.22-linux-386-softfloat release-branch.go1.22@e55d7cf8 net/http.TestRequestLimit/h2 [ABORT] (log)
2024-04-03 15:35 go1.21-linux-amd64-boringcrypto release-branch.go1.21@d8392e69 net/http.TestRequestLimit/h1 [ABORT] (log)
2024-04-03 15:35 go1.21-linux-amd64-goamd64v3 release-branch.go1.21@d8392e69 net/http.TestRequestLimit/h1 [ABORT] (log)
2024-04-03 15:35 go1.22-linux-386-softfloat release-branch.go1.22@dddf0ae4 net/http.TestRequestLimit/h2 [ABORT] (log)
2024-04-03 18:00 go1.22-linux-386-softfloat release-branch.go1.22@a65a2bbd net/http.TestRequestLimit/h2 [ABORT] (log)
|
I've initially observed this on various builders, most often linux-386 ones, but it seems to reproduce on
darwin/arm64
too.At tip without x/[email protected] (e.g., commit 61a3ee5),
TestRequestLimit/h2
consistently passes in under a second:When x/[email protected] is pulled in (e.g., CL 576295), it takes anywhere between 15 seconds and 200+ seconds:
I'm seeing around 125% CPU usage while this one test runs. This seems to cause it to exceed the 180 second timeout on some builders, especially when running all other tests and not just this one.
CC @neild.
The text was updated successfully, but these errors were encountered: