Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: http.Server.WriteTimeout does not fire if the http2 stream's window is out of space. #49741

Closed
davecheney opened this issue Nov 23, 2021 · 23 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@davecheney
Copy link
Contributor

davecheney commented Nov 23, 2021

What version of Go are you using (go version)?

$ go version 1.17.3

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

What did you do?

package main

import (
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	data := strings.Repeat("x", 1<<16)
	tick := time.NewTicker(1 * time.Millisecond)
	defer tick.Stop()
	for {
		select {
		case <-tick.C:
			n, err := io.WriteString(w, data)
			log.Printf("wrote %d, err %v", n, err)
			if err != nil {
				return
			}
		case <-r.Context().Done():
			log.Printf("context cancelled")
			return
		}
	}
}

func main() {
	sv := httptest.NewUnstartedServer(http.HandlerFunc(handler))
	sv.EnableHTTP2 = true
	sv.Config.WriteTimeout = 1 * time.Second
	sv.StartTLS()

	resp, err := sv.Client().Get(sv.URL + "/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	select {} // block forever
}

This program creates a HTTP2 service which fills the HTTP/2 stream's response window by failing to consume any data from the response body. The write deadline should fire after 1 second and terminate the response. It does not.

What did you expect to see?

If the tick rate is increased to 100ms, the write timeout fires

(~/devel/http2bug) % go run .
2021/11/23 13:03:12 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 65536, err <nil>
2021/11/23 13:03:13 wrote 4096, err http2: stream closed

What did you see instead?

If tick is left at 1 ms, permitting it to fill the stream's response window, handler blocks, no timeout is reported on either the client or the server side.

(~/devel/http2bug) % go run .
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>
2021/11/23 13:03:56 wrote 65536, err <nil>

// output stops

/cc @neild @bradfitz

@davecheney davecheney added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 24, 2021
@davecheney davecheney added this to the Go1.18 milestone Nov 24, 2021
@aojea
Copy link
Contributor

aojea commented Nov 24, 2021

If tick is left at 1 ms, permitting it to fill the stream's response window, handler blocks, no timeout is reported on either the client or the server side.

interestingly, you can increase the tick until 15ms, it always block when it writes 64 blocks. If I divide by 2 the data data := strings.Repeat("x", 1<<15) it blocks when it writes 128 blocks if I tick in less than 7 ms

It seems that something blocks when it has 64x65536 = 4MB, the bytes.Buffer?

const smallBufferSize = 64

@davecheney
Copy link
Contributor Author

davecheney commented Nov 24, 2021

It seems that something blocks when it has 64x65536 = 4MB, the bytes.Buffer?

I believe that is the stream's window, as the client is not reading from the response stream it will not be triggering window updates to refill the window

@aojea
Copy link
Contributor

aojea commented Nov 25, 2021

@davecheney you are right, I've found the problem, is the stream window as you say

I'll send a patch so you can review

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/367154 mentions this issue: http2: RST_STREAM frames write priority

@davecheney
Copy link
Contributor Author

@mknyszek I see you're landing a bunch of HTTP/2 fixes for 1.17.4. Could I ask you to consider this one for inclusion. Thank you.

@davecheney davecheney removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 1, 2021
@mknyszek
Copy link
Contributor

mknyszek commented Dec 1, 2021

@davecheney It might be a little late, we just met today to go over the backport issues to cherry-pick, and generally we only include those whose changes have already landed at tip. I think this should probably wait until the next minor release (that is in a month), but I'll see if @neild has any opinions on this. If it's severe enough it may warrant trying to get it in this one.

@neild
Copy link
Contributor

neild commented Dec 1, 2021

Nice catch for a subtle bug. CL looks good to me, but as this fix is just landing and the bug is long-standing I think it's too late for 1.17.4.

@davecheney
Copy link
Contributor Author

Thanks for considering it. I can stick with x/net/http2 til the next minor ships

@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 2, 2021
@mknyszek mknyszek modified the milestones: Go1.18, Backlog Dec 2, 2021
@mknyszek
Copy link
Contributor

mknyszek commented Dec 2, 2021

(Moving to backlog as part of triage, just because this isn't a new issue in 1.18.)

@mknyszek
Copy link
Contributor

mknyszek commented Dec 2, 2021

(Er, actually it seems like y'all plan to fix this relatively soon, I'm gonna guess in 1.19? Moving to 1.19 milestone.)

@mknyszek mknyszek modified the milestones: Backlog, Go1.19 Dec 2, 2021
@davecheney
Copy link
Contributor Author

davecheney commented Dec 2, 2021

@mknyszek i'd prefer to not have to wait til 1.19 for this to be backported to net/http. Is it possible this could be included in the next x/net/http2 backport into 1.17.6, or at least 1.18.1?

I agree this this is an existing issue that has been broken for a long time, but equally, its been broken for a long time and has quite serious impacts for anyone attempting to expose a Go http/2 service on the open internet. We have pretty good control of the client side on copilot, and still see requests with a 10 second WriteTimeout hang for the better part of an hour because of this issue.

@mknyszek
Copy link
Contributor

mknyszek commented Dec 2, 2021

@davecheney The backport for 1.17 will go in sooner, and we'll file a separate issue for that. It'll definitely go into 1.18.1 when the milestone is opened, I just mean that the fix on tip will land in the 1.19 cycle (so when the tree opens in February).

@gopherbot Please open a backport issue for Go 1.17.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #49921 (for 1.17), #49922 (for 1.18).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@toothrot
Copy link
Contributor

@neild @mknyszek @davecheney Because this issue was fixed before go1.18rc1, and we want it in the 1.18.1 release, we should actually just update the vendored x/net in the go repo for 1.18, and ensure this is part of the 1.18 release.

In that case, the steps would be:

@cagedmantis
Copy link
Contributor

@gopherbot Please open a backport issue for Go 1.16.

@cagedmantis
Copy link
Contributor

Backport issue(s) opened: #50449 (for 1.16)

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/375718 mentions this issue: [internal-branch.go1.16-vendor] http2: prioritize RST_STREAM frames in random write scheduler

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/375719 mentions this issue: [internal-branch.go1.17-vendor] http2: prioritize RST_STREAM frames in random write scheduler

@dmitshur dmitshur modified the milestones: Go1.19, Go1.18 Jan 5, 2022
gopherbot pushed a commit to golang/net that referenced this issue Jan 6, 2022
…n random write scheduler

The http2 random write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

For golang/go#49741
Updates golang/go#50449

Change-Id: I0940a76d1aad95f1c4d3856e4d79cf5ce2a78ff2
Reviewed-on: https://go-review.googlesource.com/c/net/+/367154
Trust: Dave Cheney <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
(cherry picked from commit 04296fa)
Reviewed-on: https://go-review.googlesource.com/c/net/+/375718
Run-TryBot: Carlos Amedee <[email protected]>
Trust: Dmitri Shuralyov <[email protected]>
gopherbot pushed a commit to golang/net that referenced this issue Jan 6, 2022
…n random write scheduler

The http2 random write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

For golang/go#49741
Updates golang/go#49921

Change-Id: I0940a76d1aad95f1c4d3856e4d79cf5ce2a78ff2
Reviewed-on: https://go-review.googlesource.com/c/net/+/367154
Trust: Dave Cheney <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
(cherry picked from commit 04296fa)
Reviewed-on: https://go-review.googlesource.com/c/net/+/375719
Run-TryBot: Carlos Amedee <[email protected]>
Trust: Dmitri Shuralyov <[email protected]>
@FrankReh
Copy link

FrankReh commented Jan 29, 2022

The original test case given above still does not timeout after a second when building with go1.17.6, even though the commit that fixed golang.org/x/net was cherry picked.

I've seen the failure with both Darwin and Linux builds.

I can get the test case to timeout correctly if I use the latest golang.org/x/net package and modify main with the ConfigureServer call:

package main

import (
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"

	"golang.org/x/net/http2"
)

func handler(w http.ResponseWriter, r *http.Request) {
	data := strings.Repeat("x", 1<<16)
	tick := time.NewTicker(1 * time.Millisecond)
	defer tick.Stop()
	for {
		select {
		case <-tick.C:
			n, err := io.WriteString(w, data)
			log.Printf("wrote %d, err %v", n, err)
			if err != nil {
				return
			}
		case <-r.Context().Done():
			log.Printf("context cancelled")
			return
		}
	}
}

func main() {
	sv := httptest.NewUnstartedServer(http.HandlerFunc(handler))
	err := http2.ConfigureServer(sv.Config, &http2.Server{})
	if err != nil {
		log.Fatal(err)
	}
	sv.EnableHTTP2 = true
	sv.Config.WriteTimeout = 1 * time.Second
	sv.StartTLS()

	resp, err := sv.Client().Get(sv.URL + "/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	select {} // block forever
}

The go.mod:

module main

go 1.17

require golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd

require golang.org/x/text v0.3.7 // indirect

But what's interesting is I can also get the hung write to timeout correctly without the golang.org/x/net package, but I had
to modify the go net package source (and rebuild go probably didn't have to rebuild go, the std packages aren't built into the binary) to export the http2ConfigureServer function. By calling that in place of the http2:ConfigureServer call above, that also causes the blocked write to timeout correctly.

@ianlancetaylor
Copy link
Member

Reopening. CC @neild

@aojea
Copy link
Contributor

aojea commented Jan 30, 2022

I've found the problem, there is an inconsistency on the default writeScheduler chosen:

The fix for the WriteTimeout issue was in the RandomWriteScheduler, that is chosen if you use configureTransport directly

go/src/net/http/h2_bundle.go

Lines 4114 to 4118 in a5c0b19

if s.NewWriteScheduler != nil {
sc.writeSched = s.NewWriteScheduler()
} else {
sc.writeSched = http2NewRandomWriteScheduler()
}

However, when http2 is configured automatically, it uses a different WriteScheduler, that may have the same issue than the other WriteScheduler

go/src/net/http/server.go

Lines 3296 to 3311 in a5c0b19

// onceSetNextProtoDefaults configures HTTP/2, if the user hasn't
// configured otherwise. (by setting srv.TLSNextProto non-nil)
// It must only be called via srv.nextProtoOnce (use srv.setupHTTP2_*).
func (srv *Server) onceSetNextProtoDefaults() {
if omitBundledHTTP2 || godebug.Get("http2server") == "0" {
return
}
// Enable HTTP/2 by default if the user hasn't otherwise
// configured their TLSNextProto map.
if srv.TLSNextProto == nil {
conf := &http2Server{
NewWriteScheduler: func() http2WriteScheduler { return http2NewPriorityWriteScheduler(nil) },
}
srv.nextProtoErr = http2ConfigureServer(srv, conf)
}
}

So, I think that there are 2 things to fix here:

  • a bug in NewPriorityWriteScheduler
  • consistency defaulting the WriteScheduler

@neild @ianlancetaylor I can take on this if you want.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/382014 mentions this issue: http2: prioritize RST_STREAM frames in priority write scheduler

@aojea
Copy link
Contributor

aojea commented Jan 30, 2022

Submitted a patch for fixing the scheduler http2NewPriorityWriteScheduler bug.

What should be the default scheduler http2NewPriorityWriteScheduler() or http2NewRandomWriteScheduler()

EDIT

It seems it has to be the PriorityWriteScheduler based on #18318

@heschi heschi modified the milestones: Go1.18, Go1.19 Mar 15, 2022
dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022
The http2 random write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

Fixes golang/go#49741

Change-Id: I0940a76d1aad95f1c4d3856e4d79cf5ce2a78ff2
Reviewed-on: https://go-review.googlesource.com/c/net/+/367154
Trust: Dave Cheney <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022
The http2 priority write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

Fixes golang/go#49741

Signed-off-by: Antonio Ojea <[email protected]>
Change-Id: Ie4462603380039f7aba8f701fe810d1d84376efa
Reviewed-on: https://go-review.googlesource.com/c/net/+/382014
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Run-TryBot: Dave Cheney <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
fedosgad pushed a commit to fedosgad/oohttp that referenced this issue Jun 22, 2022
…n random write scheduler

The http2 random write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

For golang/go#49741
Updates golang/go#49921

Change-Id: I0940a76d1aad95f1c4d3856e4d79cf5ce2a78ff2
Reviewed-on: https://go-review.googlesource.com/c/net/+/367154
Trust: Dave Cheney <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
(cherry picked from commit 04296fa82e83b85317bd93ad50dd00460d6d7940)
Reviewed-on: https://go-review.googlesource.com/c/net/+/375719
Run-TryBot: Carlos Amedee <[email protected]>
Trust: Dmitri Shuralyov <[email protected]>
danbudris pushed a commit to danbudris/net that referenced this issue Sep 16, 2022
…n random write scheduler

The http2 random write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

For golang/go#49741
Updates golang/go#50449

Change-Id: I0940a76d1aad95f1c4d3856e4d79cf5ce2a78ff2
Reviewed-on: https://go-review.googlesource.com/c/net/+/367154
Trust: Dave Cheney <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
(cherry picked from commit 04296fa)
Reviewed-on: https://go-review.googlesource.com/c/net/+/375718
Run-TryBot: Carlos Amedee <[email protected]>
Trust: Dmitri Shuralyov <[email protected]>
danbudris pushed a commit to danbudris/net that referenced this issue Sep 16, 2022
…n random write scheduler

The http2 random write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

For golang/go#49741
Updates golang/go#50449

Change-Id: I0940a76d1aad95f1c4d3856e4d79cf5ce2a78ff2
Reviewed-on: https://go-review.googlesource.com/c/net/+/367154
Trust: Dave Cheney <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
(cherry picked from commit 04296fa)
Reviewed-on: https://go-review.googlesource.com/c/net/+/375718
Run-TryBot: Carlos Amedee <[email protected]>
Trust: Dmitri Shuralyov <[email protected]>
WeiminShang added a commit to WeiminShang/net that referenced this issue Nov 16, 2022
The http2 priority write scheduler should not queue RST_STREAM
frames with the DATA frames, and instead treat them as control frames.

There can be deadlock situations if data frames block the queue,
because if the sender wants to close the stream it sends an RST frame,
but if the client is not draining the queue, the RST frame is stuck
and the sender is not able to finish.

Fixes golang/go#49741

Signed-off-by: Antonio Ojea <[email protected]>
Change-Id: Ie4462603380039f7aba8f701fe810d1d84376efa
Reviewed-on: https://go-review.googlesource.com/c/net/+/382014
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Run-TryBot: Dave Cheney <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
@golang golang locked and limited conversation to collaborators May 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests