Copy fails with "use of closed network connection" error when using a slow proxy #2105
Thanks for your report.
The implementation always reads at least a few bytes from the source before initiating the upload to the destination, so it’s not that the source somehow spends a lot of time starting the download. The download is either delayed somewhere in between, or possibly at the very end at EOF. In both cases I think there’s little the destination can do about a slow source, and the design of the copy implementation (with a possible compression/decompression transformation) makes it very hard, in general, to somehow restart the upload on failure.

It would probably be valuable to gather some more data about the timing / cause: Is it a delay when starting the copy? When finishing? Is it just slow? Is the cause the source or the destination?

In this area, Skopeo is just a dumb wrapper around the c/image implementation, so I’m transferring the report there.

I also rather suspect that this is an error handling bug in the standard Go library. There are ~two network connections involved: the destination one is purely internal to the standard-library HTTP stack, and not managed by the c/image code; the source one is more of a responsibility of the c/image code, but errors from that one would include an

WRT the workaround, I would recommend using
To clarify, I suspect a mostly-cosmetic issue: there is probably some underlying cause (“connection closed by the remote server”, “timed out”) which isn’t reported but should be. The underlying cause wouldn’t go away just because it was correctly reported, though.
Randomly guessing a reproducer, yes, there is a standard library error handling problem:

```go
func TestUseOfClosedNetworkConnection(t *testing.T) {
	s := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf1 [5]byte
		n, err := r.Body.Read(buf1[:])
		require.NoError(t, err, "reading body")
		logrus.Errorf("Got %d bytes: %s", n, string(buf1[:n]))
		rc := http.NewResponseController(w)
		c, buf, err := rc.Hijack()
		require.NoError(t, err, "Hijack")
		bufLen := buf.Reader.Buffered()
		buf2 := make([]byte, bufLen)
		n, err = buf.Read(buf2)
		require.NoError(t, err)
		logrus.Errorf("Buffered %d bytes: %s", n, buf2[:n])
		logrus.Errorf("Closing")
		err = c.Close()
		require.NoError(t, err)
	}))
	defer s.Close()
	sourceReader, sourceWriter := io.Pipe()
	go func() {
		_, err := sourceWriter.Write([]byte("line 1\n"))
		require.NoError(t, err, "Writing line 1")
		time.Sleep(1 * time.Second)
		_, err = sourceWriter.Write([]byte("line 2\n"))
		require.NoError(t, err, "Writing line 2")
		err = sourceWriter.Close()
		require.NoError(t, err, "Closing writer")
	}()
	defer sourceReader.CloseWithError(errors.New("Test exited"))
	client := &http.Client{
		Transport: tlsclientconfig.NewTransport(),
	}
	req, err := http.NewRequestWithContext(context.Background(), http.MethodPatch, s.URL, sourceReader)
	require.NoError(t, err)
	res, err := client.Do(req)
	require.NoError(t, err)
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	require.NoError(t, err)
	logrus.Errorf("Response status %d, body \n%s\n", res.StatusCode, string(body))
}
```

```
% go test -v -v -timeout 600s -run ^TestUseOfClosedNetworkConnection$ ./docker
=== RUN   TestUseOfClosedNetworkConnection
time="2023-09-04T22:46:57+02:00" level=error msg="Got 5 bytes: line "
time="2023-09-04T22:46:57+02:00" level=error msg="Buffered 4 bytes: 1\n\r\n"
time="2023-09-04T22:46:57+02:00" level=error msg=Closing
    docker_image_dest_test.go:84:
        Error Trace:    …/docker/docker_image_dest_test.go:84
        Error:          Received unexpected error:
                        Patch "http://127.0.0.1:64956": write tcp 127.0.0.1:64957->127.0.0.1:64956: use of closed network connection
        Test:           TestUseOfClosedNetworkConnection
--- FAIL: TestUseOfClosedNetworkConnection (1.00s)
FAIL
FAIL    github.com/containers/image/v5/docker  1.457s
FAIL
```

though the error text is not exactly the same.
The basic cause is that the Go HTTP/1.1 implementation has separate “read” and “write” loops. The read loop waits for responses, and it is the one that terminates with “net/http: Transport failed to read from server: EOF” (compare golang/go#53472). And it’s the read end that is responsible for actually closing the socket. The read end does not know or care whether the write end is still sending the request; a “full duplex” operation, where an (error) response can be sent before the full request is sent, is apparently an intentionally desired feature. Meanwhile, the write loop is busy sending request data, and that code is unaware of the read end having closed the socket, hence the error text.
I have filed golang/go#62447 against the upstream Go standard library’s error reporting behavior. Of course, that’s not directly helpful for making the copy work. I’m not sure that is reasonably possible; understanding the timing/cause, as mentioned above, would be a prerequisite.
Related Quay issue: https://issues.redhat.com/browse/RFE-4643
Hi @mtrmac
The download is delayed by ZScaler at the end. Until this happens, it’s fast. This is how the skopeo copy output for the kibana:6.7.1 image download looks for about 2m40s:
Thanks for the hint!
Hum. Does the source actually deliver all data and block before reporting EOF / success, or is there at least one byte outstanding that we have to block on? If all data is available, we could, in many but not all cases, finish processing the data without waiting for a successful EOF indication (probably defeating the point of the scan: we would consume the data even if the scan later detected an error).

I suppose longer-term we should add an option for downloading to a file locally, and pushing from that file. This is not the only case where significantly different transport speeds between sources and destinations cause trouble, although it’s probably the most notable one so far. That would not be too much code; I’m mostly concerned that we need to understand the root cause well enough to rule out some other bug in our code that should be fixed instead.
We have an internal Quay registry where we mirror external images. We need to use a proxy (ZScaler). Sometimes some blobs take quite a while to finish downloading, up to 3 minutes. I suspect they run some scans on the proxy before the download can be finished. It seems skopeo opens a connection to the target registry, where the TCP connection closes due to idleness before the download from the source layer finishes (just wild guessing).
Example of a failed copy:
Workaround
Copy to local storage first, then to the internal registry.
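For example, the two-step copy can use skopeo’s dir: transport (image and registry names below are placeholders, not from this report):

```shell
# Stage the image to a local directory first; the slow proxied
# download finishes completely before any push begins.
skopeo copy docker://docker.elastic.co/kibana/kibana:6.7.1 dir:/tmp/kibana

# Then push from local storage to the internal registry at full speed.
skopeo copy dir:/tmp/kibana docker://registry.example.com/mirror/kibana:6.7.1
```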
Versions
I used the following two versions:
Other things
I see the same error when using the Red Hat Quay Repo Mirror feature (Support-Case opened). Looking at the logs, it seems that skopeo is used in the background.