Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPClient.remoteConnectionClosed errors #488

Closed
MahdiBM opened this issue Nov 21, 2021 · 33 comments
Closed

HTTPClient.remoteConnectionClosed errors #488

MahdiBM opened this issue Nov 21, 2021 · 33 comments

Comments

@MahdiBM
Copy link
Contributor

MahdiBM commented Nov 21, 2021

Current Behavior

After a few hours, e.g. 2.5 hours, my app starts to throw HTTPClient.remoteConnectionClosed errors.

Expected Behavior

No HTTPClient errors should be thrown.

What am i doing?

In Vapor, I have a piece of code like so:

let clientRequest = ClientRequest(url: uri, headers: headers)
return app.client.send(clientRequest)

This method is called 2.5/s, and basically crawls another API.
Though, when the errors start happening, the throwing happens even outside this code, and with every eligible API call.

More Info

  • I've tested with v1.7.0 and latest commit and 1.6.4.
    Everything works fine on 1.6.4, but using the other two results in the error.
  • Happens with heavier response bodies (~100kb +). I haven't seen it happen with a req with res body of like 10/20 kb, probably due to it having less chance to happen.
  • Happens only after a few hours of my app's start. Something in range of 2-4, usually like 2.5.
  • I haven't tested without the code that is calling another api 2.5 times per second. So not sure if the errors start to happen due to some kind of resource being exhausted, or other reasons.
  • Haven't tested with setting httpVersion from .automatic to .http1Only. Can try this if you need more info.

I can attempt to make a repro project if it seems necessary.

@fabianfett
Copy link
Member

Hi @MahdiBM, thanks for posting this issue. A couple of follow up questions:

  1. Does your remote API support http/2? I assume so, but I would like to be sure.

  2. Do you connect to to the api via a proxy?

  3. Did I understand you right, this error starts to appear after roughly ~22k requests?
    (= 3600 seconds per hour * 2.5 hours * 2.5 request per second)

  4. Did I understand you right, after this error has appeared once all subsequent requests fail with the same error? The http client does not recover from this error?

  5. Do you have access to the services logs? Would you mind changing this:

    let clientRequest = ClientRequest(url: uri, headers: headers)
    return app.client.send(clientRequest)

    to this:

    let clientRequest = HTTPClient.Request()
    var clientLogger = req.logger
    clientLogger.logLevel = .debug // or even: .trace (however this will produce a ton of logs)
    app.http.client.shared.execute(request: request, logger: clientLogger)

    This way, you use the pure AsyncHTTPClient apis and you can set the log level for the request.

  6. Repro:

    I can attempt to make a repro project if it seems necessary.

    This would be immensely helpful.

Notes:

  • HTTPClient.remoteConnectionClosed is thrown by the HTTPRequestStateMachine, if we get a channelInactive, while in a running request. In an h2 scenario, we might get a channelInactive if a stream gets closed, but the connection does not. (RST_STREAM?!)

@fabianfett
Copy link
Member

Okay, digging into this a little deeper. AWS Application Load Balancer seems to send a GOAWAY after 10k streams:

▿ FramePayload
  ▿ goAway : 3 elements
    ▿ lastStreamID : HTTP2StreamID(19999)
      - networkStreamID : 19999
    ▿ errorCode : HTTP2ErrorCode<0x0 No Error>
      - _networkCode : 0
    - opaqueData : nil

I wonder if there is a way to detect this behavior before sending out further streams. Also if we receive a GOAWAY we will fail the request either with a:

  • StreamClosed(streamID: HTTP2StreamID(20007),
  • HTTPClient.remoteConnectionClosed or
  • HTTPClientError.cancelled

error. Maybe we should do a better job in differentiating those.

@fabianfett
Copy link
Member

fabianfett commented Nov 21, 2021

Okay, this is seemingly defined behavior:

The load balancer sends a response code of 000

With HTTP/2 connections, if the compressed length of any of the headers exceeds 8K bytes or if the number of requests served through one connection exceeds 10,000, the load balancer sends a GOAWAY frame and closes the connection with a TCP FIN.

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Nov 21, 2021

  1. curl --http2 https://api.clashroyale.com/v1/locations/global/rankings/players -H "Authorization: Bearer [TOKEN]" -vso /dev/null results in
...
Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200 
< date: Sun, 21 Nov 2021 15:44:17 GMT
< content-type: application/json; charset=utf-8
...

so it supports http/2.

  1. No proxy. Direct access.
  2. Yes basically. Last time i checked the problem started happening almost exactly 2.5 after app launch, and it does roughly reqs 2.47/s which is basically 22-22.5K requests. Though not sure how consistent that number is.
  3. No, sorry for the confusion. Only a part of the requests fail. Haven't checked exactly, but should be something like 20-50% of the requests.
  4. and 6. : I'll try to put a repro together and if unsuccessful i'll change my client code in my app to see what are the logs.

@fabianfett
Copy link
Member

fabianfett commented Nov 21, 2021

Did I understand you right, after this error has appeared once all subsequent requests fail with the same error? The http
client does not recover from this error?

No, sorry for the confusion. Only a part of the requests fail. Haven't checked exactly, but should be something like 20-50% of the requests.

20-50% or 20 to 50 requests in total? How long does such a request normally take?

When I curl this

curl --http2 https://api.clashroyale.com/v1/locations/global/rankings/players -H "Authorization: Bearer [TOKEN]" -vso /dev/null

I can see that the TLS cert is Amazon issued, which is a hint that we run against an AWS owned Load Balancer/API Gateway. The 10k stream limit described above might be the issue here.

@fabianfett
Copy link
Member

Okay, running tests against the API (https://api.clashroyale.com) you provided, AHC fails over after exactly 10k requests.

@fabianfett
Copy link
Member

@MahdiBM Do you sometimes see an error like this: error=StreamClosed(streamID: HTTP2StreamID(20001)?

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Nov 21, 2021

Did I understand you right, after this error has appeared once all subsequent requests fail with the same error? The http
client does not recover from this error?

No, sorry for the confusion. Only a part of the requests fail. Haven't checked exactly, but should be something like 20-50% of the requests.

20-50% or 20 to 50 requests in total? How long does such a request normally take?

When I curl this

curl --http2 https://api.clashroyale.com/v1/locations/global/rankings/players -H "Authorization: Bearer [TOKEN]" -vso /dev/null

I can see that the TLS cert is Amazon issued, which is a hint that we run against an AWS owned Load Balancer/API Gateway. The 10k stream limit described above might be the issue here.

20% to 50%. It takes a normal amount of time ... less than 5s, usually 2s or less.
And yes the API uses Amazon services so that could be the issue.

Okay, running tests against the API (https://api.clashroyale.com) you provided, AHC fails over after exactly 10k requests.

Nice that didn't take long to reproduce :)
It is a free and fairly simple API and you can get your own token if you want and haven't yet.
https://developer.clashroyale.com

@MahdiBM Do you sometimes see an error like this: error=StreamClosed(streamID: HTTP2StreamID(20001)?

Correct. Rarely. But yes.

@fabianfett
Copy link
Member

Interestingly the Go client seems to have the same issue:
golang/go#18639

@fabianfett
Copy link
Member

More context in the AWS dev forums:
https://forums.aws.amazon.com/message.jspa?messageID=967466

@fabianfett
Copy link
Member

Interesting quotes from the RFC:

Activity on streams numbered lower or equal to the last stream
identifier might still complete successfully. The sender of a GOAWAY
frame might gracefully shut down a connection by sending a GOAWAY
frame, maintaining the connection in an "open" state until all in-
progress streams complete.

A server that is attempting to gracefully shut down a
connection SHOULD send an initial GOAWAY frame with the last stream
identifier set to 2^31-1 and a NO_ERROR code. This signals to the
client that a shutdown is imminent and that initiating further
requests is prohibited. After allowing time for any in-flight stream
creation (at least one round-trip time), the server can send another
GOAWAY frame with an updated last stream identifier. This ensures
that a connection can be cleanly shut down without losing requests.

This is NOT what the AWS ALB does.

On streams with lower- or equal-numbered identifiers that were not
closed completely prior to the connection being closed, reattempting
requests, transactions, or any protocol activity is not possible,
with the exception of idempotent actions like HTTP GET, PUT, or
DELETE. Any protocol activity that uses higher-numbered streams can
be safely retried using a new connection.

We could auto retry requests.

https://datatracker.ietf.org/doc/html/rfc7540#section-6.8

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Nov 23, 2021

I'm personally doing fine pinning my AHC to 1.6.4, though since i saw someone else having the same problem in Vapor Discord server, i wanted to ask what is a good workaround in your opinion for now?
Setting httpVersion to .http1Only should be the better way to make sure the AHC can receive updates, but i just went with the lazier fix ...
Are those two the expected workarounds, or are there better ways?

@fabianfett
Copy link
Member

.http1Only is the best workaround for now.

@MaxDesiatov
Copy link
Member

I'm also getting these issues when trying to download assets from GitHub releases. Pinning to 1.6.4 helps, but the issue is reproducible with both 1.7.0 and 1.8.0

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Nov 24, 2021

For those who are having this issue and use Vapor:

I initially thought it's a bit harder to change client's HTTPVersion in Vapor, but a few hours ago i took another look and found out its actually quite simple.
Make sure you are on async-http-client 1.7.0 or greater, then add this line of code to your configure func to force AHC to only use http/1 :

app.http.client.configuration.httpVersion = .http1Only

For others who directly use HTTPClient without Vapor:

Read this pull request's first comment to see how to change your httpVersion to http1Only:

@fabianfett
Copy link
Member

In general, after #490 we should see much better error messages. You should normally see a GOAWAY error message.

@fabianfett
Copy link
Member

I'm also getting these issues when trying to download assets from GitHub releases. Pinning to 1.6.4 helps, but the issue is reproducible with both 1.7.0 and 1.8.0

@MaxDesiatov Do you see after how many requests you see this error?

@MaxDesiatov
Copy link
Member

@MaxDesiatov Do you see after how many requests you see this error?

I'm not sure what you are referring to by "how many requests". I issue just a single request in my app, and it never finishes a download properly, always throwing HTTPClient.remoteConnectionClosed in AHC 1.7.0 and later. All works perfectly fine in 1.6.4.

MaxDesiatov added a commit to swiftwasm/carton that referenced this issue Nov 24, 2021
`async-http-client` package is now pinned to 1.6.4, as 1.7.0 and later versions caused issues with toolchain downloads (see swift-server/async-http-client#488 for more details).

I've also updated toolchain downloader code to bubble up download errors, which previously were hidden and made it very hard to diagnose these issues.

* Use SwiftWasm 5.5.0, bump version to 0.12.0

* Pin AHC to 1.8.0

* Pin AHC to 1.6.4, always remove downloaded archives

* Fix `defer` build issue
@MaxDesiatov
Copy link
Member

Here's a URL that I'm requesting, if that helps https://github.com/swiftwasm/swift/releases/download/swift-wasm-5.5.0-RELEASE/swift-wasm-5.5.0-RELEASE-macos_arm64.pkg

I'm using FileDownloadDelegate bundled with AHC to download the file.

@fabianfett
Copy link
Member

@MaxDesiatov Do you see the error on the first request?

@MaxDesiatov
Copy link
Member

MaxDesiatov commented Nov 24, 2021

I'm making a single request, and I guess that's implied to be the first request? A single run of my code issues only a single request, which always fails with HTTPClient.remoteConnectionClosed with AHC 1.7.0 or later. I don't run any requests after that since this is the point of the app I'm seeing this issue with: download a package from the URL just once and then exit.

@MaxDesiatov
Copy link
Member

MaxDesiatov commented Nov 24, 2021

Given that OP sees the issue appearing only after some amount of requests, I'm not sure now if my issue is exactly the same, but the error I'm seeing thrown is the same. And my issue is stably reproducible. Please let me know if I should create a separate ticket here for it.

@fabianfett
Copy link
Member

@MaxDesiatov Yes, the real issue in your case is very likely another one (and not http2 GOAWAY frames). I think we fixed your issue today with #490. Would you mind trying it once with current AHC main?

@MaxDesiatov
Copy link
Member

With main it still fails for me with HTTPClientError.remoteConnectionClosed

@robipresotto
Copy link

Yep, it works great on 1.6.4v only. Having the same issues on production and downgraded to 1.6.4 for now. Also experiencing the same issues described by @MaxDesiatov, after 3 or 4 requests it hangs and the client returns [ WARNING ] HTTPClientError.remoteConnectionClosed [request-id: A42D8BCE-6237-4C02-B60E-F4736107C1B8] (Vapor/Middleware/ErrorMiddleware.swift:42)

@fabianfett
Copy link
Member

@MaxDesiatov I could reproduce your issue. Thanks for the report. A fix has been proposed to nio-http2: apple/swift-nio-http2#317

@MaxDesiatov
Copy link
Member

Perfect, thanks!

fabianfett added a commit to apple/swift-nio-http2 that referenced this issue Nov 30, 2021
### Motivation

In the AsyncHTTPClient the read event is not always forwarded right away. We have seen instances, in which we see a `HTTPClientError.remoteConnectionClosed` error, on requests that finish normally. swift-server/async-http-client#488 

On deeper inspection, I noticed: If there is no unsatisfied read event, when a stream is closed, the pending reads are not forwarded. This can lead to response bytes being ignored on successful requests. NIOHTTP2 should behave as NIO and force forward all pending reads on channelInactive.

### Changes

- Forward all pending reads on channelInactive even if no read event has hit the channel

### Result

All requests will receive the complete request body.
@fabianfett
Copy link
Member

@MaxDesiatov we just shipped 1.8.1. Would you mind verifying that we fixed your issue?

@MaxDesiatov
Copy link
Member

MaxDesiatov commented Dec 1, 2021

thanks, this indeed fixes the issue for me

@fabianfett
Copy link
Member

@MahdiBM I have created two tickets #505 and #506 that describe the AWS ALB issue at more depth. Since we have also solved @MaxDesiatov's issue, do you mind if I close this ticket and we track work on the new issues?

@fabianfett
Copy link
Member

@robipresotto Would you mind checking if your problem got fixed with 1.8.1 as well?

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Dec 1, 2021

@fabianfett Sure, no problem, you can close this issue.

@robipresotto
Copy link

It works great 👍🏻 thanks a lot guys :)))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants