Delay connection reuse for a poll cycle to catch closed connections. #7159
jplevyak wants to merge 3 commits into envoyproxy:master
Conversation
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Note: this includes changes which will be merged out when #7124 is merged.
Please don't mix multiple PRs that are open. At first glance I have concerns about this PR but I don't want to actually review with the other one open. Does this PR supersede the other one? /wait
Sorry, the test changes here depend on the HTTP/1.0 fix, which is why it is included here. I will ping this thread when I have merged the HTTP/1.0 fix.
…reuse Signed-off-by: John Plevyak <jplevyak@gmail.com>
I have updated the PR and it is now ready for review.
mattklein123 left a comment
I will be honest that I'm not a fan of this change. IMO it's just kicking the race condition can down the road. I'm not sure how 2 event loop cycles are any different from 3, etc., since we don't know when the remote close will actually happen.
If we are going to do such a thing, I think the correct solution here is to add a config option to allow connections to be delayed for X amount of time before being moved from the delayed list back to the active list (or just some fixed amount of time).
/wait
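The configurable-delay idea Matt describes could look roughly like the following. This is a minimal Python sketch with hypothetical names (Envoy's actual connection pool is C++ and structured differently): released connections are parked on a delayed list and only promoted back to the ready list once a configured delay has elapsed, dropping any that were remotely closed in the meantime.

```python
import time
from collections import deque

class DelayedReusePool:
    """Toy connection pool: released connections sit on a delayed list
    for `reuse_delay` seconds before becoming available again.
    Hypothetical sketch of the config-option idea; not Envoy code."""

    def __init__(self, reuse_delay=0.05, clock=time.monotonic):
        self.reuse_delay = reuse_delay
        self.clock = clock
        self.ready = deque()    # connections usable right now
        self.delayed = deque()  # (ready_at_time, conn), in release order

    def release(self, conn):
        # Instead of returning straight to the ready list, park it.
        self.delayed.append((self.clock() + self.reuse_delay, conn))

    def _promote(self):
        now = self.clock()
        while self.delayed and self.delayed[0][0] <= now:
            _, conn = self.delayed.popleft()
            if not conn.closed:  # a remote close seen meanwhile drops it
                self.ready.append(conn)

    def acquire(self):
        self._promote()
        return self.ready.popleft() if self.ready else None
```

With a zero delay this degenerates to the current behavior, which is why the proposal is for it to be off by default and opt-in per deployment.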
So, while I agree that this is not the ultimate solution (and in fact, given that the upstream can close at any time, there is no ultimate solution):
I'm glad that it addressed the customer issue, but I think this is still ultimately a hack and I'm not in favor of adding this type of complexity to the code base. If we want to discuss adding a configurable delay timer, that sounds reasonable to me. cc @lizan @alyssawilk for further comment.
Actually, can you also refresh my memory of why we need this delay at all? Why doesn't the upstream server indicate it is going to close the connection?
If you don't like this change then I would suggest that we remove the existing hack which attempts to prevent immediate reuse: it adds nearly the same level of complexity, was ineffective in practice for the reasons I outlined above, and provides a false sense of security.
+1 to remove it, but I'm trying to remember why we added it in the first place. @lizan?
Sadly, neither the HTTP/1.0 nor the HTTP/1.1 specification requires that the upstream server signal in any way that it is going to close the connection. The upshot is that it is perfectly acceptable for a server to complete an operation and then close the connection without sending a Connection: close header. Before the HTTP/1.0 fix, envoy would routinely attempt to reuse a connection to a 1.0 server which the server had closed immediately after sending the response. Such a request would result in an error, but not necessarily before some data had been written into the write buffer, particularly as the order of actions is WRITE then READ. When this happens for a POST, it should not be retried, and the error propagates back to the client.
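The failure mode described above (server closes immediately after the response without Connection: close; the reusing client's WRITE appears to succeed before the READ fails) can be reproduced at the socket level. This is a throwaway illustrative sketch, not Envoy code; the server and port are made up for the demo.

```python
import socket
import threading
import time

def run_server(port_holder, ready):
    """Toy HTTP/1.0-style server: one response, then immediate close,
    with no Connection: close header (legal per the 1.0/1.1 specs)."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    port_holder.append(srv.getsockname()[1])
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    conn.recv(1024)
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
    conn.close()   # immediate close, no close header sent
    srv.close()

def demo():
    port_holder, ready = [], threading.Event()
    t = threading.Thread(target=run_server, args=(port_holder, ready))
    t.start()
    ready.wait()
    c = socket.socket()
    c.connect(("127.0.0.1", port_holder[0]))
    c.sendall(b"GET / HTTP/1.0\r\n\r\n")
    first = c.recv(4096)              # first request works fine
    time.sleep(0.2)                   # let the server's FIN arrive
    reuse_failed = False
    try:
        # WRITE often "succeeds" locally (data lands in the send buffer)...
        c.sendall(b"GET / HTTP/1.0\r\n\r\n")
        if c.recv(4096) == b"":       # ...then READ sees EOF (or a reset)
            reuse_failed = True
    except OSError:
        reuse_failed = True
    c.close()
    t.join()
    return first, reuse_failed
```

The key point is the asymmetry: the second write does not fail even though the connection is already dead, so the error only surfaces on the subsequent read, after request data has been committed.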
Understood, that was clearly a bug and I'm glad we fixed it. What I'm wondering is whether the original defer fix was actually "fixing" HTTP/1.0 upstreams, or whether we are actually dealing with servers not indicating close. Given that you also did the HTTP/1.0 fix, presumably your customer is using HTTP/1.0 and their issues will be fixed? Or do they also have issues with HTTP/1.1? Can they modify their code to send the close header if they are going to close the connection? My feeling on the delay "fix" is that if we want to do it, it should actually be an optional config where, if a timer is configured, the connection is put on the delayed list for X period of time. In the default case we wouldn't do anything and would effectively revert the existing hack/fix. WDYT?
re: the HTTP/1.0 fix, IIRC we've never supported HTTP/1.0 upstreams; there was only limited and not fully standards-compliant downstream HTTP/1.0 support in Envoy. The original fix to delay attaching the pending request is because Envoy's HTTP codecs are not half-close aware. That change only saves the case where Envoy can read the FIN before processing the next request. #2715 and #2871 have more context on that. @jplevyak can you elaborate on why you need the
Lizan, I don't see what half-close has to do with this. That should not be relevant for an upstream connection. We are about to rehash the giant argument in the issue you linked to. The upstream server is not indicating that it is immediately closing the connection, and in the real world that is a bug. I'm strongly against continued "fixes" in this area, especially one like this with counting poll loops, etc. If we want to add an opt-in feature to delay connection reuse to deal with buggy upstream servers, that's the way I would approach this.
@mattklein123 HTTP doesn't support half-close, so when the codec tries to see whether the upstream connection is half-closed it always gets false. #2715 (comment)
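For context on the half-close distinction being discussed: at the TCP level a peer can shut down its write side (sending a FIN) while still reading, which is different from a full close. A quick illustrative Python sketch of that behavior (not Envoy's codec, just raw sockets):

```python
import socket

def half_close_demo():
    """Show TCP-style half-close: after one side shuts down its write
    direction, the other side reads EOF but can still send back."""
    a, b = socket.socketpair()
    a.shutdown(socket.SHUT_WR)    # 'a' half-closes: FIN sent, reads stay open
    eof_seen = b.recv(16) == b""  # 'b' sees EOF for a's direction...
    b.sendall(b"late reply")      # ...but can still send the other way
    reply = a.recv(16)            # ...and 'a' can still receive it
    a.close()
    b.close()
    return eof_seen, reply
```

Because HTTP/1.x has no way to express this state, a codec that is not half-close aware treats any FIN as irrelevant until the connection is fully torn down, which is the gap #2871 was working around.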
Ok, yes, agreed on that part. I don't remember why we fixed it this way with a single poll cycle vs some other more direct way, but I would have to go back through. I think we did it the way we did due to regression risk. In any case, that's a specific case which is not this. I'm not ok with this unless we want to do an opt-in generic delay feature.
If we know nothing about the upstream, theoretically envoy could never prevent a concurrent upstream close. However, there is a good chance that the upstream http server will close immediately after the http response. Combining this PR and the setting of upstream
Right, because making the whole HTTP stack half-close aware would be a large change.
+1. I don't think the counting is needed; I need more explanation on the counting part. If this just complements some cases that I missed in #2871 then it is probably fine (which is not very clear now).
There might be an alternative approach to achieve the goal that a client is reused only after at least one poll. IMHO, letting the client die at the second cleanup might be the easy one, if we don't want to rely on the subtle implementation of libevent.
@silentdai This is not unexpected and is intended. The only case that we're saving in #2871 is until the end of the read event, regardless of which poll cycle. The use of the poll cycle is to make sure the cleanup happens after processing the specific read event (at the end of the read event it will read end of stream).
How? Where is the attempt to read? You mean the one that returns EAGAIN to adapt to edge triggering? What #2871 is missing
@silentdai See my comment there. In https://github.com/envoyproxy/envoy/blob/v1.10.0/source/common/network/connection_impl.cc#L486, if the transport socket reads some bytes + EOS (this is the ONLY case that #2871 is trying to save), the upstream connection will be closed after the filters process the read bytes, and then the client will be removed.
@lizan I see what you mean. Even though I didn't really calculate the number, I admit that #2871 should have already reduced some 503 cases. However, there are more.
In the real world the EOS (or FIN, in the raw transport socket) and the last few bytes often arrive in separate packets. We observed that in load tests. Even in a medium-load test, we see cases where the EOS is not read until the next poll cycle. The load test below brings up the first 503 within 20K requests.
With this PR the 503 rate is reduced to 1/100 or even lower.
@lizan in looking at this briefly again, I'm wondering why we didn't just add a
@silentdai @jplevyak the case you are trying to "fix" here is different. Envoy has not yet read the FIN from the connection; it's coming some time later. The only change I'm amenable to in this area is an opt-in feature that allows connection reuse to be delayed by some number of milliseconds. This can be off by default and then Istio can opt in. If you want to build the feature that way, let's go for it. Otherwise I would recommend that you tell the customer to indicate in their response that they are going to close the connection.
We could go this way, although delaying one poll cycle is more elegant and practical. All known http servers/applications either close immediately or after some idle seconds. In the former case, the period between the last byte and the EOS is unpredictable: we know it's short but no one can say exactly. A larger number is always safer but it reduces the connection reuse rate. I think a poll cycle in this PR is a good try. Since
BTW, it's not only istio that is affected. A microservice architecture is more 503-sensitive than a monolithic one.
I agree this is not an Istio-specific issue, though as you said:
So poll cycles are not an elegant way to solve this: when you delay N cycles, the EOS may arrive in the (N+1)-th cycle. The number is determined by the deployment, so if we do this it should be configurable. As noted in #2715, a retry (at the Envoy level or the application level) is the preferred approach, #2715 (comment). Regardless of POST or GET, the status code 503 has retry-able semantics defined in RFC 7231. (cc @jplevyak)
Exactly. Ultimately, you are trying to introduce a delay. If you want a delay, let's build it right, which means making it configurable, off by default, and with a variable millisecond delay. This is the only solution that I'm amenable to, as I said above.
All well-behaving HTTP/1.1 servers indicate they are going to close the connection if they are going to immediately close it (Envoy does this). As I have said over and over again here and in the linked issues, this is a well-known timing issue with HTTP/1.1. So to summarize, the options here are to:
A side question @silentdai @jplevyak: did you try #7124 and this change individually with the customer for istio/istio#13848, or together? I'm curious how much each change contributes to the fix.
While "Regardless of POST or GET, the status code 503 has retry-able semantics defined in RFC 7231", the problem is that the upstream server isn't returning a 503, it is just closing the connection. A broken connection is not retry-able for a POST, so envoy can't do that without overriding the spec. This problem manifests in production with some frequency because (it seems) both node (with some frameworks) and Apache HTTPD use a 5 second keep-alive timeout, so a potential solution is to use a 4 second inactivity timeout in envoy.
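The 4 second inactivity timeout mentioned above could be expressed on the upstream cluster roughly as follows. This is an illustrative fragment, not from the PR: the cluster name and endpoint are made up, and the field names assume the Envoy v2 cluster API (`common_http_protocol_options.idle_timeout`).

```yaml
clusters:
- name: upstream_httpd            # hypothetical cluster name
  connect_timeout: 1s
  type: STRICT_DNS
  # Close idle upstream connections before the server's own 5s
  # keep-alive timeout (e.g. the Apache HTTPD default) can race
  # with connection reuse.
  common_http_protocol_options:
    idle_timeout: 4s
  load_assignment:
    cluster_name: upstream_httpd
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: upstream.example.com, port_value: 80 }
```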
For clarification, is this trying to institute a delay, or trying to make sure that a FIN which is already machine-local is read into user space? If it's the latter, I do support making sure that we detect closed connections, even though, as Matt notes, there's no guarantee that the terminal FIN actually arrives with the terminal data. I also agree that waiting 2 loops feels kind of hacky. Could we potentially peek before returning the connection to the pool? I'm not sure peeking actually works for FINs, but if it does it might be a compromise.
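The peek idea can be tried at the socket level with MSG_PEEK. A sketch (POSIX-only because of MSG_DONTWAIT; not Envoy code), which also shows its limitation: it only reports a FIN that has already been delivered and is not hidden behind buffered data.

```python
import socket

def fin_already_received(sock):
    """Peek for an already-delivered FIN without consuming data.
    Returns True only if a zero-byte read (EOF) is immediately visible."""
    try:
        return sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT) == b""
    except BlockingIOError:
        return False  # no data and no FIN yet: looks reusable
    except OSError:
        return True   # e.g. an RST has already arrived
```

Note that if unread data sits ahead of the FIN, the peek returns that data and the connection still looks open, and a FIN that is merely in flight is invisible either way, which matches the race discussed in this thread.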
Regarding the customer issue: I don't know the relative contribution of each change. I think the biggest contribution to "fixing" the issue was dramatically increasing the keep-alive timeout on the upstream server. I would be for dropping this change and removing the existing partial delay code. We could add a reuse delay, but I don't think that is the right solution. Instead I suggest that we instrument the code to determine under what conditions we see unexpected errors on connection reuse. If we can determine (for example) that "Server: Apache.*" connections are dropped after a particular time we can set a default timeout a bit shorter than that. |
Again, the existing partial delay code is NOT trying to institute a delay, but to make sure that a FIN which is already read into user space takes effect while the stack unwinds. I'm fine if you revert it and go with #7159 (comment).
Regarding the FIN local/peek question/suggestion: silentdai tried that, and I also instrumented the code to see what was happening when the server closed immediately after sending the response but without Connection: close. poll() returns POLLRDHUP, which is converted into EV_CLOSED by libevent, which is then converted to FileReadyType::Closed by file_event_impl.cc, which is then ignored by envoy if we are reading and are not using half-close semantics (which is the case for HTTP during connection reuse). Delaying by a poll() loop lets us detect the close as a Network::ConnectionEvent::RemoteClose. Probing also detected this case. However, when the upstream server closed after some delay (e.g. 5 seconds), there was a race: we found that both the delay and the probe were at times unable to prevent an error, because the FIN could arrive after the probe but before the read/write. So, to answer your question, the peek/probe of the connection still doesn't solve the problem, but it does decrease the probability, just as delaying a poll() cycle does. I'll leave it to silentdai to tell us how much each change decreases the probability.
lizan, the code in onReadReady calls doRead on the socket, which loops, so if there was already a FIN along with the data on the socket (e.g. a write + close by the server) then the code would detect it without the existing code which attempts to delay reuse; that code isn't necessary in this case. The only case in which it might be useful is if there is a delay between the doRead and when the FIN arrives, which is also the case that this PR is addressing. That is why I agree with Matt: we should remove the existing code and drop this PR, and if we want to add a delay before reuse we can do so with an explicit time limit. However, I still think that we should just instrument the code and try to determine the circumstances of unexpected closes, as a delay might not be the best solution. The two cases I know about are the HTTP/1.0 missing keep-alive (now fixed) and the 5 second timeout on Apache HTTPD and some node servers. The latter is not addressed by this PR or by the existing reuse-delay code, so I don't have a real-world example which requires the delay. I am curious if anyone else does.
No, see https://github.com/envoyproxy/envoy/blob/v1.10.0/source/common/network/connection_impl.cc#L486: the HTTP connection has half-close disabled. The server immediately closes the connection without
I don't know how much work needs to be done to fix this bug, as it's not a bug in envoy's code base.
lizan, I am looking not at the Event::FileReadyType::Closed event, which indeed, as you point out, is not handled because of the lack of half-close handling, but at RawBufferSocket::doRead, which will detect the EOF as a read() result of rc_ == 0 and set end_stream == true in the IoResult, which will then be handled in ConnectionImpl::onReadReady by converting it into PostIoAction::Close, which will result in a closeSocket(ConnectionEvent::RemoteClose). This is how an immediate close which results in a pending FIN is detected by envoy.
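The read path described above (loop on the socket until EAGAIN; a 0-byte read signals remote close) can be mimicked in a few lines. This is a Python stand-in for the C++ doRead loop, not actual Envoy code:

```python
import socket

def do_read(sock, buf):
    """Drain a non-blocking socket into buf. Returns True (end_stream)
    if a 0-byte read was seen, i.e. the peer's FIN was already delivered;
    False if the loop stopped on EAGAIN with the connection still open."""
    while True:
        try:
            chunk = sock.recv(16384)
        except BlockingIOError:
            return False      # EAGAIN: no more data and no FIN yet
        if chunk == b"":
            return True       # read() returned 0: remote close detected
        buf.extend(chunk)
```

This is why a write-then-close by the server that lands in a single read event is caught without any reuse delay: the loop sees the data and then the zero-byte read in the same pass.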
"I don't know much work need to be done to fix this bug, as it's not bug in envoy's code base." I agree it's not a bug in Envoy's code base, but it's a pretty common problem (having come up for a number of users and a number of deployments), and as we work on supporting Envoy as a forward proxy speaking to the open internet (where we can't fix all buggy upstreams) I do think it's one we're going to want to handle gracefully.
This code catches an immediate close after the completion of a transaction. Before the 1.0 fix this would happen on every request for 1.0 servers which did not send Connection: close (e.g. flask). As per our discussions, I don't think that this delay is needed anymore, since the other cases we have seen involve closes some time later (i.e. when the upstream server's inactivity timeout fires). I am going to close this and create a PR to remove the existing delay code (which doesn't work as intended in any case). I will also open an issue to add instrumentation for unexpected closes.
incomplete in that libevent doesn't guarantee the ordering of events, so delaying for a single poll cycle does not address the issue. Also, because the connection is left in the ready queue, it is available for another request coming in the same poll cycle, or in the next poll cycle before the ready connection is processed. Finally, this would only catch connections closed immediately, which should be very infrequent since the HTTP/1.0 fix, which eliminated the cases where a missing Connection: keep-alive was not interpreted as Connection: close, resulting in reuse of a connection which was immediately closed. See envoyproxy#7159 for details. Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak jplevyak@gmail.com
Description: Currently we attempt to delay connection reuse until after a poll() cycle, but the connection can be reused more quickly if another request arrives before the poll(), and because libevent does not guarantee the order of events. This change uses a separate delay_ list and delays for 2 cycles to ensure that a poll() occurs. This dramatically reduces the number of reused-but-closed connections in production tests.
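The delay_ list described above might be modeled like this. A toy sketch of the two-cycle hand-off (names are hypothetical, not the PR's actual C++): a connection released in cycle N becomes reusable only in cycle N+2, guaranteeing at least one full poll() in between so a delivered FIN gets a chance to be processed.

```python
class TwoCycleDelayList:
    """Toy model of the PR's delay_ list: connections advance one stage
    per poll cycle, reaching the ready list only after two cycles."""

    def __init__(self):
        self.ready = []      # reusable now
        self.delay_one = []  # released last cycle
        self.delay_two = []  # released this cycle

    def release(self, conn):
        self.delay_two.append(conn)

    def on_poll_cycle(self):
        # Advance stages: delay_two -> delay_one -> ready, dropping
        # any connection whose remote close was observed meanwhile.
        self.ready.extend(c for c in self.delay_one if not c.closed)
        self.delay_one = self.delay_two
        self.delay_two = []
```

A single delayed list, by contrast, can hand a connection back within the same cycle it was released in, which is the gap this PR is trying to close.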
Risk Level: Low
Testing: 24 hours with 5 apps in production.
Docs Changes:
Release Notes:
[Optional Fixes #Issue] istio/istio#13848
[Optional Deprecated:]