thrift: don't close the downstream on an upstream overflow #19133
Merged
zuercher merged 5 commits into envoyproxy:main on Dec 2, 2021
…rors

When we fail to get an upstream connection (e.g. `PoolFailureReason::Overflow`), there is no need to close the downstream connection, since the request never made it through. So we keep it open, which also avoids an issue that occurs when closing remote connections after a local response (see below).

There is a potentially separate issue that this change does not address: when `sendLocalReply()` is called with `end_stream=true`, the `local_response_sent_` marker is not set:
https://github.com/envoyproxy/envoy/blob/main/source/extensions/filters/network/thrift_proxy/conn_manager.cc#L702

Because `end_stream=true` results in a close with `Network::ConnectionCloseType::FlushWrite`, we end up with a delayed close(), which means that subsequent calls to `applyDecoderFilters()` might be racy (e.g. the downstream socket might or might not be closed). This can cause a crash.

A possible solution is to set `local_response_sent_=true` regardless of the value of `end_stream`, given that I can't see any reason why this would be problematic.

I'll follow up with more unit tests and an integration test illustrating the issue with the delayed close() calls.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
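To make the two behaviors described above easier to follow, here is a minimal standalone sketch. It is a toy model, not Envoy code: `DownstreamModel`, `ConnManagerModel`, `onUpstreamOverflow()`, and the field names are hypothetical, and the idea that `applyDecoderFilters()` is gated on the marker is an assumption based on the description above rather than on the actual source.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; these are NOT Envoy's real classes or signatures.
struct DownstreamModel {
  bool close_pending = false;  // stands in for a delayed FlushWrite close
};

struct ConnManagerModel {
  DownstreamModel downstream;
  bool local_response_sent = false;
  std::string last_reply;
  int filters_run = 0;

  // Proposed variant: set the marker whether or not end_stream is true.
  void sendLocalReply(const std::string& message, bool end_stream) {
    last_reply = message;
    local_response_sent = true;
    if (end_stream) {
      downstream.close_pending = true;  // the close happens later, after the flush
    }
  }

  // Assumption: decoder filters are skipped once a local reply has been
  // sent, so they never touch a downstream that may already be closing.
  bool applyDecoderFilters() {
    if (local_response_sent) {
      return false;
    }
    ++filters_run;
    return true;
  }

  // Behavior of this change, as modeled here: an overflow replies locally
  // and leaves the downstream connection open.
  void onUpstreamOverflow() {
    sendLocalReply("upstream connection pool overflow", /*end_stream=*/false);
  }
};

// Small self-check of the model.
inline void demo() {
  ConnManagerModel overflow;
  overflow.onUpstreamOverflow();
  assert(!overflow.downstream.close_pending);  // downstream stays open

  ConnManagerModel closing;
  closing.sendLocalReply("error", /*end_stream=*/true);
  assert(closing.downstream.close_pending);    // delayed close is queued
  assert(!closing.applyDecoderFilters());      // marker blocks further filter work
}
```

In this model the marker is the only thing protecting later filter calls from a connection with a pending close, which is why setting it unconditionally looks safe. Whether that holds in the real connection manager still needs the follow-up tests mentioned above.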
cc: @fishcakez
added 2 commits
November 29, 2021 19:50
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
added 2 commits
December 2, 2021 13:44
…eam-connection-overflow
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
zuercher approved these changes Dec 2, 2021