network: delayed conn close#4382
Conversation
Mitigate client read/close race issues on downstream HTTP connections by adding a new connection close type 'FlushWriteAndDelay'. This new close type flushes the write buffer on a connection but does not immediately close after emptying the buffer (unlike ConnectionCloseType::FlushWrite). A timer has been added to track delayed closes for both 'FlushWrite' and 'FlushWriteAndDelay'. Upon triggering, the socket will be closed and the connection will be cleaned up. Delayed close processing can be disabled by setting the newly added HCM 'delayed_close_timeout' config option to 0. Risk Level: Medium (changes common case behavior for closing of downstream HTTP connections) Testing: Unit tests and integration tests added. Fixes envoyproxy#2929. Signed-off-by: Andres Guedez <aguedez@google.com>
HTTP/2 integration tests added for delayed close processing. Signed-off-by: Andres Guedez <aguedez@google.com>
Increase the timeout values for the common case (no timeout trigerred) tests to prevent test flakiness. Signed-off-by: Andres Guedez <aguedez@google.com>
dnoe
left a comment
There was a problem hiding this comment.
Nice, thanks for fixing this! Mostly style nits.
| // | ||
| // A value of 0 will completely disable delayed close processing, and the downstream connection's | ||
| // socket will be closed immediately after the write flush is completed. | ||
| google.protobuf.Duration delayed_close_timeout = 25 [(gogoproto.stdduration) = true]; |
There was a problem hiding this comment.
Is there a default value? What happens if delayed_close_timeout is not set?
There was a problem hiding this comment.
There is a default timeout value of 1000 ms. I've amended the comment to note this.
include/envoy/network/connection.h
Outdated
| FlushWrite, // Flush pending write data before raising ConnectionEvent::LocalClose | ||
| NoFlush // Do not flush any pending data and immediately raise ConnectionEvent::LocalClose | ||
| NoFlush, // Do not flush any pending data and immediately raise ConnectionEvent::LocalClose | ||
| FlushWriteAndDelay |
There was a problem hiding this comment.
Can you add a similar comment to the above two lines?
| close_with_flush_ = true; | ||
| RELEASE_ASSERT(type == ConnectionCloseType::FlushWrite || | ||
| type == ConnectionCloseType::FlushWriteAndDelay, | ||
| ""); |
There was a problem hiding this comment.
What's the rationale behind making this a RELEASE_ASSERT? What's the practical implication if it was an ASSERT and happened in production?
There was a problem hiding this comment.
I'm using the RELEASE_ASSERT to provide runtime protection against bugs introduced in this logic, since they could lead to issues such as resource leaks and/or memory corruption.
I just re-read the style guide and it does mention that ASSERT should be the common case for this type of check, so I can certainly change it if that's the convention.
There was a problem hiding this comment.
The convention is generally to use ASSERT for this, if you feel strongly about RELEASE_ASSERT I'll defer to senior maintainer for the decision.
There was a problem hiding this comment.
Changed to ASSERT() per style guidelines.
| type == ConnectionCloseType::FlushWriteAndDelay, | ||
| ""); | ||
| delayed_close_ = true; | ||
| bool delayed_close_timeout_set = delayedCloseTimeout().count(); |
There was a problem hiding this comment.
I'd prefer to see the explicit boolean expression here, ie, delayedCloseTimeout().count() > 0.
|
|
||
| // Force a closeSocket() after the write buffer is flushed if the close_type calls for it or if | ||
| // no delayed close timeout is set. | ||
| close_after_flush_ = type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set; |
There was a problem hiding this comment.
Super nit, but I think this expression is more readable if you reverse the two halves of the or, since it separates the = and == more.
| std::unique_ptr<Network::ConnectionImpl> server_connection(new Network::ConnectionImpl( | ||
| dispatcher, std::make_unique<ConnectionSocketImpl>(0, nullptr, nullptr), | ||
| TransportSocketPtr(transport_socket), true)); | ||
| server_connection->setDelayedCloseTimeout(std::chrono::milliseconds(500)); |
There was a problem hiding this comment.
Are we safe from flakes here?
There was a problem hiding this comment.
Yes. The timer's callback is manually invoked at the end of the test. The 'setDelayedCloseTimeout()' call is required to enable creation of the timer, but the actual value does not matter (as long as it's >0). I've added a clarifying comment.
| TEST_P(ConnectionImplTest, FlushWriteAndDelayCloseTimerTriggerTest) { | ||
| setUpBasicConnection(); | ||
| connect(); | ||
| server_connection_->setDelayedCloseTimeout(std::chrono::milliseconds(500)); |
There was a problem hiding this comment.
Same question about flakes here. How do we know 500ms is the right number?
There was a problem hiding this comment.
This is safe. Since the client connection does not issue a close() during this test, the timer always triggers. I've added a clarifying comment and reduced the timeout value since there is no need to wait so long.
| using testing::Invoke; | ||
| using testing::InvokeWithoutArgs; | ||
| using testing::Return; | ||
| using testing::ReturnNew; |
| version_); | ||
|
|
||
| connection.run(); | ||
| EXPECT_TRUE(response.find("SETTINGS expected") != std::string::npos); |
There was a problem hiding this comment.
I find the form EXPECT_THAT(response, HasSubstr("SETTINGS expected")) more readable when trying to understand how the test works.
There was a problem hiding this comment.
Thanks for pointing out the matcher. Done.
| version_); | ||
|
|
||
| connection.run(); | ||
| EXPECT_TRUE(response.find("SETTINGS expected") != std::string::npos); |
There was a problem hiding this comment.
Same here, the HasSubstr matcher will make this read nicely.
|
The CircleCI mac failure is due to the lack of early connection close detection in macOS (see #4299). Per the conversation in that issue, I am disabling the failing test on macOS. |
| // The delayed close timeout is for downstream connections managed by the HTTP connection manager. | ||
| // It is defined as a grace period after connection close processing has been locally initiated | ||
| // during which Envoy will flush the write buffers for the connection and await the peer to close | ||
| // the connection (i.e., issue a TCP FIN/RST). |
There was a problem hiding this comment.
In the case flush the write buffer and delay close, we should issue TCP FIN but not RST, the expected behavior is issue FIN after flush writing, and RST on timeout.
There was a problem hiding this comment.
I've clarified this comment since this is referring to the client shutting down the connection and Envoy receiving the FIN/RST.
With regards to Envoy issuing FIN vs RST, the behavior is for a shutdown(_, SHUT_WR) to be issued when the flush write is successfully completed. If the delayed close timeout triggers, a close() is issued on the socket, which will trigger FIN or RST depending on the state of the socket/connection.
What benefit are we trying to get out of always issuing RST on timeout? It will likely lead to faster reclamation of resources which is nice, but I think Envoy's behavior regarding FIN vs RST should be applied uniformly across timeout/error conditions and therefore should be tracked as a new issue.
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
| close_with_flush_ = true; | ||
| RELEASE_ASSERT(type == ConnectionCloseType::FlushWrite || | ||
| type == ConnectionCloseType::FlushWriteAndDelay, | ||
| ""); |
There was a problem hiding this comment.
The convention is generally to use ASSERT for this, if you feel strongly about RELEASE_ASSERT I'll defer to senior maintainer for the decision.
| resource_class: xlarge | ||
| working_directory: /source | ||
| environment: | ||
| BAZEL_REMOTE_CACHE: https://storage.googleapis.com/envoy-circleci-bazel-cache/ |
There was a problem hiding this comment.
Let's plan to revert these changes before this gets merged. If we do disable this for all of Envoy, I'd like to do it in a separate PR and consult with maintainers. I think you can leave it in for now with this comment in place so senior maintainer gets a heads up.
| return new Buffer::WatermarkBuffer(below_low, above_high); | ||
| })); | ||
|
|
||
| auto timer = new Event::MockTimer(&dispatcher); |
There was a problem hiding this comment.
That explanation makes sense. I get nervous when I see bare new calls, especially without a comment. I think this test might be doing the opposite of the pattern I usually see, which is to have a unique_ptr here in the test body which is std::moved into the right place to be returned by Envoy::MockBufferFactory::create() but only after you steal a bare pointer out of the unique_ptr to use when setting up any the gmock expectations. This is a clear case of dangerous cheating the unique_ptr, but it's ok in this limited test case example since we know the unique pointer scope lives as long as the body of this test. I think it leaves the object life cycle a bit clearer defined than if you don't use unique_ptr here.
If this doesn't work in your case, please add a comment explaining the object lifecycle at the bare news. Happy to chat more about this in person tomorrow too if it helps.
Signed-off-by: Andres Guedez <aguedez@google.com>
|
@dnoe -- thanks for the review! |
| // Counter* as this is an optional counter. Bind errors will not be tracked if this is nullptr. | ||
| Stats::Counter* bind_errors_; | ||
| // Optional counter. Delayed close semantics are only used by HTTP connections. | ||
| Stats::Counter* delayed_close_timeouts_; |
There was a problem hiding this comment.
Let's remove the reference to HTTP, since this isn't necessarily HTTP-specific. I think the comment can just mirror the one for bind_errors_.
There was a problem hiding this comment.
Agreed the delayed close timeout is not inherently HTTP specific, but the intent of the comment is to point out that as currently implemented it only affects HTTP connections, since the H{1,2} codecs are the only callers of setDelayedCloseTimeout() on a connection to enable it.
I'm fine removing this however if you don't think the note/clarification is required.
There was a problem hiding this comment.
My feeling is that it's just a comment we'll have to delete (or will be incorrect) when someone inevitably enables this for a different purpose.
There was a problem hiding this comment.
Ok. Cleaned up the comment.
include/envoy/network/connection.h
Outdated
|
|
||
| /** | ||
| * Set the timeout for delayed connection close()s. | ||
| * This is only used for downstream connections processing HTTP/1 and HTTP/2. |
There was a problem hiding this comment.
Likewise, remove this reference (or at least the "only" qualifier).
There was a problem hiding this comment.
Please document (probably on close) the relationship between this timeout value and the various close options. I had assumed that ConnectionCloseType::FlushWrite behavior was meant to be unchanged, but really it's going to wait up to the delayed close timeout, if set, to complete its flush before closing.
|
|
||
| void ConnectionImpl::onDelayedCloseTimeout() { | ||
| ENVOY_CONN_LOG(debug, "triggered delayed close", *this); | ||
| ASSERT(connection_stats_->delayed_close_timeouts_); |
There was a problem hiding this comment.
Prefer that you test if delayed_close_timeouts_ is null (similar to how bind_errors_ optionality is handled). Or else, ASSERT the pointer is non-null when the delayed close timeout is configured.
| if (delayed_close_timer_) { | ||
| // It's ok to disable even if the timer has already fired. | ||
| delayed_close_timer_->disableTimer(); | ||
| delayed_close_timer_.reset(); |
There was a problem hiding this comment.
Nit: don't need to reset -- it's about to be destroyed.
| } | ||
| }; | ||
| } // namespace Network | ||
| }; // namespace Envoy |
There was a problem hiding this comment.
Nit: undo these -- looks like fix format got confused at some point.
There was a problem hiding this comment.
tools/check_format.py fails when this is reverted -- 'FixNamespaceComments' appears to be required.
There was a problem hiding this comment.
I downloaded this file and it seems to pass format checks locally without the stray namespace comments. Has it failed CI's format check?
There was a problem hiding this comment.
Reverted comments. I was looking at the wrong lines... Thanks.
test/integration/http_integration.cc
Outdated
| } | ||
|
|
||
| void IntegrationCodecClient::waitForDisconnect() { | ||
| bool IntegrationCodecClient::waitForDisconnect(uint32_t time_to_wait) { |
There was a problem hiding this comment.
Nit: use std::chrono::millisconds here to match the other integration tests.
| if (wait_timer_triggered && !disconnected_) { | ||
| return false; | ||
| } | ||
| EXPECT_TRUE(disconnected_); |
There was a problem hiding this comment.
Should disable the timer if it hasn't fired.
test/integration/integration_test.cc
Outdated
|
|
||
| response->waitForEndStream(); | ||
| // The delayed close timeout should trigger since client is not closing the connection. | ||
| EXPECT_TRUE(codec_client_->waitForDisconnect(750)); |
There was a problem hiding this comment.
Have you given this the runs_per_test treatment?
There was a problem hiding this comment.
Yeah, I've done extensive flake testing and did not hit any issues. I've adjusted the timeouts though to make it even more resistant since the CI environment is considerably more flake prone than my machine.
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
zuercher
left a comment
There was a problem hiding this comment.
Thanks. One small thing, but otherwise looks good to me.
| // Create and activate a timer which will immediately close the connection if triggered. | ||
| // A config value of 0 disables the timeout. | ||
| if (delayed_close_timeout_set) { | ||
| ASSERT(connection_stats_->delayed_close_timeouts_ != nullptr); |
There was a problem hiding this comment.
Sorry, I meant in setDelayedCloseTimeout.I don't want someone to enable this feature on a connection and then only later get an assertion about this stat.
There was a problem hiding this comment.
connection_stats_ is set via setConnectionStats(), so asserting on the counter != nullptr in setDelayedCloseTimeout() will introduce an ordering dependency (which is not currently met). I think it would be preferable to keep these two functions independent of each other.
Unfortunately, initialization of ConnectionImpls is spread out between codec creation (allocation and call to setDelayedCloseTimeout) and ConnectionManagerImpl::initializeReadFilterCallbacks() where setConnectionStats() is called. Therefore, adding an initFinished() function where the ASSERT could be safely issued will require some work.
I can mark this with a TODO so we can move it into the appropriate place when such a function is added. What do you think?
There was a problem hiding this comment.
Or just make setting connection_state_->delayed_close_timeouts_ optional? (E.g. don't assert, just don't increment if it's null?)
Signed-off-by: Andres Guedez <aguedez@google.com>
|
@zuercher -- Thanks for reviewing! |
I don't have that commit in my branch; I'll merge and kick off new CI runs. |
|
@mattklein123 Please take a look. |
mattklein123
left a comment
There was a problem hiding this comment.
Very nice. Love seeing this implemented and the flush timeout leak fixed also. A few small questions/comments.
| Http1Settings settings, | ||
| std::chrono::milliseconds delayed_close_timeout) | ||
| : ConnectionImpl(connection, HTTP_REQUEST), callbacks_(callbacks), codec_settings_(settings) { | ||
| connection_.setDelayedCloseTimeout(delayed_close_timeout); |
There was a problem hiding this comment.
What's the reasoning for doing this set in the codec? It seems like something that is unrelated to the codec itself and should be done by higher layer code? Same question for the HTTP/2 codec below.
There was a problem hiding this comment.
It's not related to the codec directly but this particular delayed_close_timeout is HTTP only since it is managed via the HCM configuration. This seemed to be the most appropriate place to set the timeout for HTTP connections just after they are constructed.
Since the timeout value is intended to be network filter specific, I wouldn't expect any layer higher than the corresponding filter to issue the call to setDelayedCloseTimeout().
There was a problem hiding this comment.
I agree it should be done in the filter. Why not do it in the HCM filter and avoid codec modifications?
There was a problem hiding this comment.
Agreed it's best to contain it within the HCM. I revisited the code and moved setting the timeout to the HCM config.
|
|
||
| // No need to continue if a FlushWrite/FlushWriteAndDelay has already been issued and there is a | ||
| // pending delayed close. | ||
| if (delayed_close_) { |
There was a problem hiding this comment.
Does this happen in practice? Mostly just wondering about the flow that causes this. Perhaps flesh out the comment with some examples of cases where this happens?
There was a problem hiding this comment.
Yes, I initially tried an ASSERT() which triggered in multiple tests during a //test/... run. I just tracked down a failure during an integration_test and added it to the comment. Based on that example and code inspection, this conditional can evaluate true under fairly standard use cases such as when Envoy initiates the close() sequence during codec processing.
| } | ||
|
|
||
| delayed_close_ = true; | ||
| bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0; |
Signed-off-by: Andres Guedez <aguedez@google.com>
9d86332
Instead, constrain setting the delayed close timeout to the HCM config, which owns it. Signed-off-by: Andres Guedez <aguedez@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, one small change and I think you also need a master merge.
| HttpConnectionManagerConfig::createCodec(Network::Connection& connection, | ||
| const Buffer::Instance& data, | ||
| Http::ServerConnectionCallbacks& callbacks) { | ||
| connection.setDelayedCloseTimeout(delayed_close_timeout_); |
There was a problem hiding this comment.
I think I would recommend setting this in ConnectionManagerImpl::initializeReadFilterCallbacks. It's earlier than here and also near where we setup some other connection level stuff.
Signed-off-by: Andres Guedez <aguedez@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM. Can you merge master and we can 🚢? Thank you!
Signed-off-by: Andres Guedez <aguedez@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM, sorry, one more doc thing.
| // | ||
| // A value of 0 will completely disable delayed close processing, and the downstream connection's | ||
| // socket will be closed immediately after the write flush is completed. | ||
| google.protobuf.Duration delayed_close_timeout = 26 [(gogoproto.stdduration) = true]; |
There was a problem hiding this comment.
Sorry I forget this from before. Can you add this to the release notes also with a link to this field?
Signed-off-by: Andres Guedez <aguedez@google.com>
|
@mattklein123 Thanks for the review! |
This reverts commit aa9478f. Signed-off-by: Jose Nino <jnino@lyft.com>
This PR reverts #4382. When deploying at Lyft we noticed crashes on here where we might be derefencing the connection_stats_ pointer after the point has been reset. Note: this PR keeps the changes to the API made in the original PR but tags the field as not implemented. This is what we have done in the past for reverts that involve changes that change the API. Signed-off-by: Jose Nino <jnino@lyft.com>
Re-enable the changes reverted in 9d32e5c, which were originally merged as part of envoyproxy#4382. Signed-off-by: Andres Guedez <aguedez@google.com>
Fixes a segfault introduced in envoyproxy#4382 due to a connection tear down race condition when the delayed close timer triggers after connection state has been reset via closeSocket(). Signed-off-by: Andres Guedez <aguedez@google.com>
This PR reverts envoyproxy#4382. When deploying at Lyft we noticed crashes on here where we might be derefencing the connection_stats_ pointer after the point has been reset. Note: this PR keeps the changes to the API made in the original PR but tags the field as not implemented. This is what we have done in the past for reverts that involve changes that change the API. Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: Aaltan Ahmad <aa@stripe.com>
…ix (envoyproxy#4587) Re-enable the changes reverted in 9d32e5c, which were originally merged as part of envoyproxy#4382. Signed-off-by: Andres Guedez <aguedez@google.com> Signed-off-by: Aaltan Ahmad <aa@stripe.com>
network: Support delayed close of downstream conns (#2929)
Mitigate client read/close race issues on downstream HTTP connections by adding a new connection
close type 'FlushWriteAndDelay'. This new close type flushes the write buffer on a connection but
does not immediately close after emptying the buffer (unlike ConnectionCloseType::FlushWrite).
A timer has been added to track delayed closes for both 'FlushWrite' and 'FlushWriteAndDelay'. Upon
triggering, the socket will be closed and the connection will be cleaned up.
Delayed close processing can be disabled by setting the newly added HCM 'delayed_close_timeout'
config option to 0.
Risk Level: Medium (changes common case behavior for closing of downstream HTTP connections)
Testing: Unit tests and integration tests added.
Fixes #2929.
Signed-off-by: Andres Guedez aguedez@google.com