-
Notifications
You must be signed in to change notification settings - Fork 5.5k
http: mitigate delayed close timeout race with connection write buffer flush #6437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
07b5ded
82dc62d
3f08a6d
eb58ad6
b06bb0b
dd6d9ea
412f485
f60db05
11008bd
aebc8cf
0424594
efad9c6
788e07a
2d75b29
031f520
06ec3e9
e9c0a73
adb0488
012bd0a
a68c785
77ec586
0e2248c
2d3d44b
577729e
7a6bceb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,6 +47,7 @@ Version history | |
| * http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged. | ||
| * http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data. | ||
| * http: added encodeComplete/decodeComplete. These are invoked at the end of the stream, after all data has been encoded/decoded respectively. Default implementation is a no-op. | ||
| * http: fixed a bug with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger while actively flushing a pending write buffer for a downstream connection. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you merge master again and move this to 1.11.0?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
| * outlier_detection: added support for :ref:`outlier detection event protobuf-based logging <arch_overview_outlier_detection_logging>`. | ||
| * mysql: added a MySQL proxy filter that is capable of parsing SQL queries over MySQL wire protocol. Refer to :ref:`MySQL proxy<config_network_filters_mysql_proxy>` for more details. | ||
| * performance: new buffer implementation (disabled by default; to test it, add "--use-libevent-buffers 0" to the command-line arguments when starting Envoy). | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,6 +99,7 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
|
|
||
| uint64_t data_to_write = write_buffer_->length(); | ||
| ENVOY_CONN_LOG(debug, "closing data_to_write={} type={}", *this, data_to_write, enumToInt(type)); | ||
| const bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0; | ||
| if (data_to_write == 0 || type == ConnectionCloseType::NoFlush || | ||
| !transport_socket_->canFlushClose()) { | ||
| if (data_to_write > 0) { | ||
|
|
@@ -107,13 +108,22 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
| transport_socket_->doWrite(*write_buffer_, true); | ||
| } | ||
|
|
||
| closeSocket(ConnectionEvent::LocalClose); | ||
| if (type == ConnectionCloseType::FlushWriteAndDelay && delayed_close_timeout_set) { | ||
| // The socket is being closed and there is no more data to write. Since a delayed close has | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After one more pass, is this comment correct? Arguably we can also get here where there is data to write, but we're in the midst of a tls/alts handshake and have determined that flushing is pointless (canFlushClose is false) because the write could not flush all data. I think the canFlushClose() was added for the case where you have plaintext payload queued up behind an unfinished crypto handshake at which point the payload isn't going to get flushed and given old style options you should give up and close immediately. That said, if there were some crypto protocol where there were large bidi frames, I can imagine the same race we have for HTTP where the FIN + RST lagged behind a large client side "no I am rejecting your handshake" write which is queued in the kernel and so we want a one interval delay for the client to get that response. I think if we're in that case, the connection is going to observe the transport socket is "blocked", and the alarm will fire after one interval (handling any race) so the code is doing the right thing and the comment can just be tweaked a bit for clarity. Sound right?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, thanks for pointing this out. I have clarified the comment.
Yeah, I agree with this. I revisited the logic for the transport sockets that return |
||
| // been requested, start the delayed close timer if it hasn't been done already by a previous | ||
| // close(). | ||
| if (!inDelayedClose()) { | ||
| initializeDelayedCloseTimer(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we gracefully handle a caller doing
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added ASSERT() and comment. |
||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlushAndTimeout; | ||
| } | ||
| } else { | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } | ||
| } else { | ||
| ASSERT(type == ConnectionCloseType::FlushWrite || | ||
| type == ConnectionCloseType::FlushWriteAndDelay); | ||
|
|
||
| // No need to continue if a FlushWrite/FlushWriteAndDelay has already been issued and there is a | ||
| // pending delayed close. | ||
| // If there is a pending delayed close, simply update the delayed close state. | ||
| // | ||
| // An example of this condition manifests when a downstream connection is closed early by Envoy, | ||
| // such as when a route can't be matched: | ||
|
|
@@ -123,35 +133,29 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
| // ConnectionManagerImpl::checkForDeferredClose() | ||
| // 2) A second close is issued by a subsequent call to | ||
| // ConnectionManagerImpl::checkForDeferredClose() prior to returning from onData() | ||
| if (delayed_close_) { | ||
| if (inDelayedClose()) { | ||
| // Validate that a delayed close timer is already enabled unless it was disabled via | ||
| // configuration. | ||
| ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr); | ||
| if (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Out of curiosity, does this state change actually happen or is this preemptive? Just wondering if we can simplify until this becomes an actual issue? Can we just assert the type is the same?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was just preemptive. I've changed the logic to a stronger ASSERT() and removed unnecessary tests.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My latest commit restores this logic. The reason is that after discussing with @alyssawilk offline, I now prefer maintaining Another option, which would significantly simplify the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This would be my preference. Are there any existing users that actually do this? (There might be and it might a reasonable thing to try a graceful close followed by a force closed, I just don't remember).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Based on a quick scan, I haven't found any filters that attempt a graceful close followed by a forced close (it tends to be one or the other, typically the latter is only used on error handling code paths prior to any other close()s being issued). However, this seems like a reasonable thing to support, and more importantly, the existing logic in the So, the modified proposal would be to only allow state transitions
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alyssawilk @mattklein123 I would prefer to create a second PR for changing the This PR should be safe to merge given the amount of review and test coverage added. What do you think about doing this in two steps and I can follow up with the API change after this PR is merged? I think we will end up with a cleaner history and more granular options for rolling back if the API change causes breakage.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure that's fine with me. |
||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlush; | ||
| } else { | ||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlushAndTimeout; | ||
| } | ||
| return; | ||
| } | ||
|
|
||
| delayed_close_ = true; | ||
| const bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0; | ||
|
|
||
| // NOTE: the delayed close timeout (if set) affects both FlushWrite and FlushWriteAndDelay | ||
| // closes: | ||
| // 1. For FlushWrite, the timeout sets an upper bound on how long to wait for the flush to | ||
| // complete before the connection is locally closed. | ||
| // 2. For FlushWriteAndDelay, the timeout specifies an upper bound on how long to wait for the | ||
| // flush to complete and the peer to close the connection before it is locally closed. | ||
|
|
||
| // All close types that follow do not actually close() the socket immediately so that buffered | ||
| // data can be written. However, we do want to stop reading to apply TCP backpressure. | ||
| read_enabled_ = false; | ||
|
|
||
| // Force a closeSocket() after the write buffer is flushed if the close_type calls for it or if | ||
| // no delayed close timeout is set. | ||
| close_after_flush_ = !delayed_close_timeout_set || type == ConnectionCloseType::FlushWrite; | ||
|
|
||
| // Create and activate a timer which will immediately close the connection if triggered. | ||
| // A config value of 0 disables the timeout. | ||
| if (delayed_close_timeout_set) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. optional (style thing) having sanity checked that we couldn't call initializeDelayedCloseTimer() with inDelayedClose() true, I wonder if it's worth putting this in an else{} block just to make it super clear which branch we are on.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a comment. I would prefer not to unnecessarily indent unless you think it makes a large readability difference. |
||
| delayed_close_timer_ = dispatcher_.createTimer([this]() -> void { onDelayedCloseTimeout(); }); | ||
| ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this, | ||
| delayedCloseTimeout().count()); | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
| initializeDelayedCloseTimer(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can't we still get here with data_to_write > 0? I thought if we did close(DelayedCloseState::CloseAfterFlushAndTimeout) we didn't want to arm the timer until the flush was complete?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The timer is always armed when a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Recapping offline discussions for posterity, I missed the "I am redesigning this" email, Andres missed putting the new design in the description, and we are now untangled :-P I will say that with the new plan I'm not convinced that this solves #6392 insofar as 20ms pretty short. That said I think it solves an underlying problem worth solving and #6392 may simply need longer timeouts. Might be worth commenting somewhere in the APIs that to be useful this timeout needs to be at O(1 max_rtt + libevent_loop_time) to avoid races. I'll look for places we can add more clarity. |
||
| delayed_close_state_ = (type == ConnectionCloseType::FlushWrite) | ||
| ? DelayedCloseState::CloseAfterFlush | ||
| : DelayedCloseState::CloseAfterFlushAndTimeout; | ||
| } else { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was all excited that I'd found a bug where we would let connections idle out forever but apparently that's WAI? Can I ask the API go from // A value of 0 will completely disable delayed close processing, and the downstream connection's // A value of 0 will completely disable delayed close processing, and the downstream connection's to // A value of 0 will completely disable delayed close processing, and the downstream connection's // A value of 0 will completely disable delayed close processing, and the downstream connection's maybe with some // .. attention:: flags or DANGER DANGER DANGER if you'd like to leak connections please set this to 0 :-P
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. Added a |
||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlush; | ||
| } | ||
|
|
||
| file_event_->setEnabled(Event::FileReadyType::Write | | ||
|
|
@@ -162,7 +166,7 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
| Connection::State ConnectionImpl::state() const { | ||
| if (!ioHandle().isOpen()) { | ||
| return State::Closed; | ||
| } else if (delayed_close_) { | ||
| } else if (inDelayedClose()) { | ||
| return State::Closing; | ||
| } else { | ||
| return State::Open; | ||
|
|
@@ -529,6 +533,12 @@ void ConnectionImpl::onWriteReady() { | |
| } | ||
| } | ||
|
|
||
| // Disable the delayed close timer since data is still being flushed. The timer should only | ||
| // trigger after a delayedCloseTimeout() period of inactivity. | ||
| if (delayed_close_timer_ != nullptr) { | ||
| delayed_close_timer_->disableTimer(); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any reason to disable the timer here? Won't we boost it below? (Mainly just wondering if we can simplify slightly but there might be a good reason to keep this disable here.)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had missed that libevent's |
||
| } | ||
|
|
||
| IoResult result = transport_socket_->doWrite(*write_buffer_, write_end_stream_); | ||
| ASSERT(!result.end_stream_read_); // The interface guarantees that only read operations set this. | ||
| uint64_t new_buffer_size = write_buffer_->length(); | ||
|
|
@@ -539,16 +549,28 @@ void ConnectionImpl::onWriteReady() { | |
| // write callback. This can happen if we manage to complete the SSL handshake in the write | ||
| // callback, raise a connected event, and close the connection. | ||
| closeSocket(ConnectionEvent::RemoteClose); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to disable timer upon closeSocket() if timer has been set or in onDelayedCloseTimeout() check if io_handle is still open? Probably I missed something in the workflow as the old code didn't do so either. But it worth commenting out why we don't do that.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just saw that closeSocket() disables timer. Sorry for my ignorant... |
||
| } else if ((close_after_flush_ && new_buffer_size == 0) || bothSidesHalfClosed()) { | ||
| } else if ((inDelayedClose() && new_buffer_size == 0) || bothSidesHalfClosed()) { | ||
| ENVOY_CONN_LOG(debug, "write flush complete", *this); | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } else if (result.action_ == PostIoAction::KeepOpen && result.bytes_processed_ > 0) { | ||
| for (BytesSentCb& cb : bytes_sent_callbacks_) { | ||
| cb(result.bytes_processed_); | ||
|
|
||
| // If a callback closes the socket, stop iterating. | ||
| if (!ioHandle().isOpen()) { | ||
| return; | ||
| if (delayed_close_state_ == DelayedCloseState::CloseAfterFlushAndTimeout) { | ||
| ASSERT(delayed_close_timer_ != nullptr); | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think timer can still time out during flush write if there is long quiescence. Does resetting timer here take effect if the timer times out in this call stack? I'm not familiar of libevent, it is totally possible that event_del() already handles such case well.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I checked the libevent source and confirmed that |
||
| } else { | ||
| ASSERT(bothSidesHalfClosed() || delayed_close_state_ == DelayedCloseState::CloseAfterFlush); | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } | ||
| } else { | ||
| ASSERT(result.action_ == PostIoAction::KeepOpen); | ||
| if (delayed_close_timer_ != nullptr) { | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
| } | ||
| if (result.bytes_processed_ > 0) { | ||
| for (BytesSentCb& cb : bytes_sent_callbacks_) { | ||
| cb(result.bytes_processed_); | ||
|
|
||
| // If a callback closes the socket, stop iterating. | ||
| if (!ioHandle().isOpen()) { | ||
| return; | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
@@ -587,13 +609,22 @@ bool ConnectionImpl::bothSidesHalfClosed() { | |
| } | ||
|
|
||
| void ConnectionImpl::onDelayedCloseTimeout() { | ||
| delayed_close_timer_.reset(nullptr); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we just usually reset()
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
| ENVOY_CONN_LOG(debug, "triggered delayed close", *this); | ||
| if (connection_stats_ != nullptr && connection_stats_->delayed_close_timeouts_ != nullptr) { | ||
| connection_stats_->delayed_close_timeouts_->inc(); | ||
| } | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } | ||
|
|
||
| void ConnectionImpl::initializeDelayedCloseTimer() { | ||
| const auto timeout = delayedCloseTimeout().count(); | ||
| ASSERT(delayed_close_timer_ == nullptr && timeout > 0); | ||
| delayed_close_timer_ = dispatcher_.createTimer([this]() -> void { onDelayedCloseTimeout(); }); | ||
| ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this, timeout); | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
| } | ||
|
|
||
| absl::string_view ConnectionImpl::transportFailureReason() const { | ||
| return transport_socket_->failureReason(); | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -167,16 +167,22 @@ class ConnectionImpl : public virtual Connection, | |
| // Callback issued when a delayed close timeout triggers. | ||
| void onDelayedCloseTimeout(); | ||
|
|
||
| void initializeDelayedCloseTimer(); | ||
| bool inDelayedClose() const { return delayed_close_state_ != DelayedCloseState::None; } | ||
|
|
||
| static std::atomic<uint64_t> next_global_id_; | ||
|
|
||
| // States associated with delayed closing of the connection (i.e., when the underlying socket is | ||
| // not immediately close()d as a result of a ConnectionImpl::close()). | ||
| enum class DelayedCloseState { None, CloseAfterFlush, CloseAfterFlushAndTimeout }; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add comments for each state? It's not immediately clear what the difference is between the latter 2.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
| DelayedCloseState delayed_close_state_{DelayedCloseState::None}; | ||
|
|
||
| Event::Dispatcher& dispatcher_; | ||
| const uint64_t id_; | ||
| Event::TimerPtr delayed_close_timer_; | ||
| std::list<ConnectionCallbacks*> callbacks_; | ||
| std::list<BytesSentCb> bytes_sent_callbacks_; | ||
| bool read_enabled_{true}; | ||
| bool close_after_flush_{false}; | ||
| bool delayed_close_{false}; | ||
| bool above_high_watermark_{false}; | ||
| bool detect_early_close_{true}; | ||
| bool enable_half_close_{false}; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, much more clear, thanks.
One last super nitty nit, I'd reverse the order since the flush happens first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.