-
Notifications
You must be signed in to change notification settings - Fork 5.3k
http: mitigate delayed close timeout race with connection write buffer flush #6437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
07b5ded
82dc62d
3f08a6d
eb58ad6
b06bb0b
dd6d9ea
412f485
f60db05
11008bd
aebc8cf
0424594
efad9c6
788e07a
2d75b29
031f520
06ec3e9
e9c0a73
adb0488
012bd0a
a68c785
77ec586
0e2248c
2d3d44b
577729e
7a6bceb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,6 +99,7 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
|
|
||
| uint64_t data_to_write = write_buffer_->length(); | ||
| ENVOY_CONN_LOG(debug, "closing data_to_write={} type={}", *this, data_to_write, enumToInt(type)); | ||
| const bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0; | ||
| if (data_to_write == 0 || type == ConnectionCloseType::NoFlush || | ||
| !transport_socket_->canFlushClose()) { | ||
| if (data_to_write > 0) { | ||
|
|
@@ -107,13 +108,25 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
| transport_socket_->doWrite(*write_buffer_, true); | ||
| } | ||
|
|
||
| closeSocket(ConnectionEvent::LocalClose); | ||
| if (type == ConnectionCloseType::FlushWriteAndDelay && delayed_close_timeout_set) { | ||
| // The socket is being closed and either there is no more data to write or the data can not be | ||
| // flushed (!transport_socket_->canFlushClose()). Since a delayed close has been requested, | ||
| // start the delayed close timer if it hasn't been done already by a previous close(). | ||
| // NOTE: Even though the delayed_close_state_ is being set to CloseAfterFlushAndWait, since | ||
| // a write event is not being registered for the socket, this logic is simply setting the | ||
| // timer and waiting for it to trigger to close the socket. | ||
| if (!inDelayedClose()) { | ||
| initializeDelayedCloseTimer(); | ||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlushAndWait; | ||
| } | ||
| } else { | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } | ||
| } else { | ||
| ASSERT(type == ConnectionCloseType::FlushWrite || | ||
| type == ConnectionCloseType::FlushWriteAndDelay); | ||
|
|
||
| // No need to continue if a FlushWrite/FlushWriteAndDelay has already been issued and there is a | ||
| // pending delayed close. | ||
| // If there is a pending delayed close, simply update the delayed close state. | ||
| // | ||
| // An example of this condition manifests when a downstream connection is closed early by Envoy, | ||
| // such as when a route can't be matched: | ||
|
|
@@ -123,35 +136,31 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
| // ConnectionManagerImpl::checkForDeferredClose() | ||
| // 2) A second close is issued by a subsequent call to | ||
| // ConnectionManagerImpl::checkForDeferredClose() prior to returning from onData() | ||
| if (delayed_close_) { | ||
| if (inDelayedClose()) { | ||
| // Validate that a delayed close timer is already enabled unless it was disabled via | ||
| // configuration. | ||
| ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr); | ||
| if (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Out of curiosity, does this state change actually happen or is this preemptive? Just wondering if we can simplify until this becomes an actual issue? Can we just assert the type is the same?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was just preemptive. I've changed the logic to a stronger ASSERT() and removed unnecessary tests.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My latest commit restores this logic. The reason is that after discussing with @alyssawilk offline, I now prefer maintaining Another option, which would significantly simplify the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This would be my preference. Are there any existing users that actually do this? (There might be and it might a reasonable thing to try a graceful close followed by a force closed, I just don't remember).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Based on a quick scan, I haven't found any filters that attempt a graceful close followed by a forced close (it tends to be one or the other, typically the latter is only used on error handling code paths prior to any other close()s being issued). However, this seems like a reasonable thing to support, and more importantly, the existing logic in the So, the modified proposal would be to only allow state transitions
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alyssawilk @mattklein123 I would prefer to create a second PR for changing the This PR should be safe to merge given the amount of review and test coverage added. What do you think about doing this in two steps and I can follow up with the API change after this PR is merged? I think we will end up with a cleaner history and more granular options for rolling back if the API change causes breakage.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure that's fine with me. |
||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlush; | ||
| } else { | ||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlushAndWait; | ||
| } | ||
| return; | ||
| } | ||
|
|
||
| delayed_close_ = true; | ||
| const bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0; | ||
|
|
||
| // NOTE: the delayed close timeout (if set) affects both FlushWrite and FlushWriteAndDelay | ||
| // closes: | ||
| // 1. For FlushWrite, the timeout sets an upper bound on how long to wait for the flush to | ||
| // complete before the connection is locally closed. | ||
| // 2. For FlushWriteAndDelay, the timeout specifies an upper bound on how long to wait for the | ||
| // flush to complete and the peer to close the connection before it is locally closed. | ||
|
|
||
| // All close types that follow do not actually close() the socket immediately so that buffered | ||
| // data can be written. However, we do want to stop reading to apply TCP backpressure. | ||
| read_enabled_ = false; | ||
|
|
||
| // Force a closeSocket() after the write buffer is flushed if the close_type calls for it or if | ||
| // no delayed close timeout is set. | ||
| close_after_flush_ = !delayed_close_timeout_set || type == ConnectionCloseType::FlushWrite; | ||
|
|
||
| // Create and activate a timer which will immediately close the connection if triggered. | ||
| // A config value of 0 disables the timeout. | ||
| // NOTE: At this point, it's already been validated that the connection is not already in | ||
| // delayed close processing and therefore the timer has not yet been created. | ||
| if (delayed_close_timeout_set) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. optional (style thing) having sanity checked that we couldn't call initializeDelayedCloseTimer() with inDelayedClose() true, I wonder if it's worth putting this in an else{} block just to make it super clear which branch we are on.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a comment. I would prefer not to unnecessarily indent unless you think it makes a large readability difference. |
||
| delayed_close_timer_ = dispatcher_.createTimer([this]() -> void { onDelayedCloseTimeout(); }); | ||
| ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this, | ||
| delayedCloseTimeout().count()); | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
| initializeDelayedCloseTimer(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can't we still get here with data_to_write > 0? I thought if we did close(DelayedCloseState::CloseAfterFlushAndTimeout) we didn't want to arm the timer until the flush was complete?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The timer is always armed when a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Recapping offline discussions for posterity, I missed the "I am redesigning this" email, Andres missed putting the new design in the description, and we are now untangled :-P I will say that with the new plan I'm not convinced that this solves #6392 insofar as 20ms pretty short. That said I think it solves an underlying problem worth solving and #6392 may simply need longer timeouts. Might be worth commenting somewhere in the APIs that to be useful this timeout needs to be at O(1 max_rtt + libevent_loop_time) to avoid races. I'll look for places we can add more clarity. |
||
| delayed_close_state_ = (type == ConnectionCloseType::FlushWrite) | ||
| ? DelayedCloseState::CloseAfterFlush | ||
| : DelayedCloseState::CloseAfterFlushAndWait; | ||
| } else { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was all excited that I'd found a bug where we would let connections idle out forever but apparently that's WAI? Can I ask the API go from // A value of 0 will completely disable delayed close processing, and the downstream connection's // A value of 0 will completely disable delayed close processing, and the downstream connection's to // A value of 0 will completely disable delayed close processing, and the downstream connection's // A value of 0 will completely disable delayed close processing, and the downstream connection's maybe with some // .. attention:: flags or DANGER DANGER DANGER if you'd like to leak connections please set this to 0 :-P
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. Added a |
||
| delayed_close_state_ = DelayedCloseState::CloseAfterFlush; | ||
| } | ||
|
|
||
| file_event_->setEnabled(Event::FileReadyType::Write | | ||
|
|
@@ -162,7 +171,7 @@ void ConnectionImpl::close(ConnectionCloseType type) { | |
| Connection::State ConnectionImpl::state() const { | ||
| if (!ioHandle().isOpen()) { | ||
| return State::Closed; | ||
| } else if (delayed_close_) { | ||
| } else if (inDelayedClose()) { | ||
| return State::Closing; | ||
| } else { | ||
| return State::Open; | ||
|
|
@@ -534,21 +543,37 @@ void ConnectionImpl::onWriteReady() { | |
| uint64_t new_buffer_size = write_buffer_->length(); | ||
| updateWriteBufferStats(result.bytes_processed_, new_buffer_size); | ||
|
|
||
| // NOTE: If the delayed_close_timer_ is set, it must only trigger after a delayed_close_timeout_ | ||
| // period of inactivity from the last write event. Therefore, the timer must be reset to its | ||
| // original timeout value unless the socket is going to be closed as a result of the doWrite(). | ||
|
|
||
| if (result.action_ == PostIoAction::Close) { | ||
| // It is possible (though unlikely) for the connection to have already been closed during the | ||
| // write callback. This can happen if we manage to complete the SSL handshake in the write | ||
| // callback, raise a connected event, and close the connection. | ||
| closeSocket(ConnectionEvent::RemoteClose); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to disable timer upon closeSocket() if timer has been set or in onDelayedCloseTimeout() check if io_handle is still open? Probably I missed something in the workflow as the old code didn't do so either. But it worth commenting out why we don't do that.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just saw that closeSocket() disables timer. Sorry for my ignorant... |
||
| } else if ((close_after_flush_ && new_buffer_size == 0) || bothSidesHalfClosed()) { | ||
| } else if ((inDelayedClose() && new_buffer_size == 0) || bothSidesHalfClosed()) { | ||
| ENVOY_CONN_LOG(debug, "write flush complete", *this); | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } else if (result.action_ == PostIoAction::KeepOpen && result.bytes_processed_ > 0) { | ||
| for (BytesSentCb& cb : bytes_sent_callbacks_) { | ||
| cb(result.bytes_processed_); | ||
|
|
||
| // If a callback closes the socket, stop iterating. | ||
| if (!ioHandle().isOpen()) { | ||
| return; | ||
| if (delayed_close_state_ == DelayedCloseState::CloseAfterFlushAndWait) { | ||
| ASSERT(delayed_close_timer_ != nullptr); | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think timer can still time out during flush write if there is long quiescence. Does resetting timer here take effect if the timer times out in this call stack? I'm not familiar of libevent, it is totally possible that event_del() already handles such case well.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I checked the libevent source and confirmed that |
||
| } else { | ||
| ASSERT(bothSidesHalfClosed() || delayed_close_state_ == DelayedCloseState::CloseAfterFlush); | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } | ||
| } else { | ||
| ASSERT(result.action_ == PostIoAction::KeepOpen); | ||
| if (delayed_close_timer_ != nullptr) { | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
| } | ||
| if (result.bytes_processed_ > 0) { | ||
| for (BytesSentCb& cb : bytes_sent_callbacks_) { | ||
| cb(result.bytes_processed_); | ||
|
|
||
| // If a callback closes the socket, stop iterating. | ||
| if (!ioHandle().isOpen()) { | ||
| return; | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
@@ -587,13 +612,22 @@ bool ConnectionImpl::bothSidesHalfClosed() { | |
| } | ||
|
|
||
| void ConnectionImpl::onDelayedCloseTimeout() { | ||
| delayed_close_timer_.reset(); | ||
| ENVOY_CONN_LOG(debug, "triggered delayed close", *this); | ||
| if (connection_stats_ != nullptr && connection_stats_->delayed_close_timeouts_ != nullptr) { | ||
| connection_stats_->delayed_close_timeouts_->inc(); | ||
| } | ||
| closeSocket(ConnectionEvent::LocalClose); | ||
| } | ||
|
|
||
| void ConnectionImpl::initializeDelayedCloseTimer() { | ||
| const auto timeout = delayedCloseTimeout().count(); | ||
| ASSERT(delayed_close_timer_ == nullptr && timeout > 0); | ||
| delayed_close_timer_ = dispatcher_.createTimer([this]() -> void { onDelayedCloseTimeout(); }); | ||
| ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this, timeout); | ||
| delayed_close_timer_->enableTimer(delayedCloseTimeout()); | ||
| } | ||
|
|
||
| absl::string_view ConnectionImpl::transportFailureReason() const { | ||
| return transport_socket_->failureReason(); | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we gracefully handle a caller doing
setDelayedCloseTimeout(timeout1)
close(ConnectionCloseType::FlushWriteAndDelay)
setDelayedCloseTimeout(timeout2)
close(ConnectionCloseType::FlushWriteAndDelay)
and I don't think we need to. However, do you think it's worth commenting that somewhere in the APIs and/or an assert in setDelayedCloseTimeout that you can't set-after-close or am I overthinking? Arguably doing anything after close is pretty sketchy but we are handling multiple close() calls for good reasons...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ASSERT() and comment.