Merged

25 commits
07b5ded
Enable delayed close timer after write flush. (Fixes #6392).
AndresGuedez Mar 29, 2019
82dc62d
Fix comment formatting.
AndresGuedez Mar 29, 2019
3f08a6d
Update documentation.
AndresGuedez Mar 29, 2019
eb58ad6
Test cleanup.
AndresGuedez Mar 29, 2019
b06bb0b
Enable delayed close timer on close().
AndresGuedez Apr 4, 2019
dd6d9ea
Clarify documentation and minor readability refactor.
AndresGuedez Apr 5, 2019
412f485
Merge remote-tracking branch 'upstream/master' into delayed-close-flu…
AndresGuedez Apr 8, 2019
f60db05
Cleanup.
AndresGuedez Apr 9, 2019
11008bd
Simplify logic to handle multiple close()s.
AndresGuedez Apr 9, 2019
aebc8cf
Fix spelling mistake in comment.
AndresGuedez Apr 9, 2019
0424594
Strengthen ASSERT()s and add comments for clarity.
AndresGuedez Apr 9, 2019
efad9c6
Minor cleanup.
AndresGuedez Apr 9, 2019
788e07a
Merge remote-tracking branch 'upstream/master' into delayed-close-flu…
AndresGuedez Apr 9, 2019
2d75b29
Cleanup.
AndresGuedez Apr 9, 2019
031f520
Cleanup.
AndresGuedez Apr 9, 2019
06ec3e9
Clarify comments.
AndresGuedez Apr 10, 2019
e9c0a73
Further comment clarification.
AndresGuedez Apr 10, 2019
adb0488
Clarify the accepted use of the 'type' argument for close().
AndresGuedez Apr 10, 2019
012bd0a
Add documentation about setting useful delayed close timeout values.
AndresGuedez Apr 10, 2019
a68c785
Revert "Simplify logic to handle multiple close()s."
AndresGuedez Apr 10, 2019
77ec586
s/CloseAfterFlushAndTimeout/CloseAfterFlushAndWait
AndresGuedez Apr 11, 2019
0e2248c
Merge remote-tracking branch 'upstream/master' into delayed-close-flu…
AndresGuedez Apr 11, 2019
2d3d44b
Update release notes.
AndresGuedez Apr 12, 2019
577729e
Merge remote-tracking branch 'upstream/master' into delayed-close-flu…
AndresGuedez Apr 12, 2019
7a6bceb
Fix version history after merge.
AndresGuedez Apr 12, 2019
@@ -200,8 +200,14 @@ message HttpConnectionManager {

// The delayed close timeout is for downstream connections managed by the HTTP connection manager.
// It is defined as a grace period after connection close processing has been locally initiated
// during which Envoy will flush the write buffers for the connection and await the peer to close
// (i.e., a TCP FIN/RST is received by Envoy from the downstream connection).
// during which Envoy will wait for the peer to close (i.e., a TCP FIN/RST is received by Envoy
// from the downstream connection) prior to Envoy closing the socket associated with that
// connection.
// NOTE: This timeout is enforced even when the socket associated with the downstream connection
// is pending a flush of the write buffer. However, any progress made writing data to the socket
// will restart the timer associated with this timeout. This means that the total grace period for
// a socket in this state will be
// <total_time_waiting_for_write_buffer_flushes>+<delayed_close_timeout>.
//
// Delaying Envoy's connection close and giving the peer the opportunity to initiate the close
// sequence mitigates a race condition that exists when downstream clients do not drain/process
@@ -213,8 +219,15 @@ message HttpConnectionManager {
//
// The default timeout is 1000 ms if this option is not specified.
//
// A value of 0 will completely disable delayed close processing, and the downstream connection's
// socket will be closed immediately after the write flush is completed.
// .. NOTE::
// To be useful in avoiding the race condition described above, this timeout must be set
// to *at least* <max round trip time expected between clients and Envoy>+<100ms to account for
// a reasonable "worst" case processing time for a full iteration of Envoy's event loop>.
//
// .. WARNING::
// A value of 0 will completely disable delayed close processing. When disabled, the downstream
// connection's socket will be closed immediately after the write flush is completed or will
// never close if the write flush does not complete.
google.protobuf.Duration delayed_close_timeout = 26 [(gogoproto.stdduration) = true];

// Configuration for :ref:`HTTP access logs <arch_overview_access_logs>`
1 change: 1 addition & 0 deletions docs/root/intro/version_history.rst
@@ -4,6 +4,7 @@ Version history
1.11.0 (Pending)
================
* dubbo_proxy: support the :ref:`Dubbo proxy filter <config_network_filters_dubbo_proxy>`.
* http: mitigated a race condition with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger while actively flushing a pending write buffer for a downstream connection.
* upstream: added :ref:`upstream_cx_pool_overflow <config_cluster_manager_cluster_stats>` for the connection pool circuit breaker.

1.10.0 (Apr 5, 2019)
1 change: 1 addition & 0 deletions include/envoy/network/connection.h
@@ -254,6 +254,7 @@ class Connection : public Event::DeferredDeletable, public FilterManager {

/**
* Set the timeout for delayed connection close()s.
* This can only be called prior to issuing a close() on the connection.
* @param timeout The timeout value in milliseconds
*/
virtual void setDelayedCloseTimeout(std::chrono::milliseconds timeout) PURE;
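The ordering constraint documented above (set the timeout only before close()) can be modeled with a toy class; this is illustrative only, not the Envoy API, and the class name is made up.

```cpp
#include <cassert>
#include <chrono>

// Toy model of the contract: setDelayedCloseTimeout() is only valid before
// close() has been issued on the connection.
class ToyConnection {
public:
  void setDelayedCloseTimeout(std::chrono::milliseconds timeout) {
    assert(!closed_ && "timeout must be set before close()");
    timeout_ = timeout;
  }
  void close() { closed_ = true; }
  std::chrono::milliseconds timeout() const { return timeout_; }

private:
  bool closed_{false};
  std::chrono::milliseconds timeout_{1000};  // mirrors the 1000 ms default
};
```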
102 changes: 68 additions & 34 deletions source/common/network/connection_impl.cc
@@ -99,6 +99,7 @@ void ConnectionImpl::close(ConnectionCloseType type) {

uint64_t data_to_write = write_buffer_->length();
ENVOY_CONN_LOG(debug, "closing data_to_write={} type={}", *this, data_to_write, enumToInt(type));
const bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0;
if (data_to_write == 0 || type == ConnectionCloseType::NoFlush ||
!transport_socket_->canFlushClose()) {
if (data_to_write > 0) {
@@ -107,13 +108,25 @@ void ConnectionImpl::close(ConnectionCloseType type) {
transport_socket_->doWrite(*write_buffer_, true);
}

closeSocket(ConnectionEvent::LocalClose);
if (type == ConnectionCloseType::FlushWriteAndDelay && delayed_close_timeout_set) {
// The socket is being closed and either there is no more data to write or the data can not be
// flushed (!transport_socket_->canFlushClose()). Since a delayed close has been requested,
// start the delayed close timer if it hasn't been done already by a previous close().
// NOTE: Even though the delayed_close_state_ is being set to CloseAfterFlushAndWait, since
// a write event is not being registered for the socket, this logic is simply setting the
// timer and waiting for it to trigger to close the socket.
if (!inDelayedClose()) {
initializeDelayedCloseTimer();
[Review — Contributor]
I don't think we gracefully handle a caller doing

  setDelayedCloseTimeout(timeout1)
  close(ConnectionCloseType::FlushWriteAndDelay)
  setDelayedCloseTimeout(timeout2)
  close(ConnectionCloseType::FlushWriteAndDelay)

and I don't think we need to. However, do you think it's worth commenting on that somewhere in the APIs, and/or an assert in setDelayedCloseTimeout that you can't set-after-close, or am I overthinking? Arguably doing anything after close is pretty sketchy, but we are handling multiple close() calls for good reasons...

[Reply — Author]
Added ASSERT() and comment.

delayed_close_state_ = DelayedCloseState::CloseAfterFlushAndWait;
}
} else {
closeSocket(ConnectionEvent::LocalClose);
}
} else {
ASSERT(type == ConnectionCloseType::FlushWrite ||
type == ConnectionCloseType::FlushWriteAndDelay);

// No need to continue if a FlushWrite/FlushWriteAndDelay has already been issued and there is a
// pending delayed close.
// If there is a pending delayed close, simply update the delayed close state.
//
// An example of this condition manifests when a downstream connection is closed early by Envoy,
// such as when a route can't be matched:
Expand All @@ -123,35 +136,31 @@ void ConnectionImpl::close(ConnectionCloseType type) {
// ConnectionManagerImpl::checkForDeferredClose()
// 2) A second close is issued by a subsequent call to
// ConnectionManagerImpl::checkForDeferredClose() prior to returning from onData()
if (delayed_close_) {
if (inDelayedClose()) {
// Validate that a delayed close timer is already enabled unless it was disabled via
// configuration.
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
if (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set) {
[Review — Member]
Out of curiosity, does this state change actually happen, or is this preemptive? Just wondering if we can simplify until this becomes an actual issue. Can we just assert the type is the same?

[Reply — Author]
This was just preemptive. I've changed the logic to a stronger ASSERT() and removed unnecessary tests.

[Reply — Author]
My latest commit restores this logic. The reason is that after discussing with @alyssawilk offline, I now prefer maintaining Connection::close() API backwards compatibility with the existing behavior, which allows ConnectionCloseType transitions between close() calls. This minimizes the (admittedly small) risk that an existing user of the API passing FlushWrite and FlushWriteAndDelay would break after this PR is merged. It also enforces consistent handling of type transitions by allowing all ConnectionCloseType transitions, as opposed to special-casing FlushWrite and FlushWriteAndDelay.

Another option, which would significantly simplify the close() logic, is to enforce that callers use the same type after the initial close() is issued on a Connection, but this would break backwards compatibility and would have a much higher risk of breaking existing users.

[Reply — Member]
This would be my preference. Are there any existing users that actually do this? (There might be, and it might be a reasonable thing to try a graceful close followed by a forced close; I just don't remember.)

[Reply — Author, Apr 11, 2019]
Based on a quick scan, I haven't found any filters that attempt a graceful close followed by a forced close (it tends to be one or the other; typically the latter is only used on error-handling code paths prior to any other close()s being issued). However, this seems like a reasonable thing to support, and more importantly, the existing logic in the ConnectionImpl destructor forces support for X -> NoFlush transitions, since a "just in case" close(NoFlush) is always attempted on destruction. This could be changed, but it seems completely reasonable to me given that leaking connections is much worse than not fully flushing the write buffer or avoiding the race that the delayed_close_timeout is mitigating.

So, the modified proposal would be to only allow state transitions ConnectionCloseType::* -> ConnectionCloseType::NoFlush after the first close() has been issued. This still achieves the goal of simplifying the logic while providing some defense against bugs.

[Reply — Author, Apr 11, 2019]
@alyssawilk @mattklein123 I would prefer to create a second PR for changing the Connection::close() API to enforce the state-transition constraints.

This PR should be safe to merge given the amount of review and test coverage added. What do you think about doing this in two steps, with a follow-up PR for the API change after this one is merged? I think we will end up with a cleaner history and more granular options for rolling back if the API change causes breakage.

[Reply — Member]
Sure, that's fine with me.

delayed_close_state_ = DelayedCloseState::CloseAfterFlush;
} else {
delayed_close_state_ = DelayedCloseState::CloseAfterFlushAndWait;
}
return;
}

delayed_close_ = true;
const bool delayed_close_timeout_set = delayedCloseTimeout().count() > 0;

// NOTE: the delayed close timeout (if set) affects both FlushWrite and FlushWriteAndDelay
// closes:
// 1. For FlushWrite, the timeout sets an upper bound on how long to wait for the flush to
// complete before the connection is locally closed.
// 2. For FlushWriteAndDelay, the timeout specifies an upper bound on how long to wait for the
// flush to complete and the peer to close the connection before it is locally closed.

// All close types that follow do not actually close() the socket immediately so that buffered
// data can be written. However, we do want to stop reading to apply TCP backpressure.
read_enabled_ = false;

// Force a closeSocket() after the write buffer is flushed if the close_type calls for it or if
// no delayed close timeout is set.
close_after_flush_ = !delayed_close_timeout_set || type == ConnectionCloseType::FlushWrite;

// Create and activate a timer which will immediately close the connection if triggered.
// A config value of 0 disables the timeout.
// NOTE: At this point, it's already been validated that the connection is not already in
// delayed close processing and therefore the timer has not yet been created.
if (delayed_close_timeout_set) {
[Review — Contributor]
Optional (style thing): having sanity-checked that we couldn't call initializeDelayedCloseTimer() with inDelayedClose() true, I wonder if it's worth putting this in an else {} block just to make it super clear which branch we are on.

[Reply — Author]
I added a comment. I would prefer not to unnecessarily indent unless you think it makes a large readability difference.

delayed_close_timer_ = dispatcher_.createTimer([this]() -> void { onDelayedCloseTimeout(); });
ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this,
delayedCloseTimeout().count());
delayed_close_timer_->enableTimer(delayedCloseTimeout());
initializeDelayedCloseTimer();
[Review — Contributor]
Can't we still get here with data_to_write > 0? I thought if we did close(DelayedCloseState::CloseAfterFlushAndTimeout) we didn't want to arm the timer until the flush was complete?

[Reply — Author]
The timer is always armed when a close(FlushWrite) or close(FlushWriteAndDelay) is issued. The only difference between the two close types is that the socket is immediately closed after the flush with the former, while the delayed_close_timeout_ is allowed to expire and trigger with the latter.

[Reply — Contributor]
Recapping offline discussions for posterity: I missed the "I am redesigning this" email, Andres missed putting the new design in the description, and we are now untangled :-P

I will say that with the new plan I'm not convinced this solves #6392, insofar as 20ms is pretty short. That said, I think it solves an underlying problem worth solving, and #6392 may simply need longer timeouts. It might be worth commenting somewhere in the APIs that, to be useful, this timeout needs to be on the order of (1 max RTT + libevent loop time) to avoid races. I'll look for places we can add more clarity.

delayed_close_state_ = (type == ConnectionCloseType::FlushWrite)
? DelayedCloseState::CloseAfterFlush
: DelayedCloseState::CloseAfterFlushAndWait;
} else {
[Review — Contributor]
I was all excited that I'd found a bug where we would let connections idle out forever, but apparently that's working as intended?

Can I ask that the API doc go from

  // A value of 0 will completely disable delayed close processing, and the downstream connection's
  // socket will be closed immediately after the write flush is completed.

to

  // A value of 0 will completely disable delayed close processing, and the downstream connection's
  // socket will be closed immediately after the write flush is completed and will never close if
  // the write flush does not complete.

maybe with some .. attention:: flags, or "DANGER DANGER DANGER: if you'd like to leak connections, please set this to 0" :-P

[Reply — Author]
Done. Added a .. WARNING::.

delayed_close_state_ = DelayedCloseState::CloseAfterFlush;
}

file_event_->setEnabled(Event::FileReadyType::Write |
@@ -162,7 +171,7 @@ void ConnectionImpl::close(ConnectionCloseType type) {
Connection::State ConnectionImpl::state() const {
if (!ioHandle().isOpen()) {
return State::Closed;
} else if (delayed_close_) {
} else if (inDelayedClose()) {
return State::Closing;
} else {
return State::Open;
@@ -534,21 +543,37 @@ void ConnectionImpl::onWriteReady() {
uint64_t new_buffer_size = write_buffer_->length();
updateWriteBufferStats(result.bytes_processed_, new_buffer_size);

// NOTE: If the delayed_close_timer_ is set, it must only trigger after a delayed_close_timeout_
// period of inactivity from the last write event. Therefore, the timer must be reset to its
// original timeout value unless the socket is going to be closed as a result of the doWrite().

if (result.action_ == PostIoAction::Close) {
// It is possible (though unlikely) for the connection to have already been closed during the
// write callback. This can happen if we manage to complete the SSL handshake in the write
// callback, raise a connected event, and close the connection.
closeSocket(ConnectionEvent::RemoteClose);
[Review — Contributor]
Do we need to disable the timer upon closeSocket() if the timer has been set, or check in onDelayedCloseTimeout() whether the io_handle is still open? Probably I missed something in the workflow, as the old code didn't do so either. But it's worth commenting on why we don't do that.

[Reply — Contributor]
I just saw that closeSocket() disables the timer. Sorry for my ignorance...
} else if ((close_after_flush_ && new_buffer_size == 0) || bothSidesHalfClosed()) {
} else if ((inDelayedClose() && new_buffer_size == 0) || bothSidesHalfClosed()) {
ENVOY_CONN_LOG(debug, "write flush complete", *this);
closeSocket(ConnectionEvent::LocalClose);
} else if (result.action_ == PostIoAction::KeepOpen && result.bytes_processed_ > 0) {
for (BytesSentCb& cb : bytes_sent_callbacks_) {
cb(result.bytes_processed_);

// If a callback closes the socket, stop iterating.
if (!ioHandle().isOpen()) {
return;
if (delayed_close_state_ == DelayedCloseState::CloseAfterFlushAndWait) {
ASSERT(delayed_close_timer_ != nullptr);
delayed_close_timer_->enableTimer(delayedCloseTimeout());
[Review — Contributor]
I think the timer can still time out during a flush write if there is a long quiescence. Does resetting the timer here take effect if the timer times out in this call stack? I'm not familiar with libevent; it is entirely possible that event_del() already handles such a case well.

[Reply — Author]
I checked the libevent source and confirmed that event_add() on an already-active timer will correctly handle the case of ignoring a pending callback if the timer has triggered, and subsequently re-enqueuing it with the specified timeout.

} else {
ASSERT(bothSidesHalfClosed() || delayed_close_state_ == DelayedCloseState::CloseAfterFlush);
closeSocket(ConnectionEvent::LocalClose);
}
} else {
ASSERT(result.action_ == PostIoAction::KeepOpen);
if (delayed_close_timer_ != nullptr) {
delayed_close_timer_->enableTimer(delayedCloseTimeout());
}
if (result.bytes_processed_ > 0) {
for (BytesSentCb& cb : bytes_sent_callbacks_) {
cb(result.bytes_processed_);

// If a callback closes the socket, stop iterating.
if (!ioHandle().isOpen()) {
return;
}
}
}
}
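The branching in the rewritten onWriteReady() above can be summarized as a small decision sketch. This is illustrative only (it ignores the bothSidesHalfClosed() and SSL-handshake edge cases, and the names are made up); wait_for_peer corresponds to DelayedCloseState::CloseAfterFlushAndWait.

```cpp
#include <cassert>
#include <string>

// Decision table for the write-ready handler described above.
std::string onWriteReadyOutcome(bool io_error, unsigned long new_buffer_size,
                                bool in_delayed_close, bool wait_for_peer) {
  if (io_error) {
    return "close-remote";  // PostIoAction::Close
  }
  if (in_delayed_close && new_buffer_size == 0) {
    // Flush complete: either reset the timer and await the peer's close
    // (FlushWriteAndDelay), or close the socket locally right away.
    return wait_for_peer ? "rearm-timer-and-wait" : "close-local";
  }
  // Still flushing (or not closing at all): keep the socket open; any write
  // progress restarts the delayed close timer if one is armed.
  return "keep-open-rearm-timer";
}
```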
@@ -587,13 +612,22 @@ bool ConnectionImpl::bothSidesHalfClosed() {
}

void ConnectionImpl::onDelayedCloseTimeout() {
delayed_close_timer_.reset();
ENVOY_CONN_LOG(debug, "triggered delayed close", *this);
if (connection_stats_ != nullptr && connection_stats_->delayed_close_timeouts_ != nullptr) {
connection_stats_->delayed_close_timeouts_->inc();
}
closeSocket(ConnectionEvent::LocalClose);
}

void ConnectionImpl::initializeDelayedCloseTimer() {
const auto timeout = delayedCloseTimeout().count();
ASSERT(delayed_close_timer_ == nullptr && timeout > 0);
delayed_close_timer_ = dispatcher_.createTimer([this]() -> void { onDelayedCloseTimeout(); });
ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this, timeout);
delayed_close_timer_->enableTimer(delayedCloseTimeout());
}

absl::string_view ConnectionImpl::transportFailureReason() const {
return transport_socket_->failureReason();
}
22 changes: 20 additions & 2 deletions source/common/network/connection_impl.h
@@ -122,6 +122,8 @@ class ConnectionImpl : public virtual Connection,
static uint64_t nextGlobalIdForTest() { return next_global_id_; }

void setDelayedCloseTimeout(std::chrono::milliseconds timeout) override {
// Validate that this is only called prior to issuing a close() or closeSocket().
ASSERT(delayed_close_timer_ == nullptr && ioHandle().isOpen());
delayed_close_timeout_ = timeout;
}
std::chrono::milliseconds delayedCloseTimeout() const override { return delayed_close_timeout_; }
@@ -167,16 +169,32 @@
// Callback issued when a delayed close timeout triggers.
void onDelayedCloseTimeout();

void initializeDelayedCloseTimer();
bool inDelayedClose() const { return delayed_close_state_ != DelayedCloseState::None; }

static std::atomic<uint64_t> next_global_id_;

// States associated with delayed closing of the connection (i.e., when the underlying socket is
// not immediately close()d as a result of a ConnectionImpl::close()).
enum class DelayedCloseState {
None,
// The socket will be closed immediately after the buffer is flushed _or_ if a period of
// inactivity after the last write event greater than or equal to delayed_close_timeout_ has
// elapsed.
CloseAfterFlush,
// The socket will be closed after a grace period of delayed_close_timeout_ has elapsed after
// the socket is flushed _or_ if a period of inactivity after the last write event greater than
// or equal to delayed_close_timeout_ has elapsed.
CloseAfterFlushAndWait
};
DelayedCloseState delayed_close_state_{DelayedCloseState::None};

Event::Dispatcher& dispatcher_;
const uint64_t id_;
Event::TimerPtr delayed_close_timer_;
std::list<ConnectionCallbacks*> callbacks_;
std::list<BytesSentCb> bytes_sent_callbacks_;
bool read_enabled_{true};
bool close_after_flush_{false};
bool delayed_close_{false};
bool above_high_watermark_{false};
bool detect_early_close_{true};
bool enable_half_close_{false};
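The DelayedCloseState values introduced in the header above can be exercised with a standalone sketch of the transition rule applied in ConnectionImpl::close(). This is a simplification under stated assumptions: the real logic also depends on the write-buffer contents and transport_socket_->canFlushClose(), and the NoFlush case never enters delayed close at all.

```cpp
#include <cassert>

enum class ConnectionCloseType { NoFlush, FlushWrite, FlushWriteAndDelay };
enum class DelayedCloseState { None, CloseAfterFlush, CloseAfterFlushAndWait };

// Which delayed-close state a close() call lands in. timeout_set mirrors
// delayedCloseTimeout().count() > 0.
DelayedCloseState stateFor(ConnectionCloseType type, bool timeout_set) {
  if (type == ConnectionCloseType::NoFlush) {
    return DelayedCloseState::None;  // socket is closed immediately
  }
  // FlushWrite closes as soon as the buffer drains; disabling the timeout
  // (a value of 0) forces the same behavior for FlushWriteAndDelay.
  if (type == ConnectionCloseType::FlushWrite || !timeout_set) {
    return DelayedCloseState::CloseAfterFlush;
  }
  return DelayedCloseState::CloseAfterFlushAndWait;
}
```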