
http: mitigate delayed close timeout race with connection write buffer flush #6437

Merged
mattklein123 merged 25 commits into envoyproxy:master from AndresGuedez:delayed-close-flush-race
Apr 12, 2019

Conversation

@AndresGuedez
Contributor

@AndresGuedez AndresGuedez commented Mar 29, 2019

Change the behavior of the delayed_close_timeout such that it won't trigger unless there has been at least a delayed_close_timeout period of inactivity after the last write event on the socket pending to be closed.

This mitigates a race where a slow client and/or low timeout value would cause the socket to be closed while data was actively being written to the socket. Note that this change does not eliminate this race since a slow client could still be considered idle by the updated timeout logic, but this should be very rare when useful values (i.e., >1s to avoid the race condition on close that this timer addresses) are configured.

Risk Level: Medium
Testing: New unit tests added
Docs Changes: Updated version history and HttpConnectionManager proto doc
Fixes #6392
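The updated semantics described above can be sketched with a small model (a hypothetical class, not Envoy's actual implementation): the timeout fires only after a full delayed_close_timeout period with no write events, because each write event on the pending socket re-arms the timer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the revised delayed_close_timeout semantics (not
// Envoy's actual classes): the timeout fires only after a full period of
// inactivity, because each write event on the pending socket re-arms it.
class DelayedCloseTimerModel {
public:
  explicit DelayedCloseTimerModel(uint64_t timeout_ms) : timeout_ms_(timeout_ms) {}

  // close(FlushWriteAndDelay) arms the timer at time now_ms.
  void armAt(uint64_t now_ms) {
    armed_ = true;
    deadline_ms_ = now_ms + timeout_ms_;
  }

  // A write event on the socket pushes the deadline out by a full timeout.
  void onWriteEvent(uint64_t now_ms) {
    if (armed_) {
      deadline_ms_ = now_ms + timeout_ms_;
    }
  }

  // Whether the timeout would have fired by time now_ms.
  bool firedBy(uint64_t now_ms) const { return armed_ && now_ms >= deadline_ms_; }

private:
  const uint64_t timeout_ms_;
  uint64_t deadline_ms_{0};
  bool armed_{false};
};
```

In this model a slow client that triggers a write event at least once per timeout period keeps the socket open until the flush completes; only a socket with no write activity for a full timeout period is closed, which is the residual race mentioned above.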

This commit changes the behavior of the delayed close timeout so that the timer is only enabled
after a connection's write buffer is flushed.

Previously, this timeout was serving double duty as an upper bound on the time that a connection
would spend waiting for a write flush as well. However, this behavior created a race where a small
delayed close timeout or slow clients could cause the timer to trigger and the connection to be
closed prior to the write buffer being fully flushed.

Therefore, this commit simplifies the logic and eliminates the idle timer behavior.

Signed-off-by: Andres Guedez <aguedez@google.com>

EXPECT_EQ(test_server_->counter("tcp.tcp_stats.upstream_flush_total")->value(), 1);
EXPECT_EQ(test_server_->gauge("tcp.tcp_stats.upstream_flush_active")->value(), 0);
test_server_->waitForGaugeEq("tcp.tcp_stats.upstream_flush_active", 0);
Contributor Author

This change fixes flakiness I experienced while testing this PR.

@alyssawilk PTAL since you also recently addressed flakiness in this test via #5919.

@mattklein123
Member

@AndresGuedez @alyssawilk high level question: Do we need two timers here? It seems like we still need an outer bound timeout in which to wait for the data to flush? I'm not sure this path is covered by any other timer at this point in the flow? (I might be not recalling this correctly.)

@AndresGuedez
Contributor Author

@AndresGuedez @alyssawilk high level question: Do we need two timers here? It seems like we still need an outer bound timeout in which to wait for the data to flush? I'm not sure this path is covered by any other timer at this point in the flow? (I might be not recalling this correctly.)

My thinking on this right now is that idle timeouts at a higher layer than the ConnectionImpl (such as the HttpConnectionManager idle_timeout) should be responsible for that upper bound for the data flush. This allows us to avoid the complexity of another timer/timeout in the already fairly complicated ConnectionImpl logic and also reduces ambiguity for users when configuring idle timeouts.

Are there edge cases in the connection life cycle that wouldn't be covered by these higher layer idle timeouts?

@mattklein123
Member

Are there edge cases in the connection life cycle that wouldn't be covered by these higher layer idle timeouts?

I'm not sure, I don't remember. I can't remember when the HTTP idle timer gets re-armed. Let me review the code a bit over the next few days. Thanks a ton for jumping on this.

@mattklein123
Member

Let me review the code a bit over the next few days.

I just reviewed the code in conn_manager_impl, and I agree that the idle timer should still cover this case and we would re-arm the timer and then eventually timeout and close the connection if we are not able to fully flush it and wait for the other side to respond. @alyssawilk can you check me on this?

Assuming we agree ^ is true this seems like a good thing to do, although I wonder if there are any other callers of this that might now hang without any guarding timeout? @alyssawilk are you interested in taking the initial pass on this review?

@alyssawilk
Contributor

Yeah, I think in theory at the point we've set up a drain-close the socket should be read-disabled (we do not get to actively use it again), and either it flushes and we close after the timeout interval, or it does not flush and the connection_idle_timer_ goes off. Looks like the idle timer does FlushWrite, so as long as we have UTs for FlushWriteAndDelay followed by FlushWrite we should be OK.

Happy to take a look but it'll have to be Wednesday as I'm booked the rest of today and out tomorrow :-(

@AndresGuedez
Contributor Author

AndresGuedez commented Apr 2, 2019

Yeah, I think in theory at the point we've set up a drain-close the socket should be read-disabled (we do not get to actively use it again), and either it flushes and we close after the timeout interval, or it does not flush and the connection_idle_timer_ goes off. Looks like the idle timer does FlushWrite, so as long as we have UTs for FlushWriteAndDelay followed by FlushWrite we should be OK.

Happy to take a look but it'll have to be Wednesday as I'm booked the rest of today and out tomorrow :-(

Thanks for taking a look!

I revisited the (connection) idle timeout logic in the ConnectionManagerImpl and I am now revising my approach to this fix. The issue is that the idle timeout triggers either a read_callbacks_->connection().close(ConnectionCloseType::FlushWrite) or a connection->close(ConnectionCloseType::FlushWriteAndDelay) at the end of the drain sequence, and with my current approach, both of those close types would lead to an unbounded wait for a flush.

So I see a couple of viable options:

  1. Continue with the current approach of this PR (the delayed close timer is only activated after flush) and another timer/timeout would be required in the ConnectionManagerImpl to ensure that the onIdleTimeout close sequence ends with a close(ConnectionCloseType::NoFlush) leading to an immediate close, or
  2. Go back to the existing implementation of the delayed close timer being enabled as soon as the FlushWrite or FlushWriteAndDelay close is issued, and simply reset (disable and re-enable) the timer when an onWriteReady event is triggered for the connection. This prevents the timer from triggering while the write buffer is being flushed, while preserving the idle timeout semantics within the ConnectionImpl layer. AFAICT timer enable/disable operations are O(log(N)) in libevent, which should be performant enough given that I expect only a small subset of connections will require more than a single onWriteReady event to flush the entirety of the write buffer prior to closing the socket.

I am actively working on option 2 since I think it leads to simpler logic in the ConnectionManagerImpl (no need for yet another timeout) while adding only a small amount of extra complexity in the ConnectionImpl.

@mattklein123
Member

@AndresGuedez (2) sounds great to me.

Revert the previous approach of enabling the delayed close timer only after the flush is completed.
Instead, the timer is enabled when a close(FlushWrite) or close(FlushWriteAndDelay) is issued and
the timer is reset when the onWriteReady event fires and data is written to the socket. This
prevents the race between flushing and the timer triggering while preserving an upper bound on
inactivity after the close() is issued.

Signed-off-by: Andres Guedez <aguedez@google.com>
Contributor

@alyssawilk alyssawilk left a comment

I think this looks right at a high level. I definitely want to take another pass Monday with a pencil and paper and caffeine, but here are a couple of nits in the interim.

// from the downstream connection) prior to Envoy closing the socket associated with that
// connection. Note that this timeout may trigger a socket close even when Envoy's write
// buffer has not been fully flushed after close processing is initiated; this happens when Envoy
// is unable to write data to the socket for an interval greater than or equal to this timeout
Contributor

I think this comment could use clarification: it's ambiguous whether it means "cannot write any data to the socket during T" or "cannot write all data to the socket during T", and it reads as if the grace period is absolute from the close call rather than a progress check.

* http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged.
* http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data.
* http: added encodeComplete/decodeComplete. These are invoked at the end of the stream, after all data has been encoded/decoded respectively. Default implementation is a no-op.
* http: fixed a bug with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger prior to flushing the write buffer for the downstream connection.
Contributor

Maybe update here as well? It can still trigger, but won't trigger if progress is being made?

if (delayed_close_) {
if (inDelayedClose()) {
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
delayed_close_state_ = (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set)
Contributor

Can you break this out for readability?

Signed-off-by: Andres Guedez <aguedez@google.com>
@AndresGuedez AndresGuedez force-pushed the delayed-close-flush-race branch from 7c02143 to dd6d9ea on April 5, 2019 14:32
@AndresGuedez
Contributor Author

I had to force push to fix missing DCO in the last commit. @alyssawilk apologies if this causes some review history issues.

Member

@mattklein123 mattklein123 left a comment

Thanks this is great. At a high level LGTM. Some small questions/comments.

/wait

* http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged.
* http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data.
* http: added encodeComplete/decodeComplete. These are invoked at the end of the stream, after all data has been encoded/decoded respectively. Default implementation is a no-op.
* http: fixed a bug with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger while actively flushing a pending write buffer for a downstream connection.
Member

Can you merge master again and move this to 1.11.0?

Contributor Author

Done.


// States associated with delayed closing of the connection (i.e., when the underlying socket is
// not immediately close()d as a result of a ConnectionImpl::close()).
enum class DelayedCloseState { None, CloseAfterFlush, CloseAfterFlushAndTimeout };
Member

Can you add comments for each state? It's not immediately clear what the difference is between the latter 2.

Contributor Author

Done.

// Disable the delayed close timer since data is still being flushed. The timer should only
// trigger after a delayedCloseTimeout() period of inactivity.
if (delayed_close_timer_ != nullptr) {
delayed_close_timer_->disableTimer();
Member

Is there any reason to disable the timer here? Won't we boost it below? (Mainly just wondering if we can simplify slightly but there might be a good reason to keep this disable here.)

Contributor Author

I had missed that libevent's event_add() would reset an already active timer with the newly provided timeout value. As you point out, this is not necessary and I have removed it.

// Validate that a delayed close timer is already enabled unless it was disabled via
// configuration.
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
if (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set) {
Member

Out of curiosity, does this state change actually happen or is this preemptive? Just wondering if we can simplify until this becomes an actual issue? Can we just assert the type is the same?

Contributor Author

This was just preemptive. I've changed the logic to a stronger ASSERT() and removed unnecessary tests.

Contributor Author

My latest commit restores this logic. The reason is that after discussing with @alyssawilk offline, I now prefer maintaining Connection::close() API backwards compatibility with the existing behavior which allows ConnectionCloseType transitions between close() calls. This minimizes the (admittedly small) risk that an existing user of the API passing FlushWrite and FlushWriteAndDelay would break after this PR is merged. It also enforces consistent handling of type transitions by allowing all ConnectionCloseType transitions as opposed to special casing FlushWrite and FlushWriteAndDelay.

Another option, which would significantly simplify the close() logic, is to enforce that callers use the same type after the initial close() is issued on a Connection but this would break backwards compatibility and would have a much higher risk of breaking existing users.

Member

Another option, which would significantly simplify the close() logic, is to enforce that callers use the same type after the initial close() is issued on a Connection but this would break backwards compatibility and would have a much higher risk of breaking existing users.

This would be my preference. Are there any existing users that actually do this? (There might be and it might a reasonable thing to try a graceful close followed by a force closed, I just don't remember).

Contributor Author

@AndresGuedez AndresGuedez Apr 11, 2019

This would be my preference. Are there any existing users that actually do this? (There might be and it might a reasonable thing to try a graceful close followed by a force closed, I just don't remember).

Based on a quick scan, I haven't found any filters that attempt a graceful close followed by a forced close (it tends to be one or the other, typically the latter is only used on error handling code paths prior to any other close()s being issued). However, this seems like a reasonable thing to support, and more importantly, the existing logic in the ConnectionImpl destructor forces support for X -> NoFlush transitions since a "just in case" close(NoFlush) is always attempted on destruction. This could be changed, but it seems completely reasonable to me given that leaking connections is much worse than not fully flushing the write buffer or avoiding the race that the delayed_close_timeout is mitigating.

So, the modified proposal would be to only allow state transitions ConnectionCloseType::* -> ConnectionCloseType::NoFlush after the first close() has been issued. This still achieves the goal of simplifying the logic while providing some defense against bugs.

Contributor Author

@AndresGuedez AndresGuedez Apr 11, 2019

@alyssawilk @mattklein123 I would prefer to create a second PR for changing the Connection::close() API to enforce the state transition constraints.

This PR should be safe to merge given the amount of review and test coverage added. What do you think about doing this in two steps and I can follow up with the API change after this PR is merged? I think we will end up with a cleaner history and more granular options for rolling back if the API change causes breakage.

Member

Sure that's fine with me.

…sh-race

Signed-off-by: Andres Guedez <aguedez@google.com>
// is pending a flush of the write buffer. However, any progress made writing data to the socket
// will restart the timer associated with this timeout. This means that the total grace period for
// a socket in this state will be
// <delayed_close_timeout>+<total_time_waiting_for_write_buffer_flushes>.
Contributor

Awesome, much more clear, thanks.

One last super nitty nit, I'd reverse the order since the flush happens first.
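The grace period bound quoted in the code comment above, <delayed_close_timeout> + <total_time_waiting_for_write_buffer_flushes>, can be checked with a quick arithmetic sketch (hypothetical numbers; not Envoy code): the timer is armed by close() and re-armed at each write event, so the final deadline is the time of the last write event plus one timeout.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical arithmetic for the grace period bound quoted above: the timer
// is armed at close() (t = 0) and re-armed at each subsequent write event, so
// the final deadline is <time of last write event> + <delayed_close_timeout>.
uint64_t finalDeadlineMs(uint64_t timeout_ms, std::initializer_list<uint64_t> write_events_ms) {
  uint64_t deadline_ms = timeout_ms; // armed by close() at t = 0
  for (uint64_t t : write_events_ms) {
    deadline_ms = t + timeout_ms;    // each write event re-arms the timer
  }
  return deadline_ms;
}
```

For example, with a 1000ms timeout and write events at 200ms, 500ms, and 900ms, the timer fires at 1900ms: 900ms of flush progress plus one full timeout of inactivity.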

Contributor Author

Done.

// been requested, start the delayed close timer if it hasn't been done already by a previous
// close().
if (!inDelayedClose()) {
initializeDelayedCloseTimer();
Contributor

I don't think we gracefully handle a caller doing
setDelayedCloseTimeout(timeout1)
close(ConnectionCloseType::FlushWriteAndDelay)
setDelayedCloseTimeout(timeout2)
close(ConnectionCloseType::FlushWriteAndDelay)
and I don't think we need to. However, do you think it's worth commenting on that somewhere in the APIs and/or adding an assert in setDelayedCloseTimeout that you can't set-after-close, or am I overthinking? Arguably doing anything after close is pretty sketchy, but we are handling multiple close() calls for good reasons...

Contributor Author

Added ASSERT() and comment.
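A minimal sketch of that guard (a hypothetical ConnectionModel, not Envoy's actual ConnectionImpl; modeled here as a rejected call, where a debug build would ASSERT() instead): once a delayed close is in flight, changing the timeout has no effect.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of the set-after-close guard discussed above: once a
// delayed close has been initiated, setDelayedCloseTimeout() is rejected
// (a debug build would ASSERT() instead of returning false).
class ConnectionModel {
public:
  bool setDelayedCloseTimeout(std::chrono::milliseconds timeout) {
    if (in_delayed_close_) {
      return false; // set-after-close is not supported
    }
    delayed_close_timeout_ = timeout;
    return true;
  }

  // Models close(FlushWriteAndDelay) entering the delayed close state.
  void close() { in_delayed_close_ = true; }

  std::chrono::milliseconds delayedCloseTimeout() const { return delayed_close_timeout_; }

private:
  std::chrono::milliseconds delayed_close_timeout_{1000};
  bool in_delayed_close_{false};
};
```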


// Create and activate a timer which will immediately close the connection if triggered.
// A config value of 0 disables the timeout.
if (delayed_close_timeout_set) {
Contributor

Optional (style thing): having sanity checked that we couldn't call initializeDelayedCloseTimer() with inDelayedClose() true, I wonder if it's worth putting this in an else {} block just to make it super clear which branch we are on.

Contributor Author

I added a comment. I would prefer not to unnecessarily indent unless you think it makes a large readability difference.

// Validate that a delayed close timer is already enabled unless it was disabled via
// configuration.
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
// Validate that the same close type is used when multiple close()s are issued. An edge case
Contributor

Do we make it clear in the class definition what transitions are allowed?

Contributor

ping?

Contributor Author

Done. Added a comment to the DelayedCloseState enum declaration.

ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this,
delayedCloseTimeout().count());
delayed_close_timer_->enableTimer(delayedCloseTimeout());
initializeDelayedCloseTimer();
Contributor

can't we still get here with data_to_write > 0? I thought if we did close(DelayedCloseState::CloseAfterFlushAndTimeout) we didn't want to arm the timer until the flush was complete?

Contributor Author

The timer is always armed when a close(FlushWrite) or close(FlushWriteAndDelay) is issued. The only difference between the two close types is that the socket is immediately closed after the flush with the former, while the delayed_close_timeout_ is allowed to expire and trigger with the latter.
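That distinction can be sketched as follows (a hypothetical helper mirroring this PR's DelayedCloseState enum, not Envoy's actual code): when the write buffer fully drains, FlushWrite closes the socket immediately, while FlushWriteAndDelay leaves the close to the delayed close timer.

```cpp
#include <cassert>

// States from this PR's DelayedCloseState enum (simplified model).
enum class DelayedCloseState { None, CloseAfterFlush, CloseAfterFlushAndTimeout };

// Hypothetical helper: decides whether the socket should be closed immediately
// once the write buffer has been fully flushed.
bool closeOnFlushComplete(DelayedCloseState state) {
  switch (state) {
  case DelayedCloseState::CloseAfterFlush:
    // close(FlushWrite): close as soon as the flush completes.
    return true;
  case DelayedCloseState::CloseAfterFlushAndTimeout:
    // close(FlushWriteAndDelay): keep the socket open; the delayed close
    // timer closes it when it expires.
    return false;
  default:
    return false;
  }
}
```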

Contributor

Recapping offline discussions for posterity, I missed the "I am redesigning this" email, Andres missed putting the new design in the description, and we are now untangled :-P

I will say that with the new plan I'm not convinced that this solves #6392 insofar as 20ms is pretty short. That said, I think it solves an underlying problem worth solving, and #6392 may simply need longer timeouts. Might be worth commenting somewhere in the APIs that, to be useful, this timeout needs to be on the order of (1 max_rtt + libevent_loop_time) to avoid races. I'll look for places we can add more clarity.

Signed-off-by: Andres Guedez <aguedez@google.com>
@AndresGuedez AndresGuedez changed the title http: fix delayed close timeout race with connection write buffer flush http: mitigate delayed close timeout race with connection write buffer flush Apr 9, 2019
Contributor

@alyssawilk alyssawilk left a comment

Thanks as always for your patience - I think we're almost there! Just a few nits from me, and I'll wait to hear back on Dan's libevent question.


}

void ConnectionImpl::onDelayedCloseTimeout() {
delayed_close_timer_.reset(nullptr);
Contributor

I think we just usually reset()

Contributor Author

Done.

delayed_close_state_ = (type == ConnectionCloseType::FlushWrite)
? DelayedCloseState::CloseAfterFlush
: DelayedCloseState::CloseAfterFlushAndTimeout;
} else {
Contributor

I was all excited that I'd found a bug where we would let connections idle out forever but apparently that's WAI?

Can I ask that the API comment go from

// A value of 0 will completely disable delayed close processing, and the downstream connection's
// socket will be closed immediately after the write flush is completed.

to

// A value of 0 will completely disable delayed close processing, and the downstream connection's
// socket will be closed immediately after the write flush is completed, and will never close if the write flush does not complete.

maybe with some // .. attention:: flags or DANGER DANGER DANGER if you'd like to leak connections please set this to 0 :-P

Contributor Author

Done. Added a .. WARNING::.

@alyssawilk
Contributor

Oh, also please merge master for coverage - I would like to make sure we have all the branches covered.


// Test that the delayed close timer is reset while write flushes are happening when a connection is
// in delayed close mode.
TEST_P(ConnectionImplTest, DelayedCloseTimerResetWithPendingWriteBufferFlushes) {
Contributor

This test and the one above are pretty much the same other than the EXPECT_CALL(*transport_socket, doWrite...). Can you extract the common code into a helper function or combine these 2 tests into one, i.e., EXPECT_CALL(... doWrite()) twice with different returns?

Contributor Author

Collapsed into a single test. Simplifications made throughout this PR made the second test largely redundant as you point out.

// It is possible (though unlikely) for the connection to have already been closed during the
// write callback. This can happen if we manage to complete the SSL handshake in the write
// callback, raise a connected event, and close the connection.
closeSocket(ConnectionEvent::RemoteClose);
Contributor

I just saw that closeSocket() disables the timer. Sorry for my ignorance...

…sh-race

Signed-off-by: Andres Guedez <aguedez@google.com>
- Decompose ASSERT() for readability.
- Collapse very similar tests into one.

Signed-off-by: Andres Guedez <aguedez@google.com>

closeSocket(ConnectionEvent::LocalClose);
if (type == ConnectionCloseType::FlushWriteAndDelay && delayed_close_timeout_set) {
// The socket is being closed and there is no more data to write. Since a delayed close has
Contributor

After one more pass, is this comment correct? Arguably we can also get here when there is data to write, but we're in the midst of a TLS/ALTS handshake and have determined that flushing is pointless (canFlushClose is false) because the write could not flush all data.

I think the canFlushClose() was added for the case where you have plaintext payload queued up behind an unfinished crypto handshake at which point the payload isn't going to get flushed and given old style options you should give up and close immediately.

That said, if there were some crypto protocol where there were large bidi frames, I can imagine the same race we have for HTTP where the FIN + RST lagged behind a large client side "no I am rejecting your handshake" write which is queued in the kernel and so we want a one interval delay for the client to get that response.

I think if we're in that case, the connection is going to observe the transport socket is "blocked", and the alarm will fire after one interval (handling any race) so the code is doing the right thing and the comment can just be tweaked a bit for clarity. Sound right?

Contributor Author

Good catch, thanks for pointing this out. I have clarified the comment.

That said, if there were some crypto protocol where there were large bidi frames, I can imagine the same race we have for HTTP where the FIN + RST lagged behind a large client side "no I am rejecting your handshake" write which is queued in the kernel and so we want a one interval delay for the client to get that response.

I think if we're in that case, the connection is going to observe the transport socket is "blocked", and the alarm will fire after one interval (handling any race) so the code is doing the right thing and the comment can just be tweaked a bit for clarity. Sound right?

Yeah, I agree with this. I revisited the logic for the transport sockets that return canFlushClose() == false and at least for the TLS (SSL) socket, it doesn't seem right that canFlushClose() is conditional on the handshake completing. TLS alerts are transmitted during handshake failures and should be allowed to flush as well. I'll file an issue to follow up on this.


Signed-off-by: Andres Guedez <aguedez@google.com>
This reverts commit 11008bd and allows callers of the
Connection::close() API to change the 'type' arg between 'FlushWrite' and 'FlushWriteAndDelay' when
issuing multiple close() calls on the same connection.

The new logic matches the existing behavior, which allows any 'type' transition between close()
calls, and eliminates the risk that changing the behavior of the API will break existing external
uses (via third-party filters) of the Connection API.

Signed-off-by: Andres Guedez <aguedez@google.com>
Member

@mattklein123 mattklein123 left a comment

Thanks, looking great. 2 more comments for discussion.

/wait-any

// The socket will be closed after a grace period of delayed_close_timeout_ has elapsed after
// the socket is flushed _or_ if a period of inactivity after the last write event greater than
// or equal to delayed_close_timeout_ has elapsed.
CloseAfterFlushAndTimeout
Member

nit: WDYT about calling this CloseAfterFlushAndDelay? Both of these states involve timeouts so IMO this is a little confusing.

Contributor Author

I considered CloseAfterFlushAndDelay but ultimately decided against it because both CloseAfterFlush and CloseAfterFlushAndTimeout introduce a delay into close() processing by waiting for either the flush or the <flush+timer trigger> to happen.

I could rename to CloseAfterFlushAndWait if you think that's clearer than CloseAfterFlushAndTimeout.

Member

Yeah I do think Wait is more clear than Timeout in this case if you don't mind.

Contributor Author

Done.

// Validate that a delayed close timer is already enabled unless it was disabled via
// configuration.
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
if (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set) {
Member

> Another option, which would significantly simplify the close() logic, is to enforce that callers use the same type after the initial close() is issued on a Connection but this would break backwards compatibility and would have a much higher risk of breaking existing users.

This would be my preference. Are there any existing users that actually do this? (There might be, and it might be a reasonable thing to try a graceful close followed by a force close, I just don't remember).

Signed-off-by: Andres Guedez <aguedez@google.com>
…sh-race

Signed-off-by: Andres Guedez <aguedez@google.com>
mattklein123
mattklein123 previously approved these changes Apr 11, 2019
Member

@mattklein123 mattklein123 left a comment

Awesome, thanks. @danzh2010 any further comments?

Signed-off-by: Andres Guedez <aguedez@google.com>
…sh-race

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
@AndresGuedez
Contributor Author

I merged master to resolve version history conflict. Thanks for the reviews everyone!

Member

@mattklein123 mattklein123 left a comment

Awesome work!

@mattklein123 mattklein123 merged commit cdaeb13 into envoyproxy:master Apr 12, 2019
duderino pushed a commit to istio/envoy that referenced this pull request Aug 13, 2019
…r flush (envoyproxy#6437)

Change the behavior of the delayed_close_timeout such that it won't trigger unless there
has been at least a delayed_close_timeout period of inactivity after the last write event on
the socket pending to be closed.

This mitigates a race where a slow client and/or low timeout value would cause the socket
to be closed while data was actively being written to the socket. Note that this change does
not eliminate this race since a slow client could still be considered idle by the updated timeout
logic, but this should be very rare when useful values (i.e., >1s to avoid the race condition on
close that this timer addresses) are configured.

Risk Level: Medium
Testing: New unit tests added
Docs Changes: Updated version history and HttpConnectionManager proto doc
Fixes envoyproxy#6392

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
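For context, the timeout at the center of this PR is a field on the HttpConnectionManager (the PR updated its proto doc). A minimal, illustrative config fragment; the surrounding field values are assumptions for the sketch, not part of this PR:

```yaml
# Fragment of an HttpConnectionManager config (illustrative values).
http_connection_manager:
  stat_prefix: ingress_http
  # Per the guidance above, use a value comfortably above 1s so the
  # delayed close timer does not race an in-progress write flush.
  delayed_close_timeout: 2s
```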
Development

Successfully merging this pull request may close these issues.

delayed_close_timeout - could it actually speed up connection closing, instead of delaying?

4 participants