http: mitigate delayed close timeout race with connection write buffer flush #6437
Conversation
This commit changes the behavior of the delayed close timeout so that the timer is only enabled after a connection's write buffer is flushed. Previously, this timeout was serving double duty as an upper bound on the time that a connection would spend waiting for a write flush as well. However, this behavior created a race where a small delayed close timeout or slow clients could cause the timer to trigger and the connection to be closed prior to the write buffer being fully flushed. Therefore, this commit simplifies the logic and eliminates the idle timer behavior. Signed-off-by: Andres Guedez <aguedez@google.com>
EXPECT_EQ(test_server_->counter("tcp.tcp_stats.upstream_flush_total")->value(), 1);
EXPECT_EQ(test_server_->gauge("tcp.tcp_stats.upstream_flush_active")->value(), 0);
test_server_->waitForGaugeEq("tcp.tcp_stats.upstream_flush_active", 0);
This change fixes flakiness I experienced while testing this PR.
@alyssawilk PTAL since you also recently addressed flakiness in this test via #5919.
@AndresGuedez @alyssawilk high level question: Do we need two timers here? It seems like we still need an outer bound timeout within which to wait for the data to flush? I'm not sure this path is covered by any other timer at this point in the flow? (I might not be recalling this correctly.)
My thinking on this right now is that idle timeouts at a higher layer should cover this. Are there edge cases in the connection life cycle that wouldn't be covered by these higher layer idle timeouts?
I'm not sure, I don't remember. I can't remember when the HTTP idle timer gets re-armed. Let me review the code a bit over the next few days. Thanks a ton for jumping on this.
I just reviewed the code in conn_manager_impl, and I agree that the idle timer should still cover this case: we would re-arm the timer and then eventually time out and close the connection if we are not able to fully flush it while waiting for the other side to respond. @alyssawilk can you check me on this? Assuming we agree ^ is true this seems like a good thing to do, although I wonder if there are any other callers of this that might now hang without any guarding timeout? @alyssawilk are you interested in taking the initial pass on this review?
Yeah, I think in theory at the point we've set up a drain-close the socket should be read-disabled (we do not get to actively use it again) and either it flushes and we close after the timeout interval, or it does not flush and the connection_idle_timer_ goes off. Looks like the idle timer does FlushWrite so as long as we have UTs for FlushWriteAndDelay followed by FlushWrite we should be OK. Happy to take a look but it'll have to be Wednesday as I'm booked the rest of today and out tomorrow :-(
Thanks for taking a look! I revisited the (connection) idle timeout logic, and I see a couple of viable options:
I am actively working on option 2 since I think it leads to simpler logic.
@AndresGuedez (2) sounds great to me. |
Revert the previous approach of enabling the delayed close timer only after the flush is completed. Instead, the timer is enabled when a close(FlushWrite) or close(FlushWriteAndDelay) is issued and the timer is reset when the onWriteReady event fires and data is written to the socket. This prevents the race between flushing and the timer triggering while preserving an upper bound on inactivity after the close() is issued. Signed-off-by: Andres Guedez <aguedez@google.com>
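The revised approach in this commit (arm the timer when close() is issued, re-arm it whenever a write makes progress) can be sketched with a tiny model. This is an illustrative simulation only; the names below are hypothetical and are not Envoy's actual ConnectionImpl API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the revised behavior: the delayed close
// timer is armed when close(FlushWrite*) is issued and re-armed whenever a
// write makes progress, so it only fires after a full timeout interval of
// write inactivity.
struct DelayedCloseModel {
  int64_t timeout_ms;
  int64_t deadline_ms = -1; // -1 => timer disarmed

  // close(FlushWrite) / close(FlushWriteAndDelay): arm the timer immediately.
  void onClose(int64_t now_ms) { deadline_ms = now_ms + timeout_ms; }

  // onWriteReady wrote some data: progress resets the inactivity window.
  void onWriteProgress(int64_t now_ms) {
    if (deadline_ms != -1) {
      deadline_ms = now_ms + timeout_ms;
    }
  }

  // Would the delayed close timer have fired by now_ms?
  bool expired(int64_t now_ms) const {
    return deadline_ms != -1 && now_ms >= deadline_ms;
  }
};
```

Under this model a slow client that keeps making write progress never trips the timer mid-flush, which is the race being mitigated, while a fully inactive connection is still closed one timeout interval after the last write event.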
alyssawilk left a comment
I think this looks right at a high level. I definitely want to take another pass Monday with a pencil and paper and caffeine, but here's a couple of nits in the interim.
// from the downstream connection) prior to Envoy closing the socket associated with that
// connection. Note that this timeout may trigger a socket close even when Envoy's write
// buffer has not been fully flushed after close processing is initiated; this happens when Envoy
// is unable to write data to the socket for an interval greater than or equal to this timeout
I think this comment could use clarification, as it reads ambiguously between "cannot write any data to the socket during T" and "cannot write all data to the socket during T", and it makes it look like the grace period is absolute from the close call rather than a progress check.
docs/root/intro/version_history.rst
Outdated
* http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged.
* http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data.
* http: added encodeComplete/decodeComplete. These are invoked at the end of the stream, after all data has been encoded/decoded respectively. Default implementation is a no-op.
* http: fixed a bug with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger prior to flushing the write buffer for the downstream connection.
Maybe update here as well? It can still trigger, but won't trigger if progress is being made?
if (delayed_close_) {
if (inDelayedClose()) {
  ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
  delayed_close_state_ = (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set)
Can you break this out for readability?
Signed-off-by: Andres Guedez <aguedez@google.com>
7c02143 to dd6d9ea (force-push)
I had to force push to fix missing DCO in the last commit. @alyssawilk apologies if this causes some review history issues.
mattklein123 left a comment
Thanks this is great. At a high level LGTM. Some small questions/comments.
/wait
docs/root/intro/version_history.rst
Outdated
* http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged.
* http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data.
* http: added encodeComplete/decodeComplete. These are invoked at the end of the stream, after all data has been encoded/decoded respectively. Default implementation is a no-op.
* http: fixed a bug with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger while actively flushing a pending write buffer for a downstream connection.
Can you merge master again and move this to 1.11.0?
// States associated with delayed closing of the connection (i.e., when the underlying socket is
// not immediately close()d as a result of a ConnectionImpl::close()).
enum class DelayedCloseState { None, CloseAfterFlush, CloseAfterFlushAndTimeout };
Can you add comments for each state? It's not immediately clear what the difference is between the latter 2.
// Disable the delayed close timer since data is still being flushed. The timer should only
// trigger after a delayedCloseTimeout() period of inactivity.
if (delayed_close_timer_ != nullptr) {
  delayed_close_timer_->disableTimer();
Is there any reason to disable the timer here? Won't we boost it below? (Mainly just wondering if we can simplify slightly but there might be a good reason to keep this disable here.)
I had missed that libevent's event_add() would reset an already active timer with the newly provided timeout value. As you point out, this is not necessary and I have removed it.
// Validate that a delayed close timer is already enabled unless it was disabled via
// configuration.
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
if (type == ConnectionCloseType::FlushWrite || !delayed_close_timeout_set) {
Out of curiosity, does this state change actually happen or is this preemptive? Just wondering if we can simplify until this becomes an actual issue? Can we just assert the type is the same?
This was just preemptive. I've changed the logic to a stronger ASSERT() and removed unnecessary tests.
My latest commit restores this logic. The reason is that after discussing with @alyssawilk offline, I now prefer maintaining Connection::close() API backwards compatibility with the existing behavior which allows ConnectionCloseType transitions between close() calls. This minimizes the (admittedly small) risk that an existing user of the API passing FlushWrite and FlushWriteAndDelay would break after this PR is merged. It also enforces consistent handling of type transitions by allowing all ConnectionCloseType transitions as opposed to special casing FlushWrite and FlushWriteAndDelay.
Another option, which would significantly simplify the close() logic, is to enforce that callers use the same type after the initial close() is issued on a Connection but this would break backwards compatibility and would have a much higher risk of breaking existing users.
Another option, which would significantly simplify the close() logic, is to enforce that callers use the same type after the initial close() is issued on a Connection but this would break backwards compatibility and would have a much higher risk of breaking existing users.
This would be my preference. Are there any existing users that actually do this? (There might be, and it might be a reasonable thing to try a graceful close followed by a forced close, I just don't remember).
This would be my preference. Are there any existing users that actually do this? (There might be, and it might be a reasonable thing to try a graceful close followed by a forced close, I just don't remember).
Based on a quick scan, I haven't found any filters that attempt a graceful close followed by a forced close (it tends to be one or the other, typically the latter is only used on error handling code paths prior to any other close()s being issued). However, this seems like a reasonable thing to support, and more importantly, the existing logic in the ConnectionImpl destructor forces support for X -> NoFlush transitions since a "just in case" close(NoFlush) is always attempted on destruction. This could be changed, but it seems completely reasonable to me given that leaking connections is much worse than not fully flushing the write buffer or avoiding the race that the delayed_close_timeout is mitigating.
So, the modified proposal would be to only allow state transitions ConnectionCloseType::* -> ConnectionCloseType::NoFlush after the first close() has been issued. This still achieves the goal of simplifying the logic while providing some defense against bugs.
@alyssawilk @mattklein123 I would prefer to create a second PR for changing the Connection::close() API to enforce the state transition constraints.
This PR should be safe to merge given the amount of review and test coverage added. What do you think about doing this in two steps and I can follow up with the API change after this PR is merged? I think we will end up with a cleaner history and more granular options for rolling back if the API change causes breakage.
…sh-race Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
// is pending a flush of the write buffer. However, any progress made writing data to the socket
// will restart the timer associated with this timeout. This means that the total grace period for
// a socket in this state will be
// <delayed_close_timeout>+<total_time_waiting_for_write_buffer_flushes>.
Awesome, much more clear, thanks.
One last super nitty nit, I'd reverse the order since the flush happens first.
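A tiny arithmetic sketch (illustrative only, not Envoy code) of the grace period in the quoted comment: since each write that makes progress re-arms the timer, the socket closes one full timeout after the last write progress event, i.e. the total wait is the time spent flushing plus the delayed close timeout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Given the (monotonically increasing) times at which writes made progress
// after close() was issued at t=0, return when the delayed close timer fires:
// timeout_ms after the last progress event (or after close() if there is none).
int64_t expectedCloseTimeMs(const std::vector<int64_t>& write_progress_times_ms,
                            int64_t timeout_ms) {
  int64_t last_activity_ms = 0; // the timer is first armed when close() is issued
  for (int64_t t : write_progress_times_ms) {
    last_activity_ms = t; // each progress event re-arms the timer
  }
  return last_activity_ms + timeout_ms;
}
```

For example, with a 1000 ms timeout and write progress at 400 ms and 900 ms, the socket is not closed until 1900 ms: 900 ms of flushing plus one full timeout interval.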
// been requested, start the delayed close timer if it hasn't been done already by a previous
// close().
if (!inDelayedClose()) {
  initializeDelayedCloseTimer();
I don't think we gracefully handle a caller doing
setDelayedCloseTimeout(timeout1)
close(ConnectionCloseType::FlushWriteAndDelay)
setDelayedCloseTimeout(timeout2)
close(ConnectionCloseType::FlushWriteAndDelay)
and I don't think we need to. However, do you think it's worth commenting that somewhere in the APIs and/or an assert in setDelayedCloseTimeout that you can't set-after-close or am I overthinking? Arguably doing anything after close is pretty sketchy but we are handling multiple close() calls for good reasons...
Added ASSERT() and comment.
// Create and activate a timer which will immediately close the connection if triggered.
// A config value of 0 disables the timeout.
if (delayed_close_timeout_set) {
optional (style thing): having sanity checked that we couldn't call initializeDelayedCloseTimer() with inDelayedClose() true, I wonder if it's worth putting this in an else {} block just to make it super clear which branch we are on.
I added a comment. I would prefer not to unnecessarily indent unless you think it makes a large readability difference.
// Validate that a delayed close timer is already enabled unless it was disabled via
// configuration.
ASSERT(!delayed_close_timeout_set || delayed_close_timer_ != nullptr);
// Validate that the same close type is used when multiple close()s are issued. An edge case
Do we make it clear in the class definition what transitions are allowed?
Done. Added a comment to the DelayedCloseState enum declaration.
ENVOY_CONN_LOG(debug, "setting delayed close timer with timeout {} ms", *this,
               delayedCloseTimeout().count());
delayed_close_timer_->enableTimer(delayedCloseTimeout());
initializeDelayedCloseTimer();
can't we still get here with data_to_write > 0? I thought if we did close(DelayedCloseState::CloseAfterFlushAndTimeout) we didn't want to arm the timer until the flush was complete?
The timer is always armed when a close(FlushWrite) or close(FlushWriteAndDelay) is issued. The only difference between the two close types is that the socket is immediately closed after the flush with the former, while the delayed_close_timeout_ is allowed to expire and trigger with the latter.
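That distinction can be restated as a tiny decision sketch; the helper below is hypothetical and only mirrors the two flush-based modes described in this thread, not Envoy's actual API:

```cpp
#include <cassert>

// Illustrative close types mirroring the flush-based ConnectionCloseType modes.
enum class CloseType { FlushWrite, FlushWriteAndDelay };

// Decide whether the socket should be closed now, given whether the write
// buffer has fully flushed and whether the delayed close timer has fired.
bool shouldCloseSocket(CloseType type, bool flushed, bool timer_fired) {
  if (timer_fired) {
    return true; // the inactivity timeout always closes the socket
  }
  // Only FlushWrite closes immediately once the flush completes;
  // FlushWriteAndDelay waits for the timer to expire even after the flush.
  return flushed && type == CloseType::FlushWrite;
}
```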
Recapping offline discussions for posterity, I missed the "I am redesigning this" email, Andres missed putting the new design in the description, and we are now untangled :-P
I will say that with the new plan I'm not convinced that this solves #6392 insofar as 20ms is pretty short. That said I think it solves an underlying problem worth solving and #6392 may simply need longer timeouts. Might be worth commenting somewhere in the APIs that to be useful this timeout needs to be O(1 max_rtt + libevent_loop_time) to avoid races. I'll look for places we can add more clarity.
Signed-off-by: Andres Guedez <aguedez@google.com>
alyssawilk left a comment
Thanks as always for your patience - I think we're almost there! Just a few nits from me, and I'll wait to hear back on Dan's libevent question
}

void ConnectionImpl::onDelayedCloseTimeout() {
  delayed_close_timer_.reset(nullptr);
I think we usually just use reset()
delayed_close_state_ = (type == ConnectionCloseType::FlushWrite)
                           ? DelayedCloseState::CloseAfterFlush
                           : DelayedCloseState::CloseAfterFlushAndTimeout;
} else {
I was all excited that I'd found a bug where we would let connections idle out forever but apparently that's WAI?
Can I ask that the API go from
// A value of 0 will completely disable delayed close processing, and the downstream connection's
// socket will be closed immediately after the write flush is completed.
to
// A value of 0 will completely disable delayed close processing, and the downstream connection's
// socket will be closed immediately after the write flush is completed and will never close if the write flush does not complete
maybe with some // .. attention:: flags or DANGER DANGER DANGER if you'd like to leak connections please set this to 0 :-P
Done. Added a .. WARNING::.
Oh also, please merge master for coverage - I would like to make sure we have all the branches covered
// Test that the delayed close timer is reset while write flushes are happening when a connection is
// in delayed close mode.
TEST_P(ConnectionImplTest, DelayedCloseTimerResetWithPendingWriteBufferFlushes) {
This test and the one above are pretty much the same other than EXPECT_CALL(*transport_socket, doWrite...). Can you extract the common code into a helper function or combine these 2 tests into one, i.e. EXPECT_CALL(... doWrite()) twice with different returns?
Collapsed into a single test. Simplifications made throughout this PR made the second test largely redundant as you point out.
// It is possible (though unlikely) for the connection to have already been closed during the
// write callback. This can happen if we manage to complete the SSL handshake in the write
// callback, raise a connected event, and close the connection.
closeSocket(ConnectionEvent::RemoteClose);
I just saw that closeSocket() disables the timer. Sorry for my ignorance...
…sh-race Signed-off-by: Andres Guedez <aguedez@google.com>
closeSocket(ConnectionEvent::LocalClose);
if (type == ConnectionCloseType::FlushWriteAndDelay && delayed_close_timeout_set) {
  // The socket is being closed and there is no more data to write. Since a delayed close has
After one more pass, is this comment correct? Arguably we can also get here where there is data to write, but we're in the midst of a tls/alts handshake and have determined that flushing is pointless (canFlushClose is false) because the write could not flush all data.
I think the canFlushClose() was added for the case where you have plaintext payload queued up behind an unfinished crypto handshake at which point the payload isn't going to get flushed and given old style options you should give up and close immediately.
That said, if there were some crypto protocol where there were large bidi frames, I can imagine the same race we have for HTTP where the FIN + RST lagged behind a large client side "no I am rejecting your handshake" write which is queued in the kernel and so we want a one interval delay for the client to get that response.
I think if we're in that case, the connection is going to observe the transport socket is "blocked", and the alarm will fire after one interval (handling any race) so the code is doing the right thing and the comment can just be tweaked a bit for clarity. Sound right?
Good catch, thanks for pointing this out. I have clarified the comment.
That said, if there were some crypto protocol where there were large bidi frames, I can imagine the same race we have for HTTP where the FIN + RST lagged behind a large client side "no I am rejecting your handshake" write which is queued in the kernel and so we want a one interval delay for the client to get that response.
I think if we're in that case, the connection is going to observe the transport socket is "blocked", and the alarm will fire after one interval (handling any race) so the code is doing the right thing and the comment can just be tweaked a bit for clarity. Sound right?
Yeah, I agree with this. I revisited the logic for the transport sockets that return canFlushClose() == false and at least for the TLS (SSL) socket, it doesn't seem right that canFlushClose() is conditional on the handshake completing. TLS alerts are transmitted during handshake failures and should be allowed to flush as well. I'll file an issue to follow up on this.
Signed-off-by: Andres Guedez <aguedez@google.com>
This reverts commit 11008bd and allows callers of the Connection::close() API to change the 'type' arg between 'FlushWrite' and 'FlushWriteAndDelay' when issuing multiple close() calls on the same connection. The new logic matches the existing behavior which allows any 'type' transition between 'close()' calls and eliminates the risk that changing the behavior of the API will break existing external uses (via third party filters) of the Connection API. Signed-off-by: Andres Guedez <aguedez@google.com>
mattklein123 left a comment
Thanks, looking great. 2 more comments for discussion.
/wait-any
// The socket will be closed after a grace period of delayed_close_timeout_ has elapsed after
// the socket is flushed _or_ if a period of inactivity after the last write event greater than
// or equal to delayed_close_timeout_ has elapsed.
CloseAfterFlushAndTimeout
nit: WDYT about calling this CloseAfterFlushAndDelay? Both of these states involve timeouts so IMO this is a little confusing.
I considered CloseAfterFlushAndDelay but ultimately decided against it because both CloseAfterFlush and CloseAfterFlushAndTimeout introduce a delay into close() processing by waiting for either the flush or the <flush+timer trigger> to happen.
I could rename to CloseAfterFlushAndWait if you think that's clearer than CloseAfterFlushAndTimeout.
Yeah I do think Wait is more clear than Timeout in this case if you don't mind.
Signed-off-by: Andres Guedez <aguedez@google.com>
…sh-race Signed-off-by: Andres Guedez <aguedez@google.com>
mattklein123 left a comment
Awesome, thanks. @danzh2010 any further comments?
Signed-off-by: Andres Guedez <aguedez@google.com>
…sh-race Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
I merged master to resolve version history conflict. Thanks for the reviews everyone!
…r flush (envoyproxy#6437) Change the behavior of the delayed_close_timeout such that it won't trigger unless there has been at least a delayed_close_timeout period of inactivity after the last write event on the socket pending to be closed. This mitigates a race where a slow client and/or low timeout value would cause the socket to be closed while data was actively being written to the socket. Note that this change does not eliminate this race since a slow client could still be considered idle by the updated timeout logic, but this should be very rare when useful values (i.e., >1s to avoid the race condition on close that this timer addresses) are configured. Risk Level: Medium Testing: New unit tests added Docs Changes: Updated version history and HttpConnectionManager proto doc Fixes envoyproxy#6392 Signed-off-by: Andres Guedez <aguedez@google.com> Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Change the behavior of the delayed_close_timeout such that it won't trigger unless there has been at least a delayed_close_timeout period of inactivity after the last write event on the socket pending to be closed.
This mitigates a race where a slow client and/or low timeout value would cause the socket to be closed while data was actively being written to the socket. Note that this change does not eliminate this race since a slow client could still be considered idle by the updated timeout logic, but this should be very rare when useful values (i.e., >1s to avoid the race condition on close that this timer addresses) are configured.
Risk Level: Medium
Testing: New unit tests added
Docs Changes: Updated version history and HttpConnectionManager proto doc
Fixes #6392