grid: Add a new class for tracking HTTP/3 status by RyanTheOptimist · Pull Request #16067 · envoyproxy/envoy

RyanTheOptimist · 2021-04-19T18:18:06Z

grid: Add a new class for tracking HTTP/3 status.

Create a new Http3StatusTracker class which can mark HTTP/3 as broken
for a period of time, subject to exponential backoff. Use this in ConnectivityGrid.

Risk Level: Low
Testing: New unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
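For readers skimming the review, here is a minimal, self-contained sketch of the "mark HTTP/3 as broken for a period of time, subject to exponential backoff" idea. It is not the code in this PR. The tracker in this PR is constructed with a dispatcher and arms a timer, whereas this hypothetical `Http3StatusTrackerSketch` takes the current time as an argument and uses `onTick()` in place of the timer callback. The 5-minute base period, the cap on the backoff exponent, and the choice to ignore repeat reports while already broken are all assumptions.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical model of an HTTP/3 status tracker with exponential backoff.
// Time is passed in explicitly instead of using a dispatcher timer.
class Http3StatusTrackerSketch {
public:
  using Millis = std::chrono::milliseconds;
  enum class State { Pending, Broken, Confirmed };

  // Assumed defaults: a 5 minute base broken period, doubled on each
  // consecutive failure up to base * 2^max_exponent.
  explicit Http3StatusTrackerSketch(Millis base = std::chrono::minutes(5), int max_exponent = 4)
      : base_(base), max_exponent_(max_exponent) {}

  void markHttp3Broken(Millis now) {
    // Repeat reports while already broken (e.g. from concurrent attempts)
    // neither re-arm the expiry nor grow the backoff.
    if (state_ == State::Broken) {
      return;
    }
    state_ = State::Broken;
    broken_until_ = now + base_ * (std::int64_t{1} << exponent_);
    exponent_ = std::min(exponent_ + 1, max_exponent_);
  }

  // A successful HTTP/3 connection resets the backoff.
  void markHttp3Confirmed() {
    state_ = State::Confirmed;
    exponent_ = 0;
    broken_until_.reset();
  }

  // Stands in for the timer callback: once the broken period has elapsed,
  // HTTP/3 may be attempted again.
  void onTick(Millis now) {
    if (state_ == State::Broken && now >= *broken_until_) {
      state_ = State::Pending;
      broken_until_.reset();
    }
  }

  bool isHttp3Broken() const { return state_ == State::Broken; }
  bool isHttp3Confirmed() const { return state_ == State::Confirmed; }

private:
  Millis base_;
  int max_exponent_;
  int exponent_ = 0;
  State state_ = State::Pending;
  std::optional<Millis> broken_until_; // set only while state_ == Broken
};
```

Ignoring repeat reports while already broken is one way to tolerate the out-of-order calls discussed in the review; the sketch makes no claim about how the real class handles them.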

Create a new BrokenHttp3Tracker class which can mark HTTP/3 as broken for a period of time, subject to exponential backoff. Signed-off-by: Ryan Hamilton <rch@google.com>

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist · 2021-04-19T18:18:18Z

/assign @RenjieTang

RenjieTang

Nice! Some minor comments.

source/common/http/broken_http3_tracker.cc

source/common/http/broken_http3_tracker.h

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist

Thanks Renjie!

source/common/http/broken_http3_tracker.cc

source/common/http/broken_http3_tracker.h

RenjieTang

LGTM modulo one nit.

source/common/http/http3_status_tracker.cc

antoniovicente

Ryan: Thanks for introducing retry behavior. CI seems to be showing some problems with the test. Also, it seems that some of the methods of the newly introduced object are not used by the connection pool grid.

source/common/http/http3_status_tracker.cc

antoniovicente · 2021-04-20T00:02:17Z

source/common/http/http3_status_tracker.cc

+bool Http3StatusTracker::isHttp3Confirmed() const { return state_ == State::Confirmed; }
+
+void Http3StatusTracker::markHttp3Broken() {
+  state_ = State::Broken;


What are valid values for state_ when entering this method?

Good question! Mentally, I'm modeling this off of the similar code in Chrome. That code runs up at the request/response layer (the HttpNetworkTransaction) which is above the connection establishment layer. As such and given that requests happen in parallel, it's possible for basically any sequence of markBroken/markConfirmed calls to arrive in any order. I suspect that we'll eventually want something similar. But since we're not doing anything like that now, there's no need to permit such state transitions. So I've added ASSERT() calls to make it clear what the valid states are. Thanks for pointing this out.

(In any case, this should be reachable from any state other than broken)

It may be possible to trigger this ASSERT, if there are 2 concurrent attempts to connect to the same endpoint. That said, it's possible that there are protections elsewhere to prevent this from happening or it is relatively unlikely to happen without a burst of requests for that service; we would need the number of requests to exceed the upstream's multiplexing factor and trigger creation of a second connection in order to meet demand.

Ah! Excellent point! That's very true. Ok, in that case we're back to the Chrome situation where the parallelism means that we can really get any sequence of events in any order. I've removed the ASSERT() calls.
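Once the ASSERTs are off the table, one way to reason about "any sequence of events in any order" is to make the transition function total, so that every (state, event) pair has a defined result. A minimal sketch under that assumption: the `Event` enum, `next()` and `replay()` are hypothetical names, and `Expired` stands for the broken period ending.

```cpp
#include <initializer_list>

enum class State { Pending, Broken, Confirmed };
enum class Event { Broken, Confirmed, Expired };

// Total transition function: no (state, event) pair is invalid, so reports
// from concurrent connection attempts may arrive in any order.
constexpr State next(State s, Event e) {
  switch (e) {
  case Event::Broken:
    return State::Broken; // reachable from every state, including Broken
  case Event::Confirmed:
    return State::Confirmed;
  case Event::Expired:
    // Only a broken status expires; other states are left alone.
    return s == State::Broken ? State::Pending : s;
  }
  return s;
}

// Applies a sequence of events, e.g. one interleaving of two attempts' reports.
constexpr State replay(State s, std::initializer_list<Event> events) {
  for (Event e : events) {
    s = next(s, e);
  }
  return s;
}
```

The trade-off versus ASSERTs is that a genuinely unexpected ordering is no longer caught in debug builds; it is simply absorbed.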

Just for my own curiosity why would we have multiple concurrent connection attempts to the same endpoint on a given worker thread? Is this related to prefetching or part of the grid logic?

I'm relatively unclear on exactly how this all plays together. From what Antonio said, it sounds like if there were a sufficient number of simultaneous requests we might trigger the creation of a second attempt. The other case that I think I heard from Alyssa is that it's possible to have multiple calls to ConnectivityGrid::newStream() happen before the first call finishes. This won't result in multiple TCP/QUIC connection attempts because the underlying connection pool will do the right thing. But I think this is transparent to the ConnectivityGrid; each call to newStream creates a new WrapperCallbacks and (up to) 2 ConnectionAttempts, which should mean it's possible to get unexpected state transitions. Happy to do something different if I'm not understanding.

Yeah, this sounds plausible to me: newStream is non-blocking (like most things in Envoy), so if multiple streams are requested before the connection is established, you'd see multiple newStream calls come in before the connection is up.
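To make the non-blocking newStream point concrete, here is a toy, single-threaded model. A hypothetical `EventLoop` stands in for a worker's dispatcher, and a hypothetical `GridSketch::newStream()` only queues its HTTP/3 attempt and returns. Several calls therefore land before any attempt finishes, and one broken endpoint produces one report per call. None of these types are Envoy APIs.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a worker thread's dispatcher: callbacks run later,
// in FIFO order, when run() is called.
class EventLoop {
public:
  void post(std::function<void()> cb) { pending_.push(std::move(cb)); }
  void run() {
    while (!pending_.empty()) {
      std::function<void()> cb = std::move(pending_.front());
      pending_.pop();
      cb();
    }
  }

private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical grid: each newStream() call queues its own HTTP/3 attempt and
// returns immediately, without waiting for a connection.
class GridSketch {
public:
  explicit GridSketch(EventLoop& loop) : loop_(loop) {}

  void newStream(bool http3_works) {
    ++attempts_started;
    loop_.post([this, http3_works] {
      // Each attempt reports its own outcome to the shared status.
      if (http3_works) {
        ++confirmed_reports;
      } else {
        ++broken_reports;
      }
    });
  }

  int attempts_started = 0;
  int broken_reports = 0;
  int confirmed_reports = 0;

private:
  EventLoop& loop_;
};
```

In the terms of this thread, three broken reports for one endpoint is exactly the situation in which an ASSERT on the prior state would have fired.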

No change necessary, I was just curious how this all fit together :)

Optional (I mean optional in this PR, not optional overall): I wonder if we should start landing docs on this as we start implementing the "real" logic.

Generally we land docs when we unhide config (#15926), but the failover logic is sufficiently complicated that I think we could land docs in source/docs for now and then move them to docs/ when the PR lands. Your call if we do them now or in a future iteration :-)

Docs definitely make sense and I'm happy to work on them. I think I'll do that in a follow-up since this PR is (hopefully) basically done at this point.

source/common/http/http3_status_tracker.cc

test/common/http/http3_status_tracker_test.cc

source/common/http/http3_status_tracker.h

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist

Thanks for the thoughtful review!

RyanTheOptimist · 2021-04-20T00:47:56Z

source/common/http/http3_status_tracker.cc

source/common/http/http3_status_tracker.cc

source/common/http/http3_status_tracker.h

test/common/http/http3_status_tracker_test.cc

Signed-off-by: Ryan Hamilton <rch@google.com>

antoniovicente

Looks good

source/common/http/conn_pool_grid.h

Signed-off-by: Ryan Hamilton <rch@google.com>

snowp

Thanks this looks pretty good, just a few minor comments

source/common/http/http3_status_tracker.h

snowp · 2021-04-22T15:31:46Z

source/common/http/http3_status_tracker.cc

source/common/http/http3_status_tracker.cc

snowp · 2021-04-22T15:34:56Z

test/common/http/http3_status_tracker_test.cc

+class Http3StatusTrackerTest : public testing::Test {
+public:
+  Http3StatusTrackerTest()
+      : timer_(new StrictMock<MockTimer>(&dispatcher_)), tracker_(dispatcher_) {}


I think we usually omit the StrictMock part since by default all the mocks are strict (we're not super consistent here so not a big deal)

Wow, interesting! What's the magic that makes all the Mocks strict? Some #define something somewhere? Very cool. In any case, done.

This bit: https://github.com/envoyproxy/envoy/blob/main/bazel/envoy_test.bzl#L170-L172

Oooo! Thanks!

test/common/http/http3_status_tracker_test.cc

snowp · 2021-04-22T15:37:38Z

test/common/http/http3_status_tracker_test.cc

+}
+
+TEST_F(Http3StatusTrackerTest, MarkBrokenWithBackoff) {
+  // markBroken will only be called when the time is not enabled.


From reading the test it seems like we're the ones calling markHttp3Broken, what does this comment refer to?

Ah. I can remove this comment if it doesn't make sense. I wrote the comment because the MockTimer API confused me. I expected that enabled() would return true if enableTimer() had been called but disableTimer() or invokeCallback() had not been. In other words, I didn't expect to need to mock out this method. So the comment was saying that when markBroken() is called in this test, the timer will not have been enabled. Does that make sense? Would you recommend I rephrase or remove the comment?

Oh I see. I would try to include something that indicates that we're talking about invariants for how this is used by prod code. My initial read was that this was trying to explain what we should expect to see in this test, so I expected to see some EXPECT_CALL(.., markBroken()) kind of expectations.

Makes sense. That being said, I took several stabs at trying to write something up and each time ended up with a bit of an essay that didn't really seem to add any readability to the test, so I've just nuked the comment. (Really the problem was my lack of understanding of how MockTimer worked and so this is probably not the best place to address that :>)
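The confusion above came from having to set expectations on the mock timer's enabled state rather than having it follow `enableTimer()`/`disableTimer()` automatically. For contrast, a hand-written fake does track that state on its own. A minimal sketch, assuming a simplified interface: `FakeTimer` is hypothetical and is not Envoy's `MockTimer`.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical hand-written fake timer: enabled() follows enableTimer(),
// disableTimer() and invokeCallback() with no expectations to set up.
class FakeTimer {
public:
  explicit FakeTimer(std::function<void()> cb) : cb_(std::move(cb)) {}

  void enableTimer(std::chrono::milliseconds timeout) {
    enabled_ = true;
    last_timeout_ = timeout;
  }
  void disableTimer() { enabled_ = false; }
  bool enabled() const { return enabled_; }
  std::chrono::milliseconds lastTimeout() const { return last_timeout_; }

  // Simulates the timer firing: a fired timer is no longer enabled, then the
  // callback runs (and may re-arm the timer).
  void invokeCallback() {
    enabled_ = false;
    cb_();
  }

private:
  std::function<void()> cb_;
  bool enabled_ = false;
  std::chrono::milliseconds last_timeout_{0};
};
```

The trade-off is the one Envoy's strict-by-default mocks guard against: a fake silently accepts any call, while a strict mock makes every interaction with the timer an explicit part of the test.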

test/common/http/http3_status_tracker_test.cc

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist

Thanks!

source/common/http/http3_status_tracker.cc

RyanTheOptimist · 2021-04-22T18:09:17Z

source/common/http/http3_status_tracker.cc

source/common/http/http3_status_tracker.h

test/common/http/http3_status_tracker_test.cc

RyanTheOptimist · 2021-04-22T18:13:28Z

test/common/http/http3_status_tracker_test.cc

RyanTheOptimist · 2021-04-22T18:26:39Z

test/common/http/http3_status_tracker_test.cc

RyanTheOptimist · 2021-04-23T13:53:42Z

/retest

repokitteh-read-only · 2021-04-23T13:53:46Z

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16067 (comment) was created by @RyanTheOptimist.

Signed-off-by: Ryan Hamilton <rch@google.com>

snowp

LGTM, thanks!

grid: Add a new class for tracking HTTP/3 status.

Create a new Http3StatusTracker class which can mark HTTP/3 as broken for a period of time, subject to exponential backoff. Use this in ConnectivityGrid.

Risk Level: Low
Testing: New unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

Signed-off-by: Ryan Hamilton <rch@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>

RyanTheOptimist added 3 commits April 19, 2021 17:36

grid: Add a new class for tracking HTTP/3 brokeness.

54b5881

Create a new BrokenHttp3Tracker class which can mark HTTP/3 as broken for a period of time, subject to exponential backoff. Signed-off-by: Ryan Hamilton <rch@google.com>

Oh, and the changes to ConnectivityGrid

43935f2

Signed-off-by: Ryan Hamilton <rch@google.com>

format

4f57d0d

Signed-off-by: Ryan Hamilton <rch@google.com>

repokitteh-read-only bot assigned RenjieTang Apr 19, 2021

RenjieTang reviewed Apr 19, 2021

source/common/http/broken_http3_tracker.cc (outdated)

source/common/http/broken_http3_tracker.h (outdated)

antoniovicente self-assigned this Apr 19, 2021

Fix comments from Renjie including renaming the class/files.

2028dc7

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist commented Apr 19, 2021

source/common/http/broken_http3_tracker.cc (outdated)

source/common/http/broken_http3_tracker.h (outdated)

RenjieTang previously approved these changes Apr 19, 2021

source/common/http/http3_status_tracker.cc

RyanTheOptimist changed the title from "grid: Add a new class for tracking HTTP/3 brokeness." to "grid: Add a new class for tracking HTTP/3 status" Apr 19, 2021

antoniovicente reviewed Apr 20, 2021

antoniovicente added the waiting label Apr 20, 2021

Address comments from Antonio

7ed8c1f

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist dismissed RenjieTang’s stale review via 7ed8c1f April 20, 2021 17:52

repokitteh-read-only bot removed the waiting label Apr 20, 2021

Format

a19e0aa

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist commented Apr 20, 2021

antoniovicente reviewed Apr 20, 2021

test/common/http/http3_status_tracker_test.cc

antoniovicente added the waiting label Apr 20, 2021

RyanTheOptimist added 2 commits April 20, 2021 23:32

Address next comments from Antonio

5a9cc69

Signed-off-by: Ryan Hamilton <rch@google.com>

format

627f577

Signed-off-by: Ryan Hamilton <rch@google.com>

repokitteh-read-only bot removed the waiting label Apr 20, 2021

Remove unused usings

5a95f22

Signed-off-by: Ryan Hamilton <rch@google.com>

antoniovicente previously approved these changes Apr 21, 2021

source/common/http/conn_pool_grid.h

antoniovicente assigned snowp Apr 21, 2021

Comments

a6d41ed

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist dismissed antoniovicente’s stale review via a6d41ed April 22, 2021 00:19

snowp suggested changes Apr 22, 2021

address comments from snowp

fdf486c

Signed-off-by: Ryan Hamilton <rch@google.com>

RyanTheOptimist commented Apr 22, 2021

RyanTheOptimist added 5 commits April 23, 2021 17:18

Merge branch 'main' into broken_tracker

2aa5191

Signed-off-by: Ryan Hamilton <rch@google.com>

Remove confusing comment

33a55cf

Signed-off-by: Ryan Hamilton <rch@google.com>

Patch from Yan to fix the GCC build

ed01f6b

Signed-off-by: Ryan Hamilton <rch@google.com>

Merge branch 'main' into broken_tracker

0f8b751

Signed-off-by: Ryan Hamilton <rch@google.com>

One more fix from Yan

b562300

Signed-off-by: Ryan Hamilton <rch@google.com>

snowp approved these changes Apr 27, 2021

alyssawilk merged commit 3756e5b into envoyproxy:main Apr 28, 2021
