Skip to content

upstream adding QUIC-to-TCP failover#15894

Merged
alyssawilk merged 7 commits intoenvoyproxy:mainfrom
alyssawilk:h3_api
Apr 16, 2021
Merged

upstream adding QUIC-to-TCP failover#15894
alyssawilk merged 7 commits intoenvoyproxy:mainfrom
alyssawilk:h3_api

Conversation

@alyssawilk
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk commented Apr 8, 2021

Hey Matt, can I get your thoughts on 2 things?

  1. if we're Ok with having a hard-coded something (right now "use_quic") matcher to handle TCP-TLS and QUIC-TLS in the same cluster (or maybe we want to do something to combine the two configs and auto-select based on transport?)

  2. how we want to handle the connection pool being able to select a non-default transportSocket. I have "use_quic" as a placeholder argument but I think if we extend the API the boolean should probably be replaced with an optional metadata. I'd like your thoughts if you have a better solution before I go change the API and all the mocks though.

Because I haven't cleaned up all the mocks so this will not build, but the integration test both provides a proof of concept for failover working, and the annoyance in configuring both TCP-TLS and QUIC-TLS transport sockets.

cc @RyanTheOptimist

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Part of #14829

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #15894 was opened by alyssawilk.

see: more, trace.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting stuff. A left some ideas for discussion. Thank you!

/wait-any

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk marked this pull request as ready for review April 13, 2021 15:43
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

Tsan failure was CI failure, not test fail. This should be ready for a pass!

@alyssawilk
Copy link
Copy Markdown
Contributor Author

@snowp would you have time today to do a first pass? Matt's on call for the release and I was hoping to land this before I take off for a week.

@alyssawilk
Copy link
Copy Markdown
Contributor Author

ugggh, accidentally pushed to the wrong branch. reverted but CI is of course running again (was green, and barring flakes will be green again)

@alyssawilk
Copy link
Copy Markdown
Contributor Author

CI is unrelated flake:

2021-04-15T17:51:08.2680556Z Pulling zipkin (openzipkin/zipkin:)...
2021-04-15T17:51:13.3215508Z received unexpected HTTP status: 503 Service Unavailable
2021-04-15T17:51:13.3776159Z ERROR: starting zipkin .
2021-04-15T17:51:13.3797151Z
2021-04-15T17:51:13.3797958Z > [zipkin] Cleanup (.)
2021-04-15T17:51:13.9288406Z Removing network zipkin-tracing_envoymesh
2021-04-15T17:51:14.2006879Z TESTS FAILED:

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@antoniovicente antoniovicente self-assigned this Apr 15, 2021
Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. There may be a threading issue in the call to maybeCreateFallbackFactory if the factories are not per-worker.

Network::TransportSocketPtr
createTransportSocket(Network::TransportSocketOptionsSharedPtr options) const override {
// TODO(#15649) make createTransportSocket non-const to avoid the const cast.
const_cast<QuicClientTransportSocketFactory*>(this)->maybeCreateFallbackFactory();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are factories per worker or can createTransportSocket be called concurrently by multiple theads? If the later, there's a thread race in this code.

I would consider creating this fallback factory in QuicClientTransportSocketFactory's constructor in order to guarantee that it always exists if createTransportSocket is ever called.

Envoy::Http::ConnectivityGrid::ConnectivityOptions coptions{protocols};
return std::make_unique<Http::ConnectivityGrid>(
dispatcher, api_.randomGenerator(), host, priority, options, transport_socket_options,
state, source, std::chrono::milliseconds(300), coptions);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, are there plans to make next_attempt_duration configurable in the future?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I figured I'd make @RyanTheOptimist take that on both as a simple API change, and to reduce bikeshedding on this PR

ASSERT_TRUE(cluster1->info()->idleTimeout().has_value());
EXPECT_EQ(std::chrono::hours(1), cluster1->info()->idleTimeout().value());

const std::string explicit_http3 = R"EOF(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: changing "explicit" to "alpn" or "auto" would be slightly clearer

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the helpful review!
I think default-creating the fallback isn't enough for secrets update, but I'm going to leave that as a TODO and fix when I get back. Assuming I manage to get away =P

Envoy::Http::ConnectivityGrid::ConnectivityOptions coptions{protocols};
return std::make_unique<Http::ConnectivityGrid>(
dispatcher, api_.randomGenerator(), host, priority, options, transport_socket_options,
state, source, std::chrono::milliseconds(300), coptions);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I figured I'd make @RyanTheOptimist take that on both as a simple API change, and to reduce bikeshedding on this PR

Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change and enjoy your vacation next week!

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@alyssawilk alyssawilk merged commit ad4919d into envoyproxy:main Apr 16, 2021
stevenzzzz pushed a commit to stevenzzzz/envoy that referenced this pull request Apr 16, 2021
Risk Level: n/a (hidden by default)
Testing: e2e tests
Docs Changes: n/a
Release Notes: n/a
Part of envoyproxy#14829

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
douglas-reid pushed a commit to douglas-reid/envoy that referenced this pull request Apr 19, 2021
Risk Level: n/a (hidden by default)
Testing: e2e tests
Docs Changes: n/a
Release Notes: n/a
Part of envoyproxy#14829
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
Risk Level: n/a (hidden by default)
Testing: e2e tests
Docs Changes: n/a
Release Notes: n/a
Part of envoyproxy#14829

Signed-off-by: Gokul Nair <gnair@twitter.com>
@alyssawilk alyssawilk deleted the h3_api branch February 28, 2022 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants