upstream adding QUIC-to-TCP failover#15894
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
mattklein123
left a comment
There was a problem hiding this comment.
Exciting stuff. A left some ideas for discussion. Thank you!
/wait-any
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
Tsan failure was CI failure, not test fail. This should be ready for a pass! |
|
@snowp would you have time today to do a first pass? Matt's on call for the release and I was hoping to land this before I take off for a week. |
|
ugggh, accidentally pushed to the wrong branch. reverted but CI is of course running again (was green, and barring flakes will be green again) |
|
CI is unrelated flake: 2021-04-15T17:51:08.2680556Z Pulling zipkin (openzipkin/zipkin:)... |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
antoniovicente
left a comment
There was a problem hiding this comment.
Generally looks good. There may be a threading issue in the call to maybeCreateFallbackFactory if the factories are not per-worker.
| Network::TransportSocketPtr | ||
| createTransportSocket(Network::TransportSocketOptionsSharedPtr options) const override { | ||
| // TODO(#15649) make createTransportSocket non-const to avoid the const cast. | ||
| const_cast<QuicClientTransportSocketFactory*>(this)->maybeCreateFallbackFactory(); |
There was a problem hiding this comment.
Are factories per worker or can createTransportSocket be called concurrently by multiple theads? If the later, there's a thread race in this code.
I would consider creating this fallback factory in QuicClientTransportSocketFactory's constructor in order to guarantee that it always exists if createTransportSocket is ever called.
| Envoy::Http::ConnectivityGrid::ConnectivityOptions coptions{protocols}; | ||
| return std::make_unique<Http::ConnectivityGrid>( | ||
| dispatcher, api_.randomGenerator(), host, priority, options, transport_socket_options, | ||
| state, source, std::chrono::milliseconds(300), coptions); |
There was a problem hiding this comment.
Curious, are there plans to make next_attempt_duration configurable in the future?
There was a problem hiding this comment.
Yep, I figured I'd make @RyanTheOptimist take that on both as a simple API change, and to reduce bikeshedding on this PR
| ASSERT_TRUE(cluster1->info()->idleTimeout().has_value()); | ||
| EXPECT_EQ(std::chrono::hours(1), cluster1->info()->idleTimeout().value()); | ||
|
|
||
| const std::string explicit_http3 = R"EOF( |
There was a problem hiding this comment.
nit: changing "explicit" to "alpn" or "auto" would be slightly clearer
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk
left a comment
There was a problem hiding this comment.
Thanks for the helpful review!
I think default-creating the fallback isn't enough for secrets update, but I'm going to leave that as a TODO and fix when I get back. Assuming I manage to get away =P
| Envoy::Http::ConnectivityGrid::ConnectivityOptions coptions{protocols}; | ||
| return std::make_unique<Http::ConnectivityGrid>( | ||
| dispatcher, api_.randomGenerator(), host, priority, options, transport_socket_options, | ||
| state, source, std::chrono::milliseconds(300), coptions); |
There was a problem hiding this comment.
Yep, I figured I'd make @RyanTheOptimist take that on both as a simple API change, and to reduce bikeshedding on this PR
antoniovicente
left a comment
There was a problem hiding this comment.
Thanks for the change and enjoy your vacation next week!
Risk Level: n/a (hidden by default) Testing: e2e tests Docs Changes: n/a Release Notes: n/a Part of envoyproxy#14829 Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Risk Level: n/a (hidden by default) Testing: e2e tests Docs Changes: n/a Release Notes: n/a Part of envoyproxy#14829 Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Risk Level: n/a (hidden by default) Testing: e2e tests Docs Changes: n/a Release Notes: n/a Part of envoyproxy#14829 Signed-off-by: Gokul Nair <gnair@twitter.com>
Hey Matt, can I get your thoughts on 2 things?
if we're Ok with having a hard-coded something (right now "use_quic") matcher to handle TCP-TLS and QUIC-TLS in the same cluster (or maybe we want to do something to combine the two configs and auto-select based on transport?)
how we want to handle the connection pool being able to select a non-default transportSocket. I have "use_quic" as a placeholder argument but I think if we extend the API the boolean should probably be replaced with an optional metadata. I'd like your thoughts if you have a better solution before I go change the API and all the mocks though.
Because I haven't cleaned up all the mocks so this will not build, but the integration test both provides a proof of concept for failover working, and the annoyance in configuring both TCP-TLS and QUIC-TLS transport sockets.
cc @RyanTheOptimist
Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Part of #14829