quic: change QuicNetworkConnectivityObserver interface#35775
quic: change QuicNetworkConnectivityObserver interface#35775RyanTheOptimist merged 42 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
|
/retest |
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
|
/retest |
|
/retest |
Signed-off-by: Dan Zhang <danzh@google.com>
|
/assign @RyanTheOptimist |
|
/retest |
1 similar comment
|
/retest |
RyanTheOptimist
left a comment
There was a problem hiding this comment.
Boy, there is a lot going on in this PR. Could we consider breaking it into some smaller pieces?
envoy/upstream/cluster_manager.h
Outdated
| class EnvoyQuicNetworkObserverRegistryFactory; | ||
| class EnvoyQuicNetworkObserverRegistry; | ||
| #else | ||
| // Dumb definitions of QUIC classes if QUICHE is compiled out. |
There was a problem hiding this comment.
nit: QUICHE is always compiled in, but the QUIC parts may or may not be. So probably s/QUICHE/QUIC/?
| dns_callbacks_handle_{nullptr}; | ||
| Upstream::ClusterManager& cluster_manager_; | ||
| #ifdef ENVOY_ENABLE_QUIC | ||
| std::unique_ptr<Quic::EnvoyMobileQuicNetworkObserverRegistryFactory> |
There was a problem hiding this comment.
nit: I looks like this is always created (when QUIC is compiled in). If so, consider making this a non-pointer member. That would avoid the need to explicitly construct it in the constructor.
| DnsCacheManagerSharedPtr dns_cache_manager_; | ||
| ProxySettingsConstSharedPtr proxy_settings_; | ||
| static NetworkState network_state_; | ||
| const bool quic_upstream_connection_handle_network_change_; |
There was a problem hiding this comment.
nit: comment, please.
| virtual Extensions::Common::DynamicForwardProxy::DnsCacheSharedPtr dnsCache() PURE; | ||
|
|
||
| /** | ||
| * Called when OS changes the preferred network. |
There was a problem hiding this comment.
nit: "OS" => "the OS"
| ASSERT(Runtime::runtimeFeatureEnabled( | ||
| "envoy.reloadable_features.quic_upstream_connection_handle_network_change")); | ||
| dispatcher_.post([this]() { | ||
| // Retain the existing observers in a list and iterate on the list. |
There was a problem hiding this comment.
Why do we need to do this? Does the list get potentially mutated?
There was a problem hiding this comment.
yes, the conn pool may create new connections as a back fill if the connections on this list gets closed.
| ENVOY_LOG_MISC(trace, "Default network changed."); | ||
| ASSERT(Runtime::runtimeFeatureEnabled( | ||
| "envoy.reloadable_features.quic_upstream_connection_handle_network_change")); | ||
| dispatcher_.post([this]() { |
There was a problem hiding this comment.
Out of curiosity, why is this posted to the dispatcher instead of running on the current thread? Maybe add a comment?
There was a problem hiding this comment.
This is called on from Android API, so not necessarily on the network thread.
There was a problem hiding this comment.
Can you clue me in on where it's called unsafely? It looks like it's called from the connection manager's onnetworkmadedefault which is called from InternalEngine::onDefaultNetworkChanged which already switches to the dispatcher context.
There was a problem hiding this comment.
Also I'm going to pause here because of quic-specific logic and wait for a discussion of general non-quic plans
There was a problem hiding this comment.
Oh, I didn't notice that InternalEngine already post()'ed the callback. E-M only has one dispatcher, so probably we can omit post()'ing here. But theoretically the observer registry is one per worker thread (owned by ThreadLocalClusterManagerImpl), and its onNetworkMadeDefault() is called on the main thread even though InternalEngine already switches to the dispatcher context via post(). On the other hand, E-M only has one network thread and dispatcher. I'm wondering if it's clearer to explicitly do post() here to fit the threading model in Envoy core or skip doing post() given this is E-M code.
There was a problem hiding this comment.
I removed the post() here and add assert on isThreadSafe(). instead
envoy/upstream/cluster_manager.h
Outdated
| #else | ||
| // Dumb definitions of QUIC classes if QUICHE is compiled out. | ||
| class EnvoyQuicNetworkObserverRegistryFactory {}; | ||
| class EnvoyQuicNetworkObserverRegistry {}; |
There was a problem hiding this comment.
Actually, I think we should handle this conditional definition at the place where these classes are defined. Perhaps instead of conditionally building the .h/.cc files, we could wrap their contents in #ifdef so that they are always defined and we can avoid this conditional logic outside of that class. WDYT?
|
|
||
| void onConnectionEvent(ActiveClient& client, absl::string_view failure_reason, | ||
| Network::ConnectionEvent event); | ||
| Network::ConnectionEvent event, bool purge_pending_streams); |
There was a problem hiding this comment.
nit: please add a comment which explains how purge_pending_streams works (or perhaps why we might set it to true sometimes).
There was a problem hiding this comment.
I might be misreading this, but it seems like if purge_pending_streams is true then we end up calling purgePendingStreams() which calls onPoolFailure(). Is the comment backwards, or am I misunderstanding something?
There was a problem hiding this comment.
Woops, it is backwards.
source/common/http/conn_pool_base.cc
Outdated
| if (codec_client_->numActiveRequests() == 0) { | ||
| if (codec_client_->protocol() == Protocol::Http3 && error_code == GoAwayErrorCode::Other) { | ||
| // This must be network change because QUIC GOAWAY frame doesn't have error code. | ||
| close_after_network_change_ = true; |
There was a problem hiding this comment.
Why do we do this only for H3? (Maybe add a comment?)
There was a problem hiding this comment.
This is how ActiveClient knows about the upcoming connection close is caused by network change instead of a real GOAWAY frame. And we can only infer it in HTTP/3 because H3 GOAWAY frame should trigger onGoAway() with GoAwayErrorCode::NoError. In this PR QuicNetworkConnectivityObserver::onNetworkChanged() calls this with GoAwayErrorCode::Other.
There was a problem hiding this comment.
Hm. I see. And is onNetworkChanged() the only caller which passes in Other? This feels a big like spooky action at a distance. We could potentially add an explicit argument to onGoaway() or add a new enum value, though both are possibly used in lots of places. I don't want to block if we're sure this works, but it feels a bit sketchy.
There was a problem hiding this comment.
I was debating between these options. The current way is indeed sketchy, but at least works. I also checked the usage of GoAwayErrorCode and found that a lot of places use if (NoError) {...} else {...} or if (!NoError) {...}, so I hesitated about adding a new enum value.
The remaining alternatives are changing the existing Http::ConnectionCallbacks::onGoAway() interface or adding a new interface to one of the interfaces ConnectionPool::ActiveClient inherits from. These are all used in a lot of places. And onGoAway() is also used in server code where network change doesn't make sense. Which one do you think we should pursue?
There was a problem hiding this comment.
Hm... What if we add a new interface for ActiveClient to implement. This new interface could have an onNetworkChanged() method. Then QuicHttpClientConnectionImpl can take an instance of this interface (which would be ActiveClient, of course) where it can call some new setNetworkChangedCallback() method on the EnvoyQuicClientSession. Would that work?
There was a problem hiding this comment.
I added a new callback interface for Network::ClientConnection to achieve this.
source/common/http/conn_pool_base.h
Outdated
| HttpConnPoolImplBase& parent() { return *static_cast<HttpConnPoolImplBase*>(&parent_); } | ||
|
|
||
| Http::CodecClientPtr codec_client_; | ||
| bool close_after_network_change_{false}; |
|
/wait |
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
|
CI seems to be failing |
|
/wait |
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
|
I split this PR into smaller ones. The current one is simply refining QuicNetworkConnectivityObserver interface. The rest of the plumbing is moved into a follow up PR. PTAL |
| #include <memory> | ||
|
|
||
| #include "source/common/common/logger.h" | ||
| using NetworkHandle = int64_t; |
There was a problem hiding this comment.
shouldn't this be moved to inside the Envoy::Quic namespace so it is scoped?
There was a problem hiding this comment.
Moved it into Envoy namespace.
Signed-off-by: Dan Zhang <danzh@google.com>
|
Addressed test coverage. PTAL! |
|
This is core code, so please also wait for Ryan's approval |
Commit Message: change QuicNetworkConnectivityObserver into pure virtual interface for easy mocking and move the implementation into another class QuicNetworkConnectivityObserverImpl. By doing so, envoy_quic_network_observer_registry_factory_lib no longer needs to depend on QUICHE targets and thus no need to provide a dumb implementation if QUICHE is compiled out.
Also change QuicNetworkConnectivityObserver to have 3 interfaces: onNetworkMadeDefault(), onNetworkConnected() and onNetworkDisconnected().
Risk Level: low, new interface not in use
Testing: n/a
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Runtime guard: N/A