quic: change QuicNetworkConnectivityObserver interface #35775

Merged
RyanTheOptimist merged 42 commits into envoyproxy:main from danzh2010:networkchange
Oct 10, 2025

Conversation

@danzh2010
Contributor

@danzh2010 danzh2010 commented Aug 21, 2024

Commit Message: change QuicNetworkConnectivityObserver into a pure virtual interface for easy mocking and move the implementation into another class, QuicNetworkConnectivityObserverImpl. As a result, envoy_quic_network_observer_registry_factory_lib no longer needs to depend on QUICHE targets, so there is no need to provide a dummy implementation when QUICHE is compiled out.

Also change QuicNetworkConnectivityObserver to expose three methods: onNetworkMadeDefault(), onNetworkConnected(), and onNetworkDisconnected().

Risk Level: low, new interface not in use
Testing: n/a
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Runtime guard: N/A
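
The split described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the method signatures (a single `NetworkHandle` argument), the `RecordingObserver` test double, and the `notifyNetworkSwitch()` helper are assumptions made for illustration. Only the three method names and the `NetworkHandle = int64_t` alias appear in this conversation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Alias quoted later in this review; its placement here is for illustration only.
using NetworkHandle = int64_t;

// Pure virtual interface. No QUICHE types appear in it, so a target that only
// depends on this header needs no QUICHE dependency.
// The parameter lists are assumptions; the PR may use different signatures.
class QuicNetworkConnectivityObserver {
public:
  virtual ~QuicNetworkConnectivityObserver() = default;
  virtual void onNetworkMadeDefault(NetworkHandle network) = 0;
  virtual void onNetworkConnected(NetworkHandle network) = 0;
  virtual void onNetworkDisconnected(NetworkHandle network) = 0;
};

// Hypothetical hand-written test double that records every call it receives.
class RecordingObserver : public QuicNetworkConnectivityObserver {
public:
  void onNetworkMadeDefault(NetworkHandle network) override { record("default", network); }
  void onNetworkConnected(NetworkHandle network) override { record("connected", network); }
  void onNetworkDisconnected(NetworkHandle network) override { record("disconnected", network); }
  const std::vector<std::string>& events() const { return events_; }

private:
  void record(const std::string& what, NetworkHandle network) {
    events_.push_back(what + ":" + std::to_string(network));
  }
  std::vector<std::string> events_;
};

// Hypothetical caller: it sees only the interface, so a test can pass the double.
inline void notifyNetworkSwitch(QuicNetworkConnectivityObserver& observer, NetworkHandle from,
                                NetworkHandle to) {
  observer.onNetworkConnected(to);
  observer.onNetworkMadeDefault(to);
  observer.onNetworkDisconnected(from);
}
```

Because the caller takes the interface by reference, a test can substitute `RecordingObserver` (or a gMock class) for the real implementation without linking any QUIC code.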

Signed-off-by: Dan Zhang <danzh@google.com>
@repokitteh-read-only

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

Caused by: #35775 was opened by danzh2010.

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Contributor Author

/retest

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Contributor Author

/retest

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Contributor Author

/retest

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Contributor Author

/assign @RyanTheOptimist

@danzh2010
Contributor Author

/retest

Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Boy, there is a lot going on in this PR. Could we consider breaking it into some smaller pieces?

class EnvoyQuicNetworkObserverRegistryFactory;
class EnvoyQuicNetworkObserverRegistry;
#else
// Dumb definitions of QUIC classes if QUICHE is compiled out.
Contributor

nit: QUICHE is always compiled in, but the QUIC parts may or may not be. So probably s/QUICHE/QUIC/?

dns_callbacks_handle_{nullptr};
Upstream::ClusterManager& cluster_manager_;
#ifdef ENVOY_ENABLE_QUIC
std::unique_ptr<Quic::EnvoyMobileQuicNetworkObserverRegistryFactory>
Contributor

nit: It looks like this is always created (when QUIC is compiled in). If so, consider making this a non-pointer member. That would avoid the need to explicitly construct it in the constructor.

Contributor Author

done

DnsCacheManagerSharedPtr dns_cache_manager_;
ProxySettingsConstSharedPtr proxy_settings_;
static NetworkState network_state_;
const bool quic_upstream_connection_handle_network_change_;
Contributor

nit: comment, please.

Contributor Author

done

virtual Extensions::Common::DynamicForwardProxy::DnsCacheSharedPtr dnsCache() PURE;

/**
* Called when OS changes the preferred network.
Contributor

nit: "OS" => "the OS"

Contributor Author

done

ASSERT(Runtime::runtimeFeatureEnabled(
"envoy.reloadable_features.quic_upstream_connection_handle_network_change"));
dispatcher_.post([this]() {
// Retain the existing observers in a list and iterate on the list.
Contributor

Why do we need to do this? Does the list get potentially mutated?

Contributor Author

Yes, the conn pool may create new connections as a backfill if the connections on this list get closed.
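
The hazard discussed here, a callback that mutates the container being iterated, can be shown in isolation. The following is a minimal sketch, assuming a hypothetical `Registry` and `Observer` (neither is the Envoy class): the registry copies the registered pointers first, then skips any observer that was unregistered while the loop was running.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical observer whose callback may add or remove registry entries.
class Observer {
public:
  explicit Observer(std::function<void()> on_change) : on_change_(std::move(on_change)) {}
  void onNetworkChanged() {
    ++calls_;
    if (on_change_) {
      on_change_();
    }
  }
  int calls() const { return calls_; }

private:
  std::function<void()> on_change_;
  int calls_{0};
};

// Hypothetical registry; it does not own the observers.
class Registry {
public:
  void add(Observer* observer) { observers_.insert(observer); }
  void remove(Observer* observer) { observers_.erase(observer); }
  std::size_t size() const { return observers_.size(); }

  void notifyAll() {
    // Copy first: a callback may insert or erase entries, which would
    // invalidate iterators into observers_.
    std::vector<Observer*> snapshot(observers_.begin(), observers_.end());
    for (Observer* observer : snapshot) {
      // Skip observers that an earlier callback in this loop removed.
      if (observers_.count(observer) != 0) {
        observer->onNetworkChanged();
      }
    }
  }

private:
  std::unordered_set<Observer*> observers_;
};
```

An observer registered during `notifyAll()` (the backfill case) is not in the snapshot, so it is not notified about the change that caused it to be created.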

ENVOY_LOG_MISC(trace, "Default network changed.");
ASSERT(Runtime::runtimeFeatureEnabled(
"envoy.reloadable_features.quic_upstream_connection_handle_network_change"));
dispatcher_.post([this]() {
Contributor

Out of curiosity, why is this posted to the dispatcher instead of running on the current thread? Maybe add a comment?

Contributor Author

This is called from the Android API, so not necessarily on the network thread.

Contributor

Can you clue me in on where it's called unsafely? It looks like it's called from the connection manager's onNetworkMadeDefault(), which is called from InternalEngine::onDefaultNetworkChanged(), which already switches to the dispatcher context.

Contributor

Also, I'm going to pause here because of the QUIC-specific logic and wait for a discussion of general non-QUIC plans.

Contributor Author

Oh, I didn't notice that InternalEngine already post()'ed the callback. E-M only has one dispatcher, so probably we can omit post()'ing here. But theoretically the observer registry is one per worker thread (owned by ThreadLocalClusterManagerImpl), and its onNetworkMadeDefault() is called on the main thread even though InternalEngine already switches to the dispatcher context via post(). On the other hand, E-M only has one network thread and dispatcher. I'm wondering if it's clearer to explicitly do post() here to fit the threading model in Envoy core or skip doing post() given this is E-M code.

Contributor Author

I removed the post() here and added an assert on isThreadSafe() instead.
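
The resolution in this thread replaces a second post() with an assertion that the caller is already on the dispatcher thread. A self-contained sketch of that pattern follows, assuming a hypothetical `FakeDispatcher` whose `isThreadSafe()` only compares thread ids (Envoy's real dispatcher does more) and a hypothetical `ObserverRegistry`:

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a dispatcher: it remembers which thread owns it.
class FakeDispatcher {
public:
  FakeDispatcher() : owner_(std::this_thread::get_id()) {}
  explicit FakeDispatcher(std::thread::id owner) : owner_(owner) {}

  // True only when called from the owning thread.
  bool isThreadSafe() const { return std::this_thread::get_id() == owner_; }

private:
  const std::thread::id owner_;
};

// Hypothetical registry that relies on its caller for thread switching.
class ObserverRegistry {
public:
  explicit ObserverRegistry(FakeDispatcher& dispatcher) : dispatcher_(dispatcher) {}

  void onNetworkMadeDefault() {
    // The caller is expected to have switched to the dispatcher thread already,
    // so this checks the invariant instead of posting a second time.
    assert(dispatcher_.isThreadSafe());
    ++notifications_;
  }
  int notifications() const { return notifications_; }

private:
  FakeDispatcher& dispatcher_;
  int notifications_{0};
};
```

The assertion documents the threading contract and fails fast in debug builds if a future caller invokes the method from another thread; in release builds it compiles away, so the contract is not enforced there.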

#else
// Dumb definitions of QUIC classes if QUICHE is compiled out.
class EnvoyQuicNetworkObserverRegistryFactory {};
class EnvoyQuicNetworkObserverRegistry {};
Contributor

Actually, I think we should handle this conditional definition at the place where these classes are defined. Perhaps instead of conditionally building the .h/.cc files, we could wrap their contents in #ifdef so that they are always defined and we can avoid this conditional logic outside of that class. WDYT?
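
One way to read this suggestion: the header itself always declares the classes and chooses between a real and an empty body, so includers never need their own `#ifdef`. The sketch below works under that assumption. The members shown (`registerObserver()`, `observerCount()`) and the `ClusterManagerLike` includer are placeholders, not the PR's actual declarations; `ENVOY_ENABLE_QUIC` is the macro quoted earlier in this review.

```cpp
#include <cassert>
#include <cstddef>

// Contents of a hypothetical header that is always part of the build.
#ifdef ENVOY_ENABLE_QUIC
// QUIC compiled in: the real class (members here are placeholders).
class EnvoyQuicNetworkObserverRegistry {
public:
  void registerObserver() { ++count_; }
  std::size_t observerCount() const { return count_; }

private:
  std::size_t count_{0};
};
#else
// QUIC compiled out: the name still exists, so includers compile unchanged.
class EnvoyQuicNetworkObserverRegistry {};
#endif

// Hypothetical includer: it can hold a pointer to the type in either
// configuration without any conditional logic of its own.
struct ClusterManagerLike {
  EnvoyQuicNetworkObserverRegistry* registry{nullptr};
};
```

The conditional lives in exactly one place, next to the class definition, instead of being repeated at every use site.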


  void onConnectionEvent(ActiveClient& client, absl::string_view failure_reason,
-                        Network::ConnectionEvent event);
+                        Network::ConnectionEvent event, bool purge_pending_streams);
Contributor

nit: please add a comment which explains how purge_pending_streams works (or perhaps why we might set it to true sometimes).

Contributor Author

done

Contributor

I might be misreading this, but it seems like if purge_pending_streams is true then we end up calling purgePendingStreams() which calls onPoolFailure(). Is the comment backwards, or am I misunderstanding something?

Contributor Author

Whoops, it is backwards.

if (codec_client_->numActiveRequests() == 0) {
if (codec_client_->protocol() == Protocol::Http3 && error_code == GoAwayErrorCode::Other) {
// This must be network change because QUIC GOAWAY frame doesn't have error code.
close_after_network_change_ = true;
Contributor

Why do we do this only for H3? (Maybe add a comment?)

Contributor Author

This is how ActiveClient knows that the upcoming connection close is caused by a network change rather than a real GOAWAY frame. We can only infer this in HTTP/3, because an H3 GOAWAY frame should trigger onGoAway() with GoAwayErrorCode::NoError. In this PR, QuicNetworkConnectivityObserver::onNetworkChanged() calls this with GoAwayErrorCode::Other.

Contributor

Hm. I see. And is onNetworkChanged() the only caller which passes in Other? This feels a bit like spooky action at a distance. We could potentially add an explicit argument to onGoAway() or add a new enum value, though both are possibly used in lots of places. I don't want to block if we're sure this works, but it feels a bit sketchy.

Contributor Author

I was debating between these options. The current way is indeed sketchy, but at least works. I also checked the usage of GoAwayErrorCode and found that a lot of places use if (NoError) {...} else {...} or if (!NoError) {...}, so I hesitated about adding a new enum value.
The remaining alternatives are changing the existing Http::ConnectionCallbacks::onGoAway() interface or adding a new interface to one of the interfaces ConnectionPool::ActiveClient inherits from. These are all used in a lot of places. And onGoAway() is also used in server code where network change doesn't make sense. Which one do you think we should pursue?

Contributor

Hm... What if we add a new interface for ActiveClient to implement. This new interface could have an onNetworkChanged() method. Then QuicHttpClientConnectionImpl can take an instance of this interface (which would be ActiveClient, of course) where it can call some new setNetworkChangedCallback() method on the EnvoyQuicClientSession. Would that work?

Contributor Author

I added a new callback interface for Network::ClientConnection to achieve this.
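
The shape this thread converges on, a dedicated callback rather than overloading `GoAwayErrorCode::Other`, might be sketched as below. All class names in the block (`NetworkChangeCallbacks`, `FakeClientSession`, `FakeActiveClient`) are hypothetical stand-ins, and `setNetworkChangedCallback()` is the name floated in the suggestion above. The interface actually added to `Network::ClientConnection` is not shown in this conversation.

```cpp
#include <cassert>

// Hypothetical callback interface dedicated to network-change notifications.
class NetworkChangeCallbacks {
public:
  virtual ~NetworkChangeCallbacks() = default;
  virtual void onNetworkChanged() = 0;
};

// Hypothetical stand-in for the client session that learns about the change.
class FakeClientSession {
public:
  void setNetworkChangedCallback(NetworkChangeCallbacks* callbacks) { callbacks_ = callbacks; }

  // Invoked when the underlying network changes; independent of GOAWAY handling.
  void notifyNetworkChanged() {
    if (callbacks_ != nullptr) {
      callbacks_->onNetworkChanged();
    }
  }

private:
  NetworkChangeCallbacks* callbacks_{nullptr};
};

// Hypothetical stand-in for the pool's active client.
class FakeActiveClient : public NetworkChangeCallbacks {
public:
  // Explicit signal: no need to infer the cause from a GOAWAY error code.
  void onNetworkChanged() override { close_after_network_change_ = true; }

  // A real GOAWAY no longer touches the network-change flag.
  void onGoAway() { saw_goaway_ = true; }

  bool closeAfterNetworkChange() const { return close_after_network_change_; }
  bool sawGoAway() const { return saw_goaway_; }

private:
  bool close_after_network_change_{false};
  bool saw_goaway_{false};
};
```

With a separate callback, existing `onGoAway()` callers and the `if (NoError) {...} else {...}` checks mentioned above stay untouched, and server-side code never sees a network-change concept.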

HttpConnPoolImplBase& parent() { return *static_cast<HttpConnPoolImplBase*>(&parent_); }

Http::CodecClientPtr codec_client_;
bool close_after_network_change_{false};
Contributor

nit: comment

Contributor Author

done

@RyanTheOptimist
Contributor

/wait

Signed-off-by: Dan Zhang <danzh@google.com>
@RyanTheOptimist
Contributor

CI seems to be failing
/wait

@danzh2010
Contributor Author

/wait

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010 danzh2010 changed the title from "quic: propagate default network change event to downstream connections" to "quic: change QuicNetworkConnectivityObserver interface" Oct 9, 2025
@danzh2010
Contributor Author

I split this PR into smaller ones. The current one simply refines the QuicNetworkConnectivityObserver interface. The rest of the plumbing is moved into a follow-up PR. PTAL

@danzh2010 danzh2010 assigned abeyad and unassigned alyssawilk Oct 9, 2025
Signed-off-by: Dan Zhang <danzh@google.com>
abeyad previously approved these changes Oct 9, 2025
#include <memory>

#include "source/common/common/logger.h"
using NetworkHandle = int64_t;
Contributor

Shouldn't this be moved inside the Envoy::Quic namespace so it is scoped?

Contributor Author

Moved it into the Envoy namespace.

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Contributor Author

Addressed test coverage. PTAL!

@abeyad
Contributor

abeyad commented Oct 10, 2025

This is core code, so please also wait for Ryan's approval

@RyanTheOptimist RyanTheOptimist merged commit a61a0dd into envoyproxy:main Oct 10, 2025
25 checks passed
5 participants