xds: decouple A76 and A86 EDS metadata parsing#8875
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8875 +/- ##
==========================================
- Coverage 83.31% 80.60% -2.72%
==========================================
Files 414 416 +2
Lines 32805 33500 +695
==========================================
- Hits 27333 27002 -331
- Misses 4067 4677 +610
- Partials 1405 1821 +416
🚀 New features to boost your workflow:
|
arjan-bal
left a comment
There was a problem hiding this comment.
LGTM! Adding a second reviewer.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively decouples the A76 (ring hash endpoint hash key) and A86 (HTTP CONNECT) features in EDS metadata parsing. The change correctly makes TypedFilterMetadata processing conditional on GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT, preventing unwanted validation failures when only A76 is enabled. The code is clean and the added tests cover some important scenarios. I've suggested one additional test case to explicitly verify the fix for the bug that motivated this change.
The EDS metadata parsing was inadvertently coupling the A76 (ring hash endpoint hash key) and A86 (HTTP CONNECT) features. When either GRPC_XDS_ENDPOINT_HASH_KEY_BACKWARD_COMPAT was false (A76 enabled) or GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT was true (A86 enabled), the code would parse all metadata including TypedFilterMetadata with converters. This caused issues because: - A76 only needs FilterMetadata["envoy.lb"] for hash_key extraction - A86 needs TypedFilterMetadata with converters (e.g., for envoy.config.core.v3.Address) which can fail validation When A76 was enabled but A86 was disabled, invalid proxy addresses in TypedFilterMetadata would still cause NACKs, even though the user didn't care about HTTP CONNECT. This change makes TypedFilterMetadata processing conditional on GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT, so that A76 can be enabled independently without triggering A86-specific validation failures. Also adds test coverage for the previously untested combination of GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT=true with GRPC_XDS_ENDPOINT_HASH_KEY_BACKWARD_COMPAT=true. Release-note: Fix a bug where enabling A76 ring hash endpoint hash key feature would cause EDS resources to be NACKed if they contained invalid proxy address metadata, even when HTTP CONNECT (A86) was not enabled.
422ff7b to
7ce62c9
Compare
Curious. Why does your control plane send invalid proxy addresses in the first place? |
|
@atollena I discussed this with @markdroth and this is what he had to say: I agree that that is probably sub-optimal behavior. In principle, we should probably do something like use the env var to control whether each of the metadata types is registered with the parser, so that even if we parse the metadata map itself, we'll ignore any entry whose type is not enabled by the env var. That having been said, I'm not sure this is worth fixing, since we're eventually going to enable the A86 changes and drop the env var guard, and at that point they'll need to fix their control plane to provide valid data anyway. So it may be better for them to just do that now and not make a change that will ultimately just be temporary. What do you think? |
I only discovered this in the unit tests of grpc-go, which tests the case of invalid proxy addresses that shouldn't fail parsing if A86 is disabled, but it turns out that without this PR, the A76 env var also enables this new behavior. My goal was just to clean up the env var guards for A76 by first enabling them by default, but that would currently enable the address parsing of A86 for all grpc-go users, which doesn't seem right.
I think the change in this PR is a reasonable fix: only parse typed fields if A86 is enabled, since this is the first gRFC to add typed fields. Only enable string types parsing if A76 is enabled (I don't think it can fail). |
|
Apologies for the delay on this one. I'm a bit tied up with something this week. Will try to get to this asap. Thanks. |
|
What do you think about registering the parser for A86 only when the env var is enabled? This is currently done unconditionally here: I think this should make the unit test failure that you are running into go away. Also, we have already started working on the implementation of A83, and it would be nice to follow the same approach there of registering the parser only when the corresponding env var is enabled. Otherwise, we will be stuck inside of |
Understood, I took your suggestion. I also added a test helper to unregister converters. |
easwars
left a comment
There was a problem hiding this comment.
Apologies for the delay.
The EDS metadata parsing was inadvertently coupling the A76 (ring hash
endpoint hash key) and A86 (HTTP CONNECT) features. When either
GRPC_XDS_ENDPOINT_HASH_KEY_BACKWARD_COMPAT was false (A76 enabled) or
GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT was true (A86 enabled), the code would
parse all metadata including TypedFilterMetadata with converters.
This caused issues because:
envoy.config.core.v3.Address) which can fail validation
When A76 was enabled but A86 was disabled, invalid proxy addresses in
TypedFilterMetadata would still cause NACKs, even though the user didn't care
about HTTP CONNECT.
This change makes TypedFilterMetadata processing conditional on
GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT, so that A76 can be enabled independently
without triggering A86-specific validation failures.
RELEASE NOTES:
would cause EDS resources to be NACKed if they contained invalid proxy address
metadata, even when HTTP CONNECT (A86) was not enabled.