Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: implementation of the xdsChannel #7757

Merged
merged 5 commits into from
Oct 23, 2024

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Oct 17, 2024

#a71-xds-fallback
#xdsclient-refactor
Addresses #6902

  • An xdsChannel represents a client channel to the management server, and it builds on top of recently merged PRs that added functionality for an xDS transport and ADS/LRS streams.
  • An xdsChannel will own a transport.Interface, an ads.Stream and an lrs.Stream.
  • xdsChannels will be owned by the xDS client and authority instances will acquire references to the xdsChannel.
    • An xdsChannel might be shared across authority instances.
  • An xdsChannel will communicate events to the xDS client (like ADS responses, ADS stream failures etc), which will propagate them to all interested authority instances.
  • This PR only adds the xdsChannel functionality.
    • A subsequent PR will integrate this into the xDS client.

RELEASE NOTES: none

@easwars easwars added the Type: Internal Cleanup Refactors, etc label Oct 17, 2024
@easwars easwars added this to the 1.68 Release milestone Oct 17, 2024
Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 79.47020% with 31 lines in your changes missing coverage. Please review.

Project coverage is 81.33%. Comparing base (830135e) to head (c0bd002).
Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/xdsclient/channel.go 78.76% 24 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7757      +/-   ##
==========================================
+ Coverage   80.58%   81.33%   +0.74%     
==========================================
  Files         365      368       +3     
  Lines       28635    36768    +8133     
==========================================
+ Hits        23076    29905    +6829     
- Misses       4357     5632    +1275     
- Partials     1202     1231      +29     
Files with missing lines Coverage Δ
xds/internal/xdsclient/authority.go 89.75% <100.00%> (+0.51%) ⬆️
xds/internal/xdsclient/transport/ads/ads_stream.go 71.05% <ø> (+71.05%) ⬆️
...nal/xdsclient/xdsresource/cluster_resource_type.go 77.14% <100.00%> (-2.86%) ⬇️
...l/xdsclient/xdsresource/endpoints_resource_type.go 75.00% <100.00%> (-3.58%) ⬇️
...al/xdsclient/xdsresource/listener_resource_type.go 85.96% <100.00%> (-1.27%) ⬇️
...ds/internal/xdsclient/xdsresource/resource_type.go 100.00% <ø> (ø)
...dsclient/xdsresource/route_config_resource_type.go 75.00% <100.00%> (-3.58%) ⬇️
xds/internal/xdsclient/channel.go 78.76% <78.76%> (ø)

... and 315 files with indirect coverage changes

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor implementation nits, feels really close overall.

xds/internal/xdsclient/channel.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel.go Show resolved Hide resolved
xds/internal/xdsclient/channel.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel.go Show resolved Hide resolved
xds/internal/xdsclient/channel.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel.go Show resolved Hide resolved
xds/internal/xdsclient/channel.go Show resolved Hide resolved
xds/internal/xdsclient/channel.go Show resolved Hide resolved
@zasweq zasweq assigned easwars and unassigned zasweq Oct 21, 2024
@easwars easwars assigned zasweq and unassigned easwars Oct 21, 2024
// the update, and the status of the update (ACKed or NACKed).
//
// If there are any errors decoding the resources, the metadata will indicate
// that the update was NACKed, and the errors will be returned as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/errors will be returned as well/something as in paragraph 2, that mentions it is the value for the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor testing comments.

xds/internal/xdsclient/channel_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Show resolved Hide resolved
xds/internal/xdsclient/channel_test.go Show resolved Hide resolved
@zasweq zasweq assigned easwars and unassigned zasweq Oct 22, 2024
ret[name] = ads.DataAndErrTuple{Err: err}
}

if len(topLevelErrors) == 0 && len(perResourceErrors) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarification: why status is ServiceStatusNACKed if only some resources have errors and some not? If i remember correctly, the server will send us all the resources by default whether we need it or not. So, we should not consider those we don't need right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the xDS transport protocol level, an ACK or a NACK is not for a particular resource, but for a version of a particular resource type.

What this means is that if we receive resources [A, B, C] of type Listener at version 1, and we like all of them, we would ACK version 1 for resource type Listener.

Later on, if we receive an update for the same resource type and we receive resource [A, B, C] at version 2, but this time we don't like resource C for some reason, we would send a NACK for version 2 for resource type Listener. But we will go ahead and use resources A and B at version 2, but we will use resource C at version 1. This information is captured in the update metadata and when a CSDS client queries us for a resource dump, we will return [A at 2], [B at 2], [C at 1, but NACKed 2]. This is taken care of at the authority level, which is the one that maintains the resource cache.

Hope this helps.

func verifyUpdateAndMetadata(ctx context.Context, t *testing.T, eh *testEventHandler, wantUpdates map[string]ads.DataAndErrTuple, wantMD xdsresource.UpdateMetadata) {
t.Helper()

gotTyp, gotUpdates, gotMD, err := eh.waitForUpdate(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it intentional everywhere to write typ instead of type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is a reserved keyword in Go.

})
listener2 = testutils.MarshalAny(t, &v3listenerpb.Listener{
Name: listenerName2,
ApiListener: apiListener,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we use same apiListener here for listener2? The VirtualHosts's Domains have listenerName1 only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The virtual host domain matching is done by the xds resolver. As the comment on the type indicates:

// VirtualHost contains the routes for a list of Domains.
//
// Note that the domains in this slice can be a wildcard, not an exact string.
// The consumer of this struct needs to find the best match for its hostname.

I ended up changing it to a wildcard to avoid confusion.

But your comment actually made me dig deeper into why the test was passing and I figured out that the xdsresource.ResourceData type which is part of the ads.DataAndErrTuple had a method named Equal. This Equal method only compared the underlying raw proto. And since an Equal method was defined on the type, cmp.Diff ended up using that and was not doing a comparison of the actual internal representation of the update. I fixed that by renaming the method to RawEqual. Now, only callers who actually want to compare the raw proto will call RawEqual and currently the only caller that does that is the authority.

Thanks.

TypeUrl: "type.googleapis.com/envoy.config.listener.v3.Listener",
Resources: []*anypb.Any{listener1, listener2},
},
wantUpdates: map[string]ads.DataAndErrTuple{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to verify both when we requested only one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xDS management server can return more resources than the client actually requested. This was historically the case because envoy generally uses a wildcard for LDS and CDS resources and the management server would return all resources that it knew about. But when gRPC started implementing xDS, we were requesting for specific resources, but the management servers were returning all resources that they knew of (for simplicity of implementation). So, gRPC has to drop resources that it did not request. Again, this is done at the authority layer.

@easwars easwars assigned zasweq and unassigned easwars Oct 22, 2024
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zasweq zasweq assigned easwars and unassigned zasweq Oct 22, 2024
@easwars easwars merged commit 8212cf0 into grpc:master Oct 23, 2024
15 checks passed
@easwars easwars deleted the xds_channel_impl branch October 23, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants