Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: add support for multiple xDS clients, for fallback #7347

Merged
merged 7 commits into from
Jul 2, 2024

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Jun 24, 2024

This PR adds support for multiple xDS clients, as required by xDS fallback(A71).

Summary of changes:

  • xDS client creation APIs now require a name to be specified
    • On the client side, the user's dial target is used, and on the server side a well-known dedicated key #server is used
    • Tests pass t.Name as the xDS client name
    • Also, added a GetForTesting method since it is no longer possible to simply call New and expect to get a reference to the singleton xDS client (as used by some tests to trigger resource not found errors)
    • Changes to the following resolvers to pass user's dial target during xDS client creation
      • google_c2p
      • xds
  • xDS client changes to dumping resources
    • Removed the DumpResources method from the XDSClient interface
    • Added a DumpResources function to the xdsclient package that traverses all existing xDS clients, retrieves a dump from each of them, and adds the client_scope field to each one, and returns a response that contains resources from all the clients
  • CSDS service implementation changes
    • No longer holds a reference to the xDS singleton. Instead invokes the DumpResources API provided by the xdsclient package (that retrieves configs from all xDS clients in the system).
    • E2E test now creates more than one xDS client and verifies that the client_scope field is set appropriately.
  • Minor improvements to stringing functionality in the bootstrap package to improve test log readability.

Fixes #6899

#a71-xds-fallback

RELEASE NOTES: none

@easwars easwars added the Type: Feature New features or improvements in behavior label Jun 24, 2024
@easwars easwars added this to the 1.66 Release milestone Jun 24, 2024
@easwars
Copy link
Contributor Author

easwars commented Jun 24, 2024

I probably need to add one more test where the xDS clients are talking to different management servers, and receiving different resources. But the review can start prior to that, and I should have that test written soonish.

@easwars
Copy link
Contributor Author

easwars commented Jun 24, 2024

TestDumpResources is failing because the test context is expiring before we test can read the expected config dump from the xDS clients. My guess at this point is that because the go-control-plane keeps resending resources when they are NACKed, the gRPC client is overwhelmed by ADS responses. But I have to confirm this theory. Will update when I have a breakthrough.

@easwars easwars closed this Jun 24, 2024
@easwars easwars reopened this Jun 24, 2024
Copy link

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 80.21978% with 18 lines in your changes missing coverage. Please review.

Project coverage is 81.32%. Comparing base (4dd7f55) to head (ef92a2f).
Report is 22 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7347      +/-   ##
==========================================
+ Coverage   80.58%   81.32%   +0.73%     
==========================================
  Files         349      348       -1     
  Lines       34056    26744    -7312     
==========================================
- Hits        27445    21749    -5696     
+ Misses       5431     3795    -1636     
- Partials     1180     1200      +20     
Files Coverage Δ
xds/csds/csds.go 69.23% <100.00%> (-3.50%) ⬇️
xds/internal/resolver/xds_resolver.go 80.71% <100.00%> (+1.78%) ⬆️
xds/internal/xdsclient/authority.go 86.09% <100.00%> (-2.85%) ⬇️
xds/internal/xdsclient/clientimpl_dump.go 100.00% <100.00%> (+15.00%) ⬆️
xds/server.go 81.74% <100.00%> (-1.00%) ⬇️
xds/internal/xdsclient/clientimpl_loadreport.go 69.23% <0.00%> (-4.11%) ⬇️
internal/testutils/xds/e2e/logging.go 50.00% <50.00%> (ø)
xds/googledirectpath/googlec2p.go 87.50% <33.33%> (+1.78%) ⬆️
xds/internal/xdsclient/client_refcounted.go 83.72% <92.30%> (ø)
xds/internal/xdsclient/client_new.go 86.04% <75.00%> (+1.76%) ⬆️
... and 1 more

... and 292 files with indirect coverage changes

@easwars
Copy link
Contributor Author

easwars commented Jun 25, 2024

Tests are passing now. This is good to be looked at.

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good overall, mostly just style/naming nits. Some questions about scaling up tests.

xds/internal/xdsclient/client_new.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/client_new.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/client_new.go Outdated Show resolved Hide resolved
xds/server_ext_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/clientimpl_dump.go Show resolved Hide resolved
xds/csds/csds_e2e_test.go Show resolved Hide resolved
xds/csds/csds_e2e_test.go Show resolved Hide resolved
xds/csds/csds_e2e_test.go Show resolved Hide resolved

}
func newRefCountedWithConfig(name string, fallbackConfig *bootstrap.Config, watchExpiryTimeout, idleAuthorityTimeout time.Duration) (XDSClient, func(), error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top level comment of get or create if not present? Or change name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for the Close, is the reason to keep these names this happens implicitly with objects that are ref counted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained the reasoning in the comment for close.

xds/internal/xdsclient/client_new.go Show resolved Hide resolved
@zasweq zasweq assigned easwars and unassigned zasweq Jun 27, 2024
@easwars easwars assigned zasweq and unassigned easwars Jun 27, 2024
@easwars
Copy link
Contributor Author

easwars commented Jun 27, 2024

Looks like there is a flake in the csds e2e test. I will continue to look at that, but this is still good for a second pass. Thanks.

Comment on lines 61 to 63
// newRefCountedWithConfig creates a new reference counted xDS client
// implementation for name, if one does not exist already. The passed in
// fallback config is used when bootstrap environment variables are not defined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "gets a reference to the existing xDS Client, or creates one if one does not exist already" or something alongside those lines for first context. To explicitly call out what happens if the xDS Client already exists for this name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Tests the scenario where there are multiple calls to New() with different
// names. Verifies that reference counts are tracked correctly and that only
// when all references are released, the client is closed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: switch "are released, the client is closed" to something that emphasizes the plurality of the clients maybe "are released for a client, only that specific client is closed" or something along those lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@zasweq zasweq assigned easwars and unassigned zasweq Jun 28, 2024
// Close the first client completely.
closeFunc1()
for i := 0; i < count; i++ {
closeFuncs1[i]()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: here and the close for 2: the test above tests the idempotency implemented through the once func. Maybe call some of these twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Done.

@easwars
Copy link
Contributor Author

easwars commented Jul 1, 2024

The CSDS test is failing on Github Actions because of the way go-control-plane continuously resends NACKed resources. This results in a busy loop on the xdsClient where it receives a resource and NACKs it repeatedly. This results in starvation of the DumpResources call from the CSDS service.

I'm planning to put this PR on hold for now, fix grpc/grpc#34099 and get back here.

@easwars
Copy link
Contributor Author

easwars commented Jul 2, 2024

@zasweq : Based on our offline discussion, I made a commit that skips the test on arm64. Waiting for GA to pass before I merge this. That way, I can work on the other PR implementing ADS stream flow control, and once I have that, I will be able to tweak this test and get it to work.

@zasweq
Copy link
Contributor

zasweq commented Jul 2, 2024

Sgtm, thinking about it I agree with Doug. To minimize the chance of merge conflicts probably best to get this in sooner rather than later.

@easwars easwars merged commit 4e9b596 into grpc:master Jul 2, 2024
13 checks passed
@easwars easwars deleted the support_multiple_xds_clients branch July 2, 2024 22:27
printchard pushed a commit to printchard/grpc-go that referenced this pull request Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xdsclient: support multiple xDS clients [A71]
2 participants