Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: fix cluster_resolver LB policy behavior upon receipt of NACKed EDS resource #6217

Closed
easwars opened this issue Apr 21, 2023 · 3 comments · Fixed by #6436
Closed

xds: fix cluster_resolver LB policy behavior upon receipt of NACKed EDS resource #6217

easwars opened this issue Apr 21, 2023 · 3 comments · Fixed by #6436
Assignees

Comments

@easwars
Copy link
Contributor

easwars commented Apr 21, 2023

Let's say the cluster_resolver LB policy receives two discovery mechanisms as part of its configuration: a higher priority EDS and a lower priority LOGICAL_DNS cluster. The cluster_resolver LB policy creates child policy configs only after both discovery mechanisms are resolved.

Let's specifically consider the case where the first EDS response for the higher priority EDS cluster is NACKed by the xDS client. This is handled by the edsDiscoveryMechanism here:

The error is propagated to the cluster_resolver LB policy by pushing it onto a channel here:

rr.updateChannel <- &resourceUpdate{err: err}

At this point, since the cluster_resolver has not created a child policy (because it has not received valid responses from both discovery mechanisms), it will report TransientFailure to its parent here:

b.cc.UpdateState(balancer.State{

Instead the cluster_resolver should treat this error from the xDS client as though it received an EDS response with a single priority and no localities. This will allow it to create the priority child (once it receives a response from the DNS mechanism as well), and will allow it to fail over to it.

Also, the cluster_resolver LB policy should do the same thing if the EDS watcher returns a resource-does-not-exist error. The only difference between a resource NACK error and a resource-does-not-exist error is that the former ignores the error if there was a previous result, whereas the latter does not.

@ulascansenturk
Copy link
Contributor

Hey @easwars, you can assign me.

@dfawley
Copy link
Member

dfawley commented Apr 25, 2023

Actually, this one is better off waiting on another item that @easwars is planning to do.

@easwars
Copy link
Contributor Author

easwars commented May 9, 2023

Blocked on #6265.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants