-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Xds fallback #11254
base: master
Are you sure you want to change the base?
Xds fallback #11254
Conversation
96cecb2
to
72d9ac9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good overall. I finally start to understand the spirit of it.
*/ | ||
default void assignResourcesToOwner(XdsResourceType<?> type, Collection<String> resources, | ||
Object owner) { | ||
// no-op - useful for test cases where everything is mocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:I'm confused by the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When everything is mocked there is no reason to do anything, so the cases where you want your test to implement ResourceStore you don't need to explicitly define this method that you don't want to use.
} | ||
|
||
@Override | ||
public void handleStreamReady(ServerInfo serverInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the onReady()
callback also supported by stubby xdsTransportFactory? @ejona86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have to have something that recognizes that a stream has come ready that can feed into these calls.
709026a
to
0c1eef8
Compare
af33b25
to
d8f7094
Compare
if (resources != null) { | ||
|
||
Collection<String> resources = | ||
resourceStore.getSubscribedResources(serverInfo, resourceType, authority); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
targetA(channelA) ----> xdsClientA
resource1 (authority1) --> subscriber1
authority1 : [serverInfo1, serverInfo2]
controlPlaneClient ---> serverInfo1
---> serverInfo2
resource2 (authority2) --> subscriber2
authority2 : [serverInfo3, serverInfo4]
controlPlaneClient ---> serverInfo3
---> serverInfo4
resource3 (authority1) --> subscriber3
authority1 : [serverInfo1, serverInfo2]
---> serverInfo1
controlPlaneClient ---> serverInfo2(never happens)
targetA(channelB) ----> xdsClientB
resource1 (authority1) --> subscriber1
...
Each xdsClinet is an instance and has its own the resourceSubscriber map, and separate ads streams for it's resources.
The resources to send request for in this adstream adsStream.sendDiscoveryReuqest
should be the resources with the authority the same as in this adsStream.
…d completely disjoint resource names to assuming that any duplicate names would be compatible, particularly cross authority.
Fallback is ready for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending what I have, mostly minor, I haven't looked deep enough.
@Nullable | ||
@Override | ||
public Collection<String> getSubscribedResources(ServerInfo serverInfo, | ||
XdsResourceType<? extends ResourceUpdate> type) { | ||
public Collection<String> getSubscribedResources( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify this method to something like this?
resources.entrySet().filter( e -> e.serverInfos.contains(serverInfo)).collect...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with the authority approach instead of the ServerInfo because fallback is per authority, so you can have a situation where one authority was complete and another did fallback and requests resources, the first authority's resources would include the fallback server's ServerInfo. We don't want to get an update from the fallback server for the first authority so we don't want to request its resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending what I have.
…figure out the current CPC. Add all live ControlPlaneClients to active list rather than skipping ones in backoff. Changed isConnected() logic to be based on AdsStream call's success or failure
} | ||
|
||
for (String authority : getAuthoritiesForServerInfo(serverInfo)) { | ||
if (controlPlaneClient.hasSubscribedResources(authority)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious. There's nothing related to the controlPlaneClient
in this method. If we need this, it should just be a private method in XdsClientImpl/resourceStore. But it looks like implying it has something to do with the controlPlaneClient is causing this code to make a misleading decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the checking for subscribed resources out of ControlPlaneClient as this is the only place it is used. I believe it originated from some idea of the ControlPlaneClient knowing which authorities it supported and therefore having ownership of the logic, but the code evolved so it no longer made sense.
Ugh. Github sent the review before I was done with it. Looking over the sent comments, I think they make sense alone/aren't wrong. But I'm still working on the review. |
args.nonce); | ||
if (controlPlaneClient.isStartingUp()) { | ||
controlPlaneClient.startingUpCompleted(); | ||
shutdownLowerPriorityCpcs(controlPlaneClient); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong being only for if errors.isEmpty()
. Why is it dependent on whether we are ACKing or NACKing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
private void internalHandleStreamReady( | ||
ServerInfo serverInfo, ControlPlaneClient controlPlaneClient, String authority) { | ||
ControlPlaneClient activeCpClient = getActiveCpc(authority); | ||
addCpcToAuthority(authority, controlPlaneClient); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding to the list here? Shouldn't we have done that in fallBackToCpc()
? It looks like if it isn't registered for the authority yet, then that means it shouldn't be added because a higher priority server is being used for that authority.
Using getAuthoritiesForServerInfo()
before getting here looks to be a mistake. It should be based on activeCpClients
, not bootstrapinfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a need to add the first one in this method. You are right that it is cleaner to make that explicit rather than use the general method which could potentially cover up some problem in the fallback logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a need to add the first one in this method.
Is there? Why would the first be special? I see nothing that would guarantee this is the first ServerInfo in the authority's list, so it seems this could also make things out-of-order.
We're looping through all authorities that this serverInfo can potentially serve, not just the ones it has been requested on. By definition if it isn't in the list already it isn't supposed to be answering for this authority. So it seems clearly wrong to go about adding to the list.
We got here because the authority has subscribers. Is it even possible for an authority to have a subscriber, non-empty serverInfos, and yet not have any cpc in the list? It looks like watchXdsResource()
would add to the list (via addCpcToAuthority()
) when the subscriber was created.
I don't see the point of using getAuthoritiesForServerInfo()
(which loops through all server info registrations) and then figure out if we care about the authority with hasSubscribers()
and activeCpClients.get(authority).contains(readyCpc)
. Seems all we need to do is:
for (Map.Entry<String, List<ControlPlaneClient>> me : activeCpClients.entrySet()) {
if (me.getValue().contains(controlPlaneClient)) {
internalHandleStreamReady(controlPlaneClient, me.getValue());
}
}
There's not need to check isShutDown()
as shut down cleans up activeCpClients
. With this construction, it is impossible for getActiveCpc()
to return null
. There's no need to check activeCpClient.isReady() && !activeCpClients.get(authority).contains(readyCpc)
.
if (ignoreResourceDeletionEnabled && type.isFullStateOfTheWorld() && data != null) { | ||
if (!resourceDeletionIgnored) { | ||
logger.log(XdsLogLevel.FORCE_WARNING, | ||
"xds server {0}: ignoring deletion for resource type {1} name {2}}", | ||
serverInfo.target(), type, resource); | ||
getTarget(), type, resource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of re-discovering what caused this, can we just pass ControlPlaneClient, ServerInfo, or target into onAbsent()
? We have that information in the two places we call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to get target from activeCpc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason to use activeCpc
(to call getActiveCpc()
at all). Why re-discover what cpc caused this? It seems easy to add the argument. Would it be a pain to update tests or something like that?
Fix fallback logic for new watch triggering fallback to use the right CPC. Some non-functional clarity improvement Change hasSubscribers detection to be across all types and not trying to involve any logic in ControlPlaneClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things done in commit 003348b
} | ||
|
||
for (String authority : getAuthoritiesForServerInfo(serverInfo)) { | ||
if (controlPlaneClient.hasSubscribedResources(authority)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the checking for subscribed resources out of ControlPlaneClient as this is the only place it is used. I believe it originated from some idea of the ControlPlaneClient knowing which authorities it supported and therefore having ownership of the logic, but the code evolved so it no longer made sense.
… ADS stream gets an error. Retry after error is an error.
… there hasn't been an active client for that authority rather than using the generic add method all of the time.
… instead use locally available value.
…ill in the list for some other authority, call adjustResourceSubscriptions on all resource types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending what I have.
// try to fallback to lower priority control plane client | ||
if (checkForFallback | ||
&& doFallbackForAuthority( | ||
cpcThatClosed, serverInfo, subscriber.serverInfos, subscriber.authority)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs for doFallbackForAuthority()
say server info should be for the active CPC. But here it is identical to cpcThatClosed.getServerInfo()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are only trying to do fallback for authorities for which the closed cpc was the active cpc, they are the same. However, since they are the same, and this is the only place that this method is called, there is no reason to pass the serverInfo.
authoritiesForClosedCpc = getActiveAuthorities(cpcThatClosed)
...
if (subscriber.hasResult() || !authoritiesForClosedCpc.contains(subscriber.authority)) {
continue;
}
…rsion that takes a single serverInfo.
…o lookup the ControlPlaneClient instead of sending it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending what I have.
int i = serverInfos.indexOf(activeServerInfo); | ||
List<ServerInfo> fallbackServers = (i > -1) | ||
? serverInfos.subList(i + 1, serverInfos.size()) | ||
: null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableList.of()
? Then remove the fallbackServers != null
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} | ||
} | ||
|
||
public ControlPlaneClient getOrCreateControlPlaneClient(ServerInfo serverInfo) { | ||
public ControlPlaneClient getOrCreateControlPlaneClient(ImmutableList<ServerInfo> serverInfos) { | ||
if (serverInfos == null || serverInfos.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove isEmpty()
, as it isn't needed.
The callers are going out of their way to avoid passing null
here, so either delete this check or simplify the callers.
(A combined method with fallBackToCpc()
as I discuss in other comments would also probably change what happens here, as you'd probably pass in the authority
as the argument and do the lookup of the serverInfos within this method.)
} | ||
} | ||
|
||
public ControlPlaneClient getOrCreateControlPlaneClient(ImmutableList<ServerInfo> serverInfos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
?
|
||
ControlPlaneClient cpcToUse = subscriber.loadControlPlane(); | ||
|
||
if (subscriber.errorDescription != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition can leak a freshly-created cpcToUse
. Move loadControlPlane()
after the condition?
} | ||
|
||
doFallbackIfNecessary(cpcToUse, subscriber.authority); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method re-discovers the code flow taken by getOrCreateControlPlaneClient()
. Can that be integrated into one method to avoid the error-prone handling? Every call to getOrCreateControlPlaneClient()
needs to add the fallback if necessary, and the only other call to getOrCreateControlPlaneClient()
just calls fallBackToCpc()
unconditionally. It would be better to create and update the state together in one method so we know the state remains consistent.
} | ||
|
||
// Everything existed and is in backoff so return null | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning null
here is confusing. The name would not suggest it could return null
("you said you'd create it"). And getOrCreateControlPlaneClient(ServerInfo)
doesn't return null
. So it is very hard to read code using this correctly. For example, the ControlPlaneClient cpc = getOrCreateControlPlaneClient(serverInfo);
in this function (properly) doesn't check the null
return, but it is type-sensitive whether that is appropriate. I suggest renaming this method (potentially also combining it with fallBackToCpc()
that I mention elsewhere).
|
||
private final Map<ServerInfo, LoadStatsManager2> loadStatsManagerMap = new HashMap<>(); | ||
final Map<ServerInfo, LoadReportClient> serverLrsClientMap = new HashMap<>(); | ||
/** Map of authority to its active control plane client (affected by xds fallback). */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"active" here has a different meaning than getActiveCpc()
. We should probably use a different term on one of them.
private void internalHandleStreamReady( | ||
ServerInfo serverInfo, ControlPlaneClient controlPlaneClient, String authority) { | ||
ControlPlaneClient activeCpClient = getActiveCpc(authority); | ||
addCpcToAuthority(authority, controlPlaneClient); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a need to add the first one in this method.
Is there? Why would the first be special? I see nothing that would guarantee this is the first ServerInfo in the authority's list, so it seems this could also make things out-of-order.
We're looping through all authorities that this serverInfo can potentially serve, not just the ones it has been requested on. By definition if it isn't in the list already it isn't supposed to be answering for this authority. So it seems clearly wrong to go about adding to the list.
We got here because the authority has subscribers. Is it even possible for an authority to have a subscriber, non-empty serverInfos, and yet not have any cpc in the list? It looks like watchXdsResource()
would add to the list (via addCpcToAuthority()
) when the subscriber was created.
I don't see the point of using getAuthoritiesForServerInfo()
(which loops through all server info registrations) and then figure out if we care about the authority with hasSubscribers()
and activeCpClients.get(authority).contains(readyCpc)
. Seems all we need to do is:
for (Map.Entry<String, List<ControlPlaneClient>> me : activeCpClients.entrySet()) {
if (me.getValue().contains(controlPlaneClient)) {
internalHandleStreamReady(controlPlaneClient, me.getValue());
}
}
There's not need to check isShutDown()
as shut down cleans up activeCpClients
. With this construction, it is impossible for getActiveCpc()
to return null
. There's no need to check activeCpClient.isReady() && !activeCpClients.get(authority).contains(readyCpc)
.
} | ||
|
||
restartMatchingSubscriberTimers(authority); | ||
readyCpc.sendDiscoveryRequests(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't limited to the current authority
, so won't this send discovery multiple times?
…eateControlPlaneClient and do fallback. Move some logic into watchXdsResource from private methods, such as doFallbackIfNecessary, and streamline the execution. Stop storing serverInfos in ResourceSubscriber and just construct it from authority as needed.
…ageControlPlaneClient
Implementation of https://github.com/grpc/proposal/blob/master/A71-xds-fallback.md in Java