-
Notifications
You must be signed in to change notification settings - Fork 5.4k
clarify xds docs on resource deletion #20523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
7bbebd4
clarify xds docs on resource deletion
ramaraochavali 797401c
lint
ramaraochavali 79377aa
lint and paragraph
ramaraochavali 572fac0
lint
ramaraochavali 716d1f1
fix space
ramaraochavali d55d107
fix space
ramaraochavali d9a87ee
change header type
ramaraochavali File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect. The incremental protocol variant does explicitly indicate resource removal for all resource types, not just for wildcard resource types.
Please fix this in a subsequent PR.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think implementation for delta even though removed_resources are specified, would cleanup resources only on parent deletion for non wildcard types like RDS and EDS
cc: @howardjohn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is my understanding of how Envoy EDS works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example RDS:
envoy/source/common/rds/rds_route_config_subscription.cc
Line 104 in 575cab4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the current Envoy implementation works that way, but I think that's because the current implementation does not correctly separate the xDS data model layer from the the xDS transport protocol layer.
The xDS transport protocol layer is just a generic pub/sub mechanism where clients can subscribe to arbitrary resources by name and resource type. The transport protocol layer does not know or care about the contents of those resources; it treats them as opaque blobs that are passed up to the data model layer. The data model layer is where the contents of the resources are actually understood -- e.g., when a Listener resource is provided to the data model layer, the data model layer knows what all of the fields in the proto mean and how to use them.
At the xDS transport protocol layer, there are two things matching up:
Note that (1) is determined at the data model layer. Envoy initially makes a wildcard query for Listener and Cluster resources, and the returned set of Listener and Cluster resources tell it what RouteConfiguration and ClusterLoadAssignment resources to subscribe to.
At the transport protocol layer, during the interval between when the client subscribes to a resource and when it unsubscribes from that resource, the server must keep the client updated as to the current state of that resource. "Does not exist" is just one possible distinct value for the resource. If the resource is removed on the server, the server must inform the client that the resource does not exist. If the resource is later recreated on the server and the client is still subscribed to it, the server must then inform the client that the resource now has a new value. The server stops sending updates for the resource only once the client unsubscribes from it.
When a Listener resource goes away, that tells the client (at the data model layer) that it no longer needs to subscribe to the RouteConfiguration resource that was mentioned in the now-removed Listener resource. That is a change to (1), not a change to (2). However, because Envoy does not currently draw a distinction between those two things, it uses that same signal for both (1) and (2).
This behavior is not really correct, but it mostly works because of the fact that Envoy does not currently have a real xDS transport protocol layer that provides proper caching. However, as per discussion in #13009, Envoy will soon get such a separate xDS transport protocol layer (@adisuissa is starting to look at this now). When that happens, it will need to implement proper support for resource deletion at the transport protocol layer.
This spec is talking about the xDS transport protocol layer, and it is a spec about how the protocol is intended to work. It is not attempting to describe the data model layer. In other words, the deletion that we're talking about here is the deletion of the resource in that transport protocol-layer cache, not the deletion of the resource at the data model layer.
I believe that the original text of this document was already correct. I don't think we should change the spec based on Envoy's current implementation, which (a) isn't really correct and (b) is going to change in the not-too-distant future.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with most of what you said
It is not clear what client should do with it? Basically I want to understand what is the behaviour as per xds spec when a resource is sent by server in removed_resources when there is a reference to it in dependent resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the client sees (e.g.) a Listener resource that points to a non-existent RouteConfiguration resource, I think that's fundamentally the same type of problem as a case where the Listener resource contained invalid data in one of its fields. This is a problem to be handled at the data model layer, not at the transport protocol layer, because the transport protocol layer does not know anything about the contents of the resources or the relationships between the resources; it treats them simply as opaque blobs, each of which is completely independent of any other.
In practice, I think there are a couple of basic choices for how to handle problems at the data model layer. The most obvious option is to handle it exactly the same way as if the requested resource never existed in the first place. (It's worth noting that at the transport protocol layer, if the client first starts up and requests a resource that does not exist, the server should report it as being deleted, exactly the same way that it would if the resource was deleted later, after having already been sent to the client.) For example, what would happen if Envoy first started up and was given a Listener resource that pointed to a RouteConfiguration resource that did not exist? I suspect it would fail traffic at L7 in that case, but maybe it could fail to warm the listener or something.
The client does have another option in the case where it had already gotten a copy of the resource from before it was deleted: it can choose to just keep running with the copy it already had, essentially ignoring the deletion. This is probably safer to avoid an outage, but if the client does that, it should probably also provide some signal that it's doing so, so that humans aren't blind to the fact that the client would stop working if it was restarted at that point. (I plan to work on a design for that in the near future as part of #11396.)
In any case, though, this document is a spec for the transport protocol layer; it does not define anything about the data model layer (e.g., it does not say anything about what fields need to be populated with what values in any given resource type), so I don't think it's in scope for this document to define the behavior in cases where there is a config problem at that layer. You may argue (and I think you would be correct to do so :) ) that we should also document the error handling behavior at the data model layer, but I think that would need to be a separate document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Thanks for the explanation. I think as it stands today this is not just transport protocol spec though - it explains few cases on error handling or how to do certain things like when to send an update etc. But I get your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me at specific things you see in the doc that are not about the transport protocol? I think that shouldn't be the case.
Note that when to send an update is actually a function of the transport protocol layer, not the data model layer.