clarify xds docs on resource deletion#20523
Conversation
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
docs/root/api-docs/xds_protocol.rst
Outdated
| In the incremental protocol variants, the server signals the client that a resource should be | ||
| deleted via the :ref:`removed_resources <envoy_v3_api_field_service.discovery.v3.DeltaDiscoveryResponse.removed_resources>` | ||
| field of the response. This tells the client to remove the resource from its local cache. | ||
| field of the response for wildcard resource types like |
There was a problem hiding this comment.
How about add a section named "wildcard resource" and remove like here?
There was a problem hiding this comment.
added "wildcard resource" paragraph
|
Also fix CI please. |
|
docs are rendered here https://storage.googleapis.com/envoy-pr/20523/docs/index.html |
phlax
left a comment
There was a problem hiding this comment.
@ramaraochavali small nit with the headers - other than that lgtm from docs pov
probably would be good to get it checked over by someone who knows xds better than me
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
@phlax thank you. fixed. PTAL |
phlax
left a comment
There was a problem hiding this comment.
lgtm from docs pov - thanks @ramaraochavali
cc @lizan for api review
|
|
||
| Other Resource Types | ||
| """""""""""""""""""" | ||
| In both SotW and incremental protocol variants, deletions are indicated implicitly by parent resources being |
There was a problem hiding this comment.
This is incorrect. The incremental protocol variant does explicitly indicate resource removal for all resource types, not just for wildcard resource types.
Please fix this in a subsequent PR.
There was a problem hiding this comment.
The incremental protocol variant does explicitly indicate resource removal for all resource types, not just for wildcard resource types.
I think implementation for delta even though removed_resources are specified, would cleanup resources only on parent deletion for non wildcard types like RDS and EDS
cc: @howardjohn
There was a problem hiding this comment.
Yeah that is my understanding of how Envoy EDS works.
There was a problem hiding this comment.
Example RDS:
There was a problem hiding this comment.
I understand that the current Envoy implementation works that way, but I think that's because the current implementation does not correctly separate the xDS data model layer from the the xDS transport protocol layer.
The xDS transport protocol layer is just a generic pub/sub mechanism where clients can subscribe to arbitrary resources by name and resource type. The transport protocol layer does not know or care about the contents of those resources; it treats them as opaque blobs that are passed up to the data model layer. The data model layer is where the contents of the resources are actually understood -- e.g., when a Listener resource is provided to the data model layer, the data model layer knows what all of the fields in the proto mean and how to use them.
At the xDS transport protocol layer, there are two things matching up:
- The set of resources that the client is subscribed to. This is controlled by the client, not the server.
- The current value of each resource. This is controlled by the server, not the client.
Note that (1) is determined at the data model layer. Envoy initially makes a wildcard query for Listener and Cluster resources, and the returned set of Listener and Cluster resources tell it what RouteConfiguration and ClusterLoadAssignment resources to subscribe to.
At the transport protocol layer, during the interval between when the client subscribes to a resource and when it unsubscribes from that resource, the server must keep the client updated as to the current state of that resource. "Does not exist" is just one possible distinct value for the resource. If the resource is removed on the server, the server must inform the client that the resource does not exist. If the resource is later recreated on the server and the client is still subscribed to it, the server must then inform the client that the resource now has a new value. The server stops sending updates for the resource only once the client unsubscribes from it.
When a Listener resource goes away, that tells the client (at the data model layer) that it no longer needs to subscribe to the RouteConfiguration resource that was mentioned in the now-removed Listener resource. That is a change to (1), not a change to (2). However, because Envoy does not currently draw a distinction between those two things, it uses that same signal for both (1) and (2).
This behavior is not really correct, but it mostly works because of the fact that Envoy does not currently have a real xDS transport protocol layer that provides proper caching. However, as per discussion in #13009, Envoy will soon get such a separate xDS transport protocol layer (@adisuissa is starting to look at this now). When that happens, it will need to implement proper support for resource deletion at the transport protocol layer.
This spec is talking about the xDS transport protocol layer, and it is a spec about how the protocol is intended to work. It is not attempting to describe the data model layer. In other words, the deletion that we're talking about here is the deletion of the resource in that transport protocol-layer cache, not the deletion of the resource at the data model layer.
I believe that the original text of this document was already correct. I don't think we should change the spec based on Envoy's current implementation, which (a) isn't really correct and (b) is going to change in the not-too-distant future.
There was a problem hiding this comment.
I agree with most of what you said
"Does not exist" is just one possible distinct value for the resource. If the resource is removed on the server, the server must inform the client that the resource does not exist.
It is not clear what client should do with it? Basically I want to understand what is the behaviour as per xds spec when a resource is sent by server in removed_resources when there is a reference to it in dependent resource?
There was a problem hiding this comment.
If the client sees (e.g.) a Listener resource that points to a non-existent RouteConfiguration resource, I think that's fundamentally the same type of problem as a case where the Listener resource contained invalid data in one of its fields. This is a problem to be handled at the data model layer, not at the transport protocol layer, because the transport protocol layer does not know anything about the contents of the resources or the relationships between the resources; it treats them simply as opaque blobs, each of which is completely independent of any other.
In practice, I think there are a couple of basic choices for how to handle problems at the data model layer. The most obvious option is to handle it exactly the same way as if the requested resource never existed in the first place. (It's worth noting that at the transport protocol layer, if the client first starts up and requests a resource that does not exist, the server should report it as being deleted, exactly the same way that it would if the resource was deleted later, after having already been sent to the client.) For example, what would happen if Envoy first started up and was given a Listener resource that pointed to a RouteConfiguration resource that did not exist? I suspect it would fail traffic at L7 in that case, but maybe it could fail to warm the listener or something.
The client does have another option in the case where it had already gotten a copy of the resource from before it was deleted: it can choose to just keep running with the copy it already had, essentially ignoring the deletion. This is probably safer to avoid an outage, but if the client does that, it should probably also provide some signal that it's doing so, so that humans aren't blind to the fact that the client would stop working if it was restarted at that point. (I plan to work on a design for that in the near future as part of #11396.)
In any case, though, this document is a spec for the transport protocol layer; it does not define anything about the data model layer (e.g., it does not say anything about what fields need to be populated with what values in any given resource type), so I don't think it's in scope for this document to define the behavior in cases where there is a config problem at that layer. You may argue (and I think you would be correct to do so :) ) that we should also document the error handling behavior at the data model layer, but I think that would need to be a separate document.
There was a problem hiding this comment.
Ok. Thanks for the explanation. I think as it stands today this is not just transport protocol spec though - it explains few cases on error handling or how to do certain things like when to send an update etc. But I get your point.
There was a problem hiding this comment.
Can you point me at specific things you see in the doc that are not about the transport protocol? I think that shouldn't be the case.
Note that when to send an update is actually a function of the transport protocol layer, not the data model layer.
|
I think this PR should probably be reverted. It seems that the whole purpose of the PR was to change the doc to say something that is not accurate. |
|
I've created #20568 to revert this. |
This reverts commit 642a7b6. Signed-off-by: Mark D. Roth <roth@google.com>
Resource deletion for non wildcard resource types like eds and rds will only happen on parent resource deletion for both Sotw and Incremental variants. This PR clarifies it in doc. Risk Level: N/A Testing: N/A Docs Changes: N/A Release Notes:N/A Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
…voyproxy#20568) This reverts commit 642a7b6. Signed-off-by: Mark D. Roth <roth@google.com>
|
@markdroth we are observing that delta RDS resource removal is getting ignored as stated above
and finally getting removed after the delay. I am not sure if it is getting removed due to the removal of parent resource. I am verifying the configuration using the envoy admin interface. |
|
You cannot remove it. Its probably getting removed at the "delay" of listener deletion (which has a drain period, so its not instant), which cleans up the last references |

Commit Message: clarify xds docs on resource deletion
Additional Description: Resource deletion for non wildcard resource types like eds and rds will only happen on parent resource deletion for both Sotw and Incremental variants. This PR clarifies it in doc.
Risk Level: N/A
Testing: N/A
Docs Changes: N/A
Release Notes:N/A
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]