Incremental XDS proposal.#3470
Incremental XDS proposal.#3470mattklein123 merged 11 commits intoenvoyproxy:masterfrom nt:incremental-xds
Conversation
zuercher
left a comment
There was a problem hiding this comment.
Looks good. One minor comment and some copy editing. (I'm happy to take a closer copy editing pass later, if you'd like.)
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
typos: known and corresponding. CDS should be XDS, I think.
There was a problem hiding this comment.
I think the type/field name are also swapped here.
There was a problem hiding this comment.
Re "I think the type/field name are also swapped here.": I'm not sure what you mean here. My intention is to have keys be resource names and values be their version.
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
Is the intention here to try to keep implementations from relying on Envoy setting this value in requests?
I think this documentation should just describe the client behavior (e.g., the client returns the value it saw in the last ACKed response) and leave off the "debugging purposes only" statements.
There was a problem hiding this comment.
See my comment below, I don't think it is needed.
There was a problem hiding this comment.
I updated the comment to match having it in responses only.
|
Please merge master to make sure you pick up the formatter fixes. |
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
Do we need to supply this at all? I realize this is what we do in v1, but what is the function in v2?
There was a problem hiding this comment.
IMO this is still useful even in the incremental case to cover the add/remove case. What version are we at if we have no resources? Obviously a deployment can leave this empty, but I would suggest we don't remove the possibility of populating it.
There was a problem hiding this comment.
Oh right, this is the request. Yes probably don't need that. We should leave in the response though...
There was a problem hiding this comment.
Done, kept it in the response.
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
See my comment below, I don't think it is needed.
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
When we get agreement on the basic protos and sequencing, we should add sequence diagrams and a section to https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md as part of this PR.
api/envoy/api/v2/discovery.proto
Outdated
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
Nit: be consistent with whether it is "incremental xDS" or "Incremental xDS".
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
How does this work in an incremental sense? If an incremental request has the empty list does that then become "subscribe all"?
There was a problem hiding this comment.
I removed that comment and made the field resource_name_subscribe.
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
I think the type/field name are also swapped here.
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
Is there a better name than "system version"? It's more like "upstream config pipeline provenance version", which is a bit of a mouthful..
There was a problem hiding this comment.
I'm open to suggestions, if we can move away form the version keyword that might help avoid confusion. Something like short_description maybe?
|
@nt can you please merge master, fix DCO, and look at CI issues? |
api/envoy/api/v2/discovery.proto
Outdated
There was a problem hiding this comment.
Can you describe the nonce value in this case here?Should it match the last nonce we received from the server? If not, how do we avoid it being conflated with ACK?
There was a problem hiding this comment.
I updated the PR. The nonce should only be included when the Requests is an ACK / NACK. otherwise it is omitted.
api/envoy/api/v2/discovery.proto
Outdated
api/envoy/api/v2/discovery.proto
Outdated
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
|
re: @mattklein123 I rebased on master and fixed DCO. PTAL. |
mattklein123
left a comment
There was a problem hiding this comment.
Overall looks great. Just some docs nits and more comment asks. @htuch can you also take a look?
api/envoy/api/v2/discovery.proto
Outdated
| // 2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse. | ||
| // In this case the response_nonce is set to the nonce value in the Response. | ||
| // ACK or NACK is determined by the absence or presence of error_detail. | ||
| // In the case of a ACK the system_version_info must match the value in the |
There was a problem hiding this comment.
I think this is out of date, right? We no longer send system_version_info?
api/envoy/api/v2/discovery.proto
Outdated
| repeated string resource_names_unsubscribe = 4; | ||
|
|
||
| // This map must be populated when the IncrementalDiscoveryRequest is the | ||
| // first in a stream. The keys are know resources names to the xDS client and |
There was a problem hiding this comment.
Typo "keys are know" / a bit confusing in general. Clarify?
There was a problem hiding this comment.
I rewrote this, let me know if it make better sense now.
| // required for ADS. | ||
| string type_url = 2; | ||
|
|
||
| // A list of Resource names to add to the list of tracked resources. |
There was a problem hiding this comment.
Can we clarify here when this should be empty? For example, on an initial request for CDS when Envoy wants to be told about all relevant clusters by the management server and isn't doing any Envoy side dynamic loading stuff? I guess in general, given that we send this message in multiple different situations, it would be useful for each field to have more detail on when it is and is not set. I think it would make the documentation a lot better.
There was a problem hiding this comment.
I added a paragraph to clarify the use of these two fields. Let me know if it makes sense to you.
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
This looks great. Two more comments for discussion but I think in general LGTM.
| // All resource names in the resource_names_subscribe list are added to the | ||
| // set of tracked resources and all resource names in the resource_names_unsubscribe | ||
| // list are removed from the set of tracked resources. | ||
| // Unlike in non incremental xDS, an empty resource_names_subscribe or |
There was a problem hiding this comment.
I think we should still describe here this is effectively a hint, right? Meaning, the management server is still allowed to send additional resources? (Though it must send the requested ones) I think it's worth clarifying this as it's similar to non-incremental in this regard?
api/envoy/api/v2/discovery.proto
Outdated
| // first in a stream. The keys are the resources names of the xDS resources | ||
| // known to the xDS client. The values in the map are the associated resource | ||
| // level version info. | ||
| map<string, string> resource_versions = 5; |
There was a problem hiding this comment.
WDYT about calling this initial_local_resource_versions just so the intent is more clear? I don't feel strongly about this. Open to other names too.
There was a problem hiding this comment.
I think that is a great idea. Renamed it initial_resource_versions. I think the "local" part should be obvious.
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM, nice work! Will defer to @htuch for further review.
htuch
left a comment
There was a problem hiding this comment.
Looks good, but would like to get the protocol spelled out in detail in XDS_PROTOCOL.md.
| // ACK or NACK is determined by the absence or presence of error_detail. | ||
| // 3. Spontaneous IncrementalDiscoveryRequest from the client. | ||
| // This can be done to dynamically add or remove elements from the tracked | ||
| // resource_names set. In this case response_nonce must be omitted. |
There was a problem hiding this comment.
It would be good to convince and explain why this doesn't have the racy behavior that originally motivated the nonce. I.e. we need sequence diagrams considering races. Can we make the changes to https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md as part of this PR?
There was a problem hiding this comment.
Done. Let me know what you think.
There was a problem hiding this comment.
Looks good, I think that even if a nonce-less incremental subscribe races with some other request/response, it's largely harmless; the set of resources being tracked by the management server should eventually converge on what the client is interested in tracking.
Consider this example (M is management server, E is Envoy, X/Y/Z are resources):
- M->E (CDS): {X}
- E-> M (CDS): subscribe Y
- E->M (CDS): unsubscribe Y
- M->E (CDS): {X, Y, Z}
What if steps 3 and 4 race? I don't think there is a problem here, since we eventually converge on the right thing, but there will be periods of inconsistency perhaps.
If someone can persuade me that the above is valid, I think we're good to ship, but want to just raise this.
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
api/XDS_PROTOCOL.md
Outdated
| all 100k clusters when a single cluster is modified, the management server | ||
| only needs to deliver the single cluster that changed. | ||
|
|
||
| An xDS Incremental session is always in the context of a gRPC bidirectional |
There was a problem hiding this comment.
Nit: s/Incremental/incremental/ here, incremental is an adjective not a proper noun.
api/XDS_PROTOCOL.md
Outdated
| connected to it. There is no REST version of Incremental xDS. | ||
|
|
||
| In Incremental xDS the nonce field is required and used to pair | ||
| IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK. |
There was a problem hiding this comment.
Can you add docs links (as done with DiscoveryRequest/Response) above? Also, use quote style for proto message names as done for these messages above.
api/XDS_PROTOCOL.md
Outdated
|
|
||
| IncrementalDiscoveryRequest can be sent in 3 situations: | ||
| 1. Initial message in a xDS bidirectional gRPC stream. | ||
| 2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse. |
api/XDS_PROTOCOL.md
Outdated
| IncrementalDiscoveryRequest can be sent in 3 situations: | ||
| 1. Initial message in a xDS bidirectional gRPC stream. | ||
| 2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse. | ||
| In this case the response_nonce is set to the nonce value in the Response. |
There was a problem hiding this comment.
Nit: use quote style for response_nonce and error_detail here.
api/XDS_PROTOCOL.md
Outdated
| This can be done to dynamically add or remove elements from the tracked | ||
| resource_names set. In this case response_nonce must be omitted. | ||
|
|
||
| In this first example the client connects as receive a first update that it ACKs. |
api/XDS_PROTOCOL.md
Outdated
|
|
||
| In this first example the client connects as receive a first update that it ACKs. | ||
| The second update fails and the client NACKs the update. Later the xDS client | ||
| spontaneously requests the "wc" resource. |
There was a problem hiding this comment.
I think the wc resource_list_subscribe arrow is the wrong way around in the diagram.
| // ACK or NACK is determined by the absence or presence of error_detail. | ||
| // 3. Spontaneous IncrementalDiscoveryRequest from the client. | ||
| // This can be done to dynamically add or remove elements from the tracked | ||
| // resource_names set. In this case response_nonce must be omitted. |
There was a problem hiding this comment.
Looks good, I think that even if a nonce-less incremental subscribe races with some other request/response, it's largely harmless; the set of resources being tracked by the management server should eventually converge on what the client is interested in tracking.
Consider this example (M is management server, E is Envoy, X/Y/Z are resources):
- M->E (CDS): {X}
- E-> M (CDS): subscribe Y
- E->M (CDS): unsubscribe Y
- M->E (CDS): {X, Y, Z}
What if steps 3 and 4 race? I don't think there is a problem here, since we eventually converge on the right thing, but there will be periods of inconsistency perhaps.
If someone can persuade me that the above is valid, I think we're good to ship, but want to just raise this.
htuch
left a comment
There was a problem hiding this comment.
Thanks, LGTM modulo remaining comments.
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
|
For some reason I can not directly reply to your race comment about nonce-less spontaneous subscribes/unsubscribes:
I do not think this is a problem as these are "hints" to the XDS server. |
htuch
left a comment
There was a problem hiding this comment.
LGTM, this is great, modulo two nits. @mattklein123 any further thoughts before merging?
api/XDS_PROTOCOL.md
Outdated
| [`IncrementalDiscoveryResponse`]](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest) | ||
| to a [`IncrementalDiscoveryRequest`](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest) | ||
| ACK or NACK. | ||
| Optionaly, a response message level system_version_info is present for |
api/XDS_PROTOCOL.md
Outdated
| connected to it. There is no REST version of Incremental xDS. | ||
|
|
||
| In incremental xDS the nonce field is required and used to pair a | ||
| [`IncrementalDiscoveryResponse`]](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest) |
There was a problem hiding this comment.
Additional bracket here - you can preview MD generation by clicking the View button on GH (pro tip).
|
fixed both nits. |
mattklein123
left a comment
There was a problem hiding this comment.
LGTM, one question. Thank you!
api/XDS_PROTOCOL.md
Outdated
|
|
||
|  | ||
|
|
||
| The incremental xDS method may be added to CDS and RDS in the future. |
There was a problem hiding this comment.
Why wouldn't this just be supported out of the box? It's not clear to me how any of this is ADS specific?
There was a problem hiding this comment.
I thought adding it on a need-for basis made more sense to keep complexity down. We can add them now if you prefer. Let me know.
There was a problem hiding this comment.
IMO I would just add it now. Unless I'm misunderstanding the implementation, I think it will basically "just work." @htuch thoughts?
| // | ||
| // A list of Resource names to add to the list of tracked resources. | ||
| repeated string resource_names_subscribe = 3; | ||
| // A list of Resource names to remove from the list of tracked resources. |
There was a problem hiding this comment.
nit: newline before this line
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
|
@nt can you fix_format? Ready to merge when that's done. |
Signed-off-by: Nicolas Thiebaud <nicothieb@google.com>
This PR includes proposed additions to the XDS discovery protos as discussed in this design doc.
The implementation is not included and will be done in a follow up PRs.
The tacking bug id is #3229