config: fix delta xDS's use of (un)subscribe fields, more explicit protocol spec#6545
config: fix delta xDS's use of (un)subscribe fields, more explicit protocol spec#6545htuch merged 20 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
|
|
||
| grpc_stream_.establishNewStream(); | ||
| subscribe(resources); | ||
| // The attempt stat here is maintained for the purposes of having consistency between ADS and |
There was a problem hiding this comment.
It looks like this is referring to the same thing as the TODOs I added. Just how meaningless is update_attempt for ADS? Can we just do nothing with it and let it sit at 0?
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
|
|
||
| for (const auto& added : diff.added_) { | ||
| setResourceWaitingForServer(added); | ||
| void updateResources(std::vector<std::string> resource_names) override { |
There was a problem hiding this comment.
I'm not sure what the calling code looks like, but would it be possible to have this parameter be a set instead? That way you'd avoid having to pass by value and sorting
There was a problem hiding this comment.
Definitely makes sense, and it doesn't look like any of the implementors would run into problems with this change. However, it is a pretty far reaching change, and there appears to be some tests I'll need to fix. Working on it.
There was a problem hiding this comment.
Done. This uncovered an issue with how the old non-delta gRPC xDS implementation uses its "watches". Previously, when updating your interest from resources 0 and 1 to just 2, you would first send a request for 0,1,2, and then send one for just 2. With my change, you'll now send a request for nothing, and then just 2. Well, I don't know, maybe this is too much churn; maybe the existing logic makes sense. Fortunately, this will all be replaced in the near future with logic that will have the correct behavior (just send one request, for 2).
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
htuch
left a comment
There was a problem hiding this comment.
Thanks for thinking about these corner cases. It got me thinking about whether we can try make delta discovery requests/responses a bit easier to reason about.
source/common/common/utility.h
Outdated
| * @param delimiter supplies the delimiter to join them together. | ||
| * @return string combining elements of `source` with `delimiter` in between each element. | ||
| */ | ||
| static std::string join(const std::set<std::string>& source, const std::string& delimiter); |
There was a problem hiding this comment.
Can we maybe just have a templatized join that works on iterators? That could replace the one for vector as well.
There was a problem hiding this comment.
Yeah, I was wondering if there might be something like that, but.... seemed like a lot of trouble! This duplicated join() broke some other tests I didn't notice, so I have already addressed that by just inlining the single use of join(). Does that sound ok?
There was a problem hiding this comment.
It's OK, but it's probably only a couple of lines to make it a template with iterators :)
| stats_.update_attempt_.inc(); | ||
| // Tell the server about our new interests (but only if there are any). | ||
| if (!names_added_.empty() || !names_removed_.empty()) { | ||
| kickOffDiscoveryRequest(); |
There was a problem hiding this comment.
How does this affect ACKs for server response nonces?
This had me thinking; for incremental xDS, do we want to consider splitting out ACK/NACK from the DeltaDiscoveryRequest? It made sense in state-of-world, because it was much simpler, but now we have different concerns, so a top-level oneof object might make more sense. You could end up sending two messages, an ack and then the next request.
Do you think ^^ would be easier to reason about?
There was a problem hiding this comment.
By "this" I'm guessing you mean the if condition here? There are a few paths that kickOffDiscoveryRequest(); this is not the one that would ever be doing it in the ACK scenario.
I think that's a very good point. IIRC there's some language like "the nonce field should only be set when ACKing a response", meaning there is somewhat of a de facto oneof already going on. If we do this, there is another bit of perceptual cleanup that would make sense, and that is dumping the Request/Response nomenclature. IMO it is usually not appropriate for bidi streaming, and this is definitely one such case.
One thing: I think the clarity benefits would be almost entirely at the "reading the .proto and docs / understanding the protocol" level (maybe the server implementations would also benefit; no idea). I think Envoy's implementation wouldn't change structure much; you would have one .cc file dealing with both oneof members in relatively close proximity. (Actually, maybe this would be more useful than I can understand, since I already fully "get" when the code is using a DiscoveryRequest as an ACK vs a request).
There was a problem hiding this comment.
Sorry, I wasn't precise; I meant by "this" the idea of coalescing multiple Envoy DeltaDiscoveryRequests, each of which would have been sent in reply to some unique DeltaDiscoveryResponse with its own nonce. We will lose ACK/NACK as a result potentially, or maybe there is some detail that avoids this that I missed.
By splitting out the ACK/NACK as a distinct message and avoiding collapsing it into the request, when we unpause we could issue a series of ACK/NACKs with the relevant nonces, and then the new delta request.
There was a problem hiding this comment.
IIIIII see. Yes, that's definitely right; I will fix this PR accordingly. When you put it that way, I realize I'm pretty sure that xDS, including SotW, has been broken all along! (Due to how happy it is to drop things from the queue, and the fact that the nonce is just directly copied into request_, rather than queued up).
I think this would ideally be cleanest as a repeated field, rather than a series of messages. It would be very easy to implement here in Envoy; not sure how much work it would be in the server. Because it's so easy, there's no need to do it now "while we're already here" or anything like that, though.
There was a problem hiding this comment.
I'm not sure if SotW is broken, since the last DiscoveryResponse is always authoritative, replacing all earlier ones, so it's safe to just have the latest DiscoveryRequest in that situation.
There was a problem hiding this comment.
Oh, because when you're always sending everything, ACKs are automatically "cumulative". Thanks, makes sense!
There was a problem hiding this comment.
Did you decide to punt on making ACK/NACKs a distinct message?
There was a problem hiding this comment.
Hmm, I think yes, punt. My big aggregated delta implementation is already working without that, and I'd rather not go back and add it in. Plus, I think it will be most clear whether we want it / how best to do it once that aggregated delta stuff is submitted, and we're looking at applying the prospective changes on top of it.
Oh, but, with this in place, what I said about "I think I could easily prevent those [no-ops] from being sent, but I was worried that that would be needlessly cute/complex" would probably be much cleaner to implement. So I bet we will want it.
There was a problem hiding this comment.
Ugh, I wrote a whole big response and then github threw it away. Basically, yes. It will be easier to see how we want to implement it once we can start freshly on top of the upcoming aggregated delta implementation. I think it will help make it cleaner to avoid sending the no-op messages, which is nice.
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
|
Ready, PTAL. Note the "no-op" requests that now have to be expected in the unit tests. I think I could easily prevent those from being sent, but I was worried that that would be needlessly cute/complex. |
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
htuch
left a comment
There was a problem hiding this comment.
Thanks, looks good now modulo a few comments. This will also be much nicer on the wire for SoW.
/wait
| // add-then-remove case *also* causes the resource to be referred to in the request (as an | ||
| // unsubscribe). | ||
| // Unlike the remove-then-add case, this one really is unnecessary, and ideally we would have | ||
| // the request simply not include any mention of the resource. Oh well. |
There was a problem hiding this comment.
I don't think the code complexity would be worth it, like even if the labor to make the change was free. It would only be avoiding an extraneous-but-not-harmful message in a corner case.
There was a problem hiding this comment.
I think it's not worth the code complexity, like even if the labor was free. The benefit would just be a small efficiency gain in a pretty corner-y corner case.
| stats_.update_attempt_.inc(); | ||
| // Tell the server about our new interests (but only if there are any). | ||
| if (!names_added_.empty() || !names_removed_.empty()) { | ||
| kickOffDiscoveryRequest(); |
There was a problem hiding this comment.
Did you decide to punt on making ACK/NACKs a distinct message?
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
|
PTAL. BTW it is on my todo list to split this and the others out into .cc's now that they're detemplatized... although maybe not this one, since it's going to be outright replaced by the aggregated PR. |
* master: thread: remove ThreadFactorySingleton (envoyproxy#6658) router: support offseting downstream provided grpc timeout (envoyproxy#6628) router: defer per try timeout until downstream request is done (envoyproxy#6643) update bazel readme for clang-format-8 on mac (envoyproxy#6660) Implement some TODOs in quic_endian_impl.h (envoyproxy#6644) docs: add aspell to mac dependencies to fix check format script (envoyproxy#6661) config: fix delta xDS's use of (un)subscribe fields, more explicit protocol spec (envoyproxy#6545) Signed-off-by: Michael Puncel <mpuncel@squareup.com>
I realized that, with the unreliable queue implementation copied from SotW xDS, delta xDS could get into a state where Envoy thinks it has subscribed, but the server hasn't heard the subscription, with no way for either to realize the mistake. I fixed that by converting the queue setup to a cleaner "do I currently want to send a request?" with the request's (un)subscriptions only populated immediately before the request is actually sent into gRPC.
While doing that, I further realized there was a problem when a given resource was subscribed then unsubscribed (or reversed), all in between request sends. I made sure Envoy handles that sensibly, and added explicit requirements to the xDS protocol spec to ensure servers will also handle it sensibly.
Added unit tests for those fixes.
Risk Level: low
Testing: added unit tests for bugs uncovered
#4991