-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2433 Topology Aware Hints: Adding SameZone heuristic and other tweaks #3765
KEP-2433 Topology Aware Hints: Adding SameZone heuristic and other tweaks #3765
Conversation
b4ebe2e
to
89672c7
Compare
/cc @aojea |
The simplest of all heuristics, this will simply copy the zone of an endpoint to | ||
the hint of the same endpoint. If there are not endpoints available within a zone, | ||
that will mean no endpoints will have hints for that zone. As described above, | ||
when Kube-Proxy cannot find any endpoints with hints for its zone, it will route |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this implies prescribing changes on the proxies implementations ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, proxies would have to remove the check for the annotation, but otherwise, I don't think this would require any additional changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I understand from the kube-proxy section above is that if there is a service that has any endpoint (regardless of zone) without a hint - kube proxy on all zones will ignore hints for this service - if any endpoints for a Service do not have a hint, kube-proxy will ignore all hints
While the SameZone section basically say that when Kube-Proxy cannot find any endpoints with hints for its zone it will ignore the hints and will route the traffic evenly. (and kube-proxies on other zones will not ignore the hints?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While proxies would need to change, it's probably a good, safe-fallback to assume "there are no endpoints for my zone, I doubt you meant there to be no endpoints at all". I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just configure the proxy with the 3 different behaviors for hints
- Strict: only do what hints says
- Loose (default): absent of hints does not mean drop, do your best
- None: what are hints? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent most of the afternoon thinking about this and discussing with Rob. My conclusion is that this is the wrong path, and I'll capture some of why I think that.
service.kubernetes.io/topology-aware-routing
serves 2 purposes - to enable topology and to choose a heuristic. Today we only have 1 heuristic (well, 2 if you consider "off"). This is basically a wormhole to the implementation telling it to populate the hints fields based on the results of that heuristic (or not at all), and the proxy will consume the hints.
We called hints "hints" because we knew that some implementations might be smart enough to do better than us. We haven't exactly seen that yet, but if you consider service meshes, maybe we have. In effect "hints" is wink-and-nod agreement between EndpointSlice controller and kube-proxy (and others which follow suit). If an implementation ignored "hints", I don't think they would be invalid.
What we're trying to describe with SameZone or SameNode is stronger than that. I think that ignoring this request would probably be wrong.
Yes, we can implement SameZone as described here, but SameNode is more complicated to describe in hints (I think - I didn't YAML it all out).
If we want something stronger, we don't want an annotation, but a field. And I strongly suspect we end up needing to distinguish internal from external traffic. Exploring what values a field would have it feels a whole lot like an extended iTP / eTP to me.
I'll revisit this and #3293 more tomorrow, but I wanted to get some thoughts out.
Each heuristic (`Auto` or `SameZone`) will have different logic to determine how | ||
hints should be allocated. These will be described below: | ||
|
||
### Auto Heuristic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be precise: "auto" isn't a heuristic - it's a meta-heuristic. It means "choose for me". We didn't give the existing heuristic a distinct name because we didn't really have anything to compare it to.
If we were to go this path, we would probably want to give it a name like "CoreProportionalStatic" or something
The simplest of all heuristics, this will simply copy the zone of an endpoint to | ||
the hint of the same endpoint. If there are not endpoints available within a zone, | ||
that will mean no endpoints will have hints for that zone. As described above, | ||
when Kube-Proxy cannot find any endpoints with hints for its zone, it will route |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While proxies would need to change, it's probably a good, safe-fallback to assume "there are no endpoints for my zone, I doubt you meant there to be no endpoints at all". I think?
89672c7
to
a5eb235
Compare
a5eb235
to
4d6c1c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you're ready for a re-review, but I did it anyway.
I think it's useful to be clear that hints are optional, and implementations are not required to respect them. If we add a "WaterfallStrict" mode which means "same node if possible, else same zone, else same region, else cluster" (whether it uses hints or not), we can satisfy the other conversations around inefficacy of the default mode. If it turns out we still need more, we can look at xTP.
It took me a minute to get what you meant about "hints" and some heuristics might not use them. A strict waterfall mode can probably be done proxy-side, but I want to be careful that we a) don't assert that other proxies must implement it the same way (or at all); and b) don't preclude evolution if/when we have dynamic feedback.
available for each zone based on the proportion of CPU cores in each zone. If | ||
it is not possible to determine the number CPU cores, 1 core per node will be | ||
assumed for calculations. | ||
In the future, additional heuristics may be added. Until that point, "Auto" will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User should be able to say Auto or whatever-we-name-the-current thing, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it would be safer to leave this unchanged until we have more than one heuristic available. Adding a second heuristic may make naming decisions a bit clearer.
Rob, is this going for 1.27? Or is it too much too late? |
@thockin I'd like to try to squeeze this in if possible. I think the proposed user-facing changes have become fairly small - a new annotation name and maybe a new annotation value. I think the bigger value for this cycle would be shifting how we think about hints into something more generic with room for more than one heuristic. |
65f55d1
to
568369a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
/approve
568369a
to
1b4fccd
Compare
Last push was just to clean up PRR YAML file. This is a bit of a backwards PRR approval in that we were previously approved to go to stable, but now that the KEP has changed a bit and we're targeting 1.28 for GA, I've removed the stable approval and now need PRR approval just for that backwards transition. |
/approve It's just a paperwork issue so I'll approve the PR, but keep @wojtek-t for future PRR approvals. |
Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/controller/endpointslice/reconciler_test.go#L1641-L1907) | ||
* Hints field is dropped when feature gate is off. [(Strategy Unit | ||
Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/registry/discovery/endpointslice/strategy_test.go) | ||
* TODO before GA: Test coverage in EndpointSlice controller for the transition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robscott - FWIW - it should have happened for beta, as this is when most users generally enable it (beta on by default).
Please keep that in mind for the future (and I learnt that over time and now kind-of require it as beta criteria in many cases).
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnbelamaric, robscott, thockin, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
One-line PR description: This adds a SameZone heuristic and other related tweaks
Issue link: Topology Aware Routing #2433
Other comments:
I'm not sure about this approach, opening this PR to get some initial feedback. It seems like there's broad interest in providing additional topology heuristics, and one of the most commonly requested is simply to route to endpoints in the same zone if they exist. Despite the risks for overload that entails, it could be quite effective if pods were evenly distributed with scheduling.
Although I still think adding hints for something like this where they are guaranteed to be duplicative of the zone field is not ideal, it is consistent with the idea that we'd want to allow additional heuristics to be exposed this way in the future. If that's the case, we probably don't need to special-case this specific heuristic as originally explored by #3293. Based on yesterday's sig-network meeting, it sounds like that other KEP isn't as clearly needed as we originally thought, so this may be a more practical solution.
If we proceed with this approach, we could either separate this new heuristic into a new feature gate, or delay the GA of topology hints by at least one more release. At this point, I think it may be most practical to delay GA until ~1.28.
/sig network
/cc @andrewsykim @bowei @dcbw @danwinship @LiorLieberman
/assign @thockin