KEP-2433: add new heuristic to topology routing #4003
Conversation
/assign @robscott @thockin
This seems reasonable but I don't think it solves the problem as stated by several users, which I think distills into "same node if possible, otherwise same zone if possible, otherwise same region if possible, otherwise random". I'm not against adding the heuristic proposed here, but only if we think it is solving for someone's use case. Is it?
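The fallback chain described above ("same node, else same zone, else same region, else random") could be sketched roughly as follows. This is a hedged illustration only: the `Endpoint` struct and `pick` function are hypothetical stand-ins, not the real EndpointSlice API types.

```go
package main

import "fmt"

// Endpoint carries the topology labels relevant to the fallback chain.
// Illustrative only; the real API types live elsewhere.
type Endpoint struct {
	IP     string
	Node   string
	Zone   string
	Region string
}

// pick returns the endpoints matching the narrowest topology tier that
// has any match: same node, then same zone, then same region, then all.
func pick(eps []Endpoint, node, zone, region string) []Endpoint {
	tiers := []func(Endpoint) bool{
		func(e Endpoint) bool { return e.Node == node },
		func(e Endpoint) bool { return e.Zone == zone },
		func(e Endpoint) bool { return e.Region == region },
		func(e Endpoint) bool { return true }, // random/cluster-wide fallback
	}
	for _, match := range tiers {
		var out []Endpoint
		for _, e := range eps {
			if match(e) {
				out = append(out, e)
			}
		}
		if len(out) > 0 {
			return out
		}
	}
	return nil
}

func main() {
	eps := []Endpoint{
		{IP: "10.0.0.1", Node: "n1", Zone: "zone-a", Region: "r1"},
		{IP: "10.0.0.2", Node: "n2", Zone: "zone-b", Region: "r1"},
	}
	// Client on n3 in zone-b: no same-node match, so same-zone wins.
	fmt.Println(pick(eps, "n3", "zone-b", "r1")[0].IP) // 10.0.0.2
}
```

Once the narrowest non-empty tier is selected, a proxy would still load-balance randomly within it.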
Agree with @thockin here. I chatted with @aojea about this PR some this afternoon, will add a high level summary here:
Fair enough. For hints I still think we need to replace the existing complex heuristic, or add a more naive one that is easier to reason about. For PreferZone we then need a new KEP to reserve the annotation value and add graduation criteria (at least 2 proxies implement it, or something like that). WDYT?
I do think a new KEP makes sense. A few things to cover:
1) This is a heuristic which depends on the proxy implementation choosing to support it. We can fix kube-proxy, for sure. We can try to add support to other implementations, but we don't control those.
2) I think we should explicitly design it to consider region - it shouldn't be significantly more complicated, and we know some people DO have multi-regional clusters (or use zone to mean rack and region to mean what we usually call zone).
3) We should discuss same-node - I kind of feel like leaving it out will just land us back in this discussion, but I am open to debating it :)
zone-a: 2 endpoints
zone-b: 1 endpoint
zone-c: 3 endpoints
What happens if I have 4 endpoints distributed as follows:
zone-a: 1 endpoint
zone-b: 1 endpoint
zone-c: 2 endpoints
?
Will the hints be removed again as it is not possible to achieve a balanced distribution of endpoints across the zones?
We are looking into a deterministic way to have the hints set. We already use topologySpreadConstraints and nodeAffinity to spread the endpoints evenly across the zones. The endpoints will be spread evenly when the number of endpoints % number of zones == 0, but when the number of endpoints % number of zones != 0, some zones will have more replicas than others. Also, during a Deployment/StatefulSet rollout, the even distribution can be violated.
Currently we have a mutating webhook that mutates the EndpointSlice's hints so that each endpoint's hint is always set to the endpoint's zone.
Shouldn't we consider adding this strategy as well? Or can you elaborate on whether the newly added heuristic is close to it? I am confused and cannot judge it.
Assumptions for this strategy:
- You already have a guaranteed, approximately balanced distribution of endpoints across zones via scheduling means (topologySpreadConstraints, nodeAffinity).
- Each endpoint has equivalent capacity.
What the strategy does is maintain the endpoint's zone as its hint.
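The webhook strategy described in this comment (every endpoint's hint is simply its own zone) could be sketched like this. The `Hints` and `Endpoint` structs below are minimal illustrative stand-ins, not the real EndpointSlice types from `k8s.io/api/discovery/v1`.

```go
package main

import "fmt"

// Hints and Endpoint are minimal stand-ins for the discovery API types;
// illustrative only.
type Hints struct{ ForZones []string }
type Endpoint struct {
	Zone  string
	Hints *Hints
}

// setZoneHints implements the naive strategy described above: every
// endpoint's hint is set to its own zone, regardless of how evenly the
// endpoints are spread across zones.
func setZoneHints(eps []Endpoint) {
	for i := range eps {
		eps[i].Hints = &Hints{ForZones: []string{eps[i].Zone}}
	}
}

func main() {
	eps := []Endpoint{{Zone: "zone-a"}, {Zone: "zone-b"}, {Zone: "zone-c"}}
	setZoneHints(eps)
	fmt.Println(eps[1].Hints.ForZones[0]) // zone-b
}
```

Because the mapping is a pure copy of existing labels, it is fully deterministic, which is the property the comment is asking for.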
Currently we have a mutating webhook that mutates the EndpointSlice's hints so that each endpoint's hint is always set to the endpoint's zone.
Sorry for the late response; I was supposed to close this PR and send a new one with the simple heuristic you mention, copying the zone to the hint, for the reasons Rob mentions in #4003 (comment) (especially 2.), but then we need to find a good solution to be completely sure we can handle Tim's comments #4003 (comment).
@ialidzhikov out of curiosity, how is copying the zone directly into the hints working for you? Is that completely solving your problems?
cc: @robscott
@ialidzhikov out of curiosity, how is copying the zone directly into the hints working for you? Is that completely solving your problems?
Some details of our setup are revealed in kubernetes/kubernetes#110714 (comment). For example, one of our use cases is to use topology-aware routing for the communication between the kube-apiserver and etcd. We run etcd in 3 zones, 1 replica per zone. The kube-apiserver replicas are spread in a similar way, although their count may vary between 3 and 4. The idea is that each kube-apiserver talks to the etcd Pod in its zone.
We also use topology-aware routing for webhook communication, when the kube-apiserver needs to talk to a webhook deployed in the same cluster the kube-apiserver runs in. So, again, the kube-apiserver talks to the webhook endpoint located in its zone (if there is one, of course).
Just to make sure that I understand the PreferZone heuristic: it will always maintain the endpoint's zone as its hint, right?
@thockin @robscott @ialidzhikov updated the KEP to include PreferZone and keep it in beta during 1.28, so we can go GA in 1.29 if feedback is positive. PTAL
Add a new simple heuristic to minimize traffic cost at the expense of a higher risk of traffic imbalance and endpoint overload.
Signed-off-by: Antonio Ojea <[email protected]>
LGTM
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: aojea, thockin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
KEP-2433: add new heuristic to topology routing
This simplifies the existing heuristic, which is based on CPU cores, because it is difficult to get it working on deployments where the law of large numbers cannot help with the statistics.
Add a new heuristic, PreferZone: traffic will be directed to the endpoints in the same zone if any exist, falling back to cluster-wide routing if there are no endpoints in the zone.
As a side benefit of this new heuristic, we can abstract the topology using an interface that will help with the KEP #3685 to stage the endpointslice controller.
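On the consuming side, the PreferZone behavior summarized in the description (same-zone endpoints if any exist, otherwise the full set) could be sketched as below. This is a simplified illustration of the routing decision, not kube-proxy's actual implementation; the types are hypothetical stand-ins.

```go
package main

import "fmt"

// Endpoint is a minimal stand-in for an EndpointSlice endpoint.
type Endpoint struct {
	IP   string
	Zone string
}

// preferZone returns the endpoints in the caller's zone if any exist,
// otherwise it falls back to the full, cluster-wide set.
func preferZone(eps []Endpoint, zone string) []Endpoint {
	var local []Endpoint
	for _, e := range eps {
		if e.Zone == zone {
			local = append(local, e)
		}
	}
	if len(local) > 0 {
		return local
	}
	return eps
}

func main() {
	eps := []Endpoint{{"10.0.0.1", "zone-a"}, {"10.0.0.2", "zone-b"}}
	fmt.Println(len(preferZone(eps, "zone-a"))) // 1: only the zone-a endpoint
	fmt.Println(len(preferZone(eps, "zone-c"))) // 2: no local endpoints, cluster-wide fallback
}
```

This is what makes the heuristic "simple" relative to the CPU-core-based one: the decision per endpoint depends only on a zone label match, with a single all-or-nothing fallback.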