Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2433 Topology Aware Hints: Adding SameZone heuristic and other tweaks #3765

Merged

Conversation

robscott
Copy link
Member

  • One-line PR description: This adds a SameZone heuristic and other related tweaks

  • Issue link: Topology Aware Routing #2433

  • Other comments:
    I'm not sure about this approach, opening this PR to get some initial feedback. It seems like there's broad interest in providing additional topology heuristics, and one of the most commonly requested is simply to route to endpoints in the same zone if they exist. Despite the risks for overload that entails, it could be quite effective if pods were evenly distributed with scheduling.

Although I still think adding hints for something like this where they are guaranteed to be duplicative of the zone field is not ideal, it is consistent with the idea that we'd want to allow additional heuristics to be exposed this way in the future. If that's the case, we probably don't need to special-case this specific heuristic as originally explored by #3293. Based on yesterday's sig-network meeting, it sounds like that other KEP isn't as clearly needed as we originally thought, so this may be a more practical solution.

If we proceed with this approach, we could either separate this new heuristic into a new feature gate, or delay the GA of topology hints by at least one more release. At this point, I think it may be most practical to delay GA until ~1.28.

/sig network
/cc @andrewsykim @bowei @dcbw @danwinship @LiorLieberman
/assign @thockin

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Jan 20, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 20, 2023
@robscott robscott force-pushed the topology-hints-1-27-updates branch 2 times, most recently from b4ebe2e to 89672c7 Compare January 20, 2023 21:55
@robscott
Copy link
Member Author

/cc @aojea

@k8s-ci-robot k8s-ci-robot requested a review from aojea January 20, 2023 21:55
The simplest of all heuristics, this will simply copy the zone of an endpoint to
the hint of the same endpoint. If there are not endpoints available within a zone,
that will mean no endpoints will have hints for that zone. As described above,
when Kube-Proxy cannot find any endpoints with hints for its zone, it will route
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies prescribing changes on the proxies implementations ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, proxies would have to remove the check for the annotation, but otherwise, I don't think this would require any additional changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I understand from the kube-proxy section above is that if there is a service that has any endpoint (regardless of zone) without a hint - kube proxy on all zones will ignore hints for this service - if any endpoints for a Service do not have a hint, kube-proxy will ignore all hints

While the SameZone section basically say that when Kube-Proxy cannot find any endpoints with hints for its zone it will ignore the hints and will route the traffic evenly. (and kube-proxies on other zones will not ignore the hints?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While proxies would need to change, it's probably a good, safe-fallback to assume "there are no endpoints for my zone, I doubt you meant there to be no endpoints at all". I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just configure the proxy with the 3 different behaviors for hints

  1. Strict: only do what hints says
  2. Loose (default): absent of hints does not mean drop, do your best
  3. None: what are hints? :)

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent most of the afternoon thinking about this and discussing with Rob. My conclusion is that this is the wrong path, and I'll capture some of why I think that.

service.kubernetes.io/topology-aware-routing serves 2 purposes - to enable topology and to choose a heuristic. Today we only have 1 heuristic (well, 2 if you consider "off"). This is basically a wormhole to the implementation telling it to populate the hints fields based on the results of that heuristic (or not at all), and the proxy will consume the hints.

We called hints "hints" because we knew that some implementations might be smart enough to do better than us. We haven't exactly seen that yet, but if you consider service meshes, maybe we have. In effect "hints" is wink-and-nod agreement between EndpointSlice controller and kube-proxy (and others which follow suit). If an implementation ignored "hints", I don't think they would be invalid.

What we're trying to describe with SameZone or SameNode is stronger than that. I think that ignoring this request would probably be wrong.

Yes, we can implement SameZone as described here, but SameNode is more complicated to describe in hints (I think - I didn't YAML it all out).

If we want something stronger, we don't want an annotation, but a field. And I strongly suspect we end up needing to distinguish internal from external traffic. Exploring what values a field would have it feels a whole lot like an extended iTP / eTP to me.

I'll revisit this and #3293 more tomorrow, but I wanted to get some thoughts out.

Each heuristic (`Auto` or `SameZone`) will have different logic to determine how
hints should be allocated. These will be described below:

### Auto Heuristic
Copy link
Member

@thockin thockin Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be precise: "auto" isn't a heuristic - it's a meta-heuristic. It means "choose for me". We didn't give the existing heuristic a distinct name because we didn't really have anything to compare it to.

If we were to go this path, we would probably want to give it a name like "CoreProportionalStatic" or something

The simplest of all heuristics, this will simply copy the zone of an endpoint to
the hint of the same endpoint. If there are not endpoints available within a zone,
that will mean no endpoints will have hints for that zone. As described above,
when Kube-Proxy cannot find any endpoints with hints for its zone, it will route
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While proxies would need to change, it's probably a good, safe-fallback to assume "there are no endpoints for my zone, I doubt you meant there to be no endpoints at all". I think?

@robscott robscott force-pushed the topology-hints-1-27-updates branch from 89672c7 to a5eb235 Compare February 4, 2023 03:00
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 4, 2023
@robscott robscott force-pushed the topology-hints-1-27-updates branch from a5eb235 to 4d6c1c8 Compare February 4, 2023 03:02
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you're ready for a re-review, but I did it anyway.

I think it's useful to be clear that hints are optional, and implementations are not required to respect them. If we add a "WaterfallStrict" mode which means "same node if possible, else same zone, else same region, else cluster" (whether it uses hints or not), we can satisfy the other conversations around inefficacy of the default mode. If it turns out we still need more, we can look at xTP.

It took me a minute to get what you meant about "hints" and some heuristics might not use them. A strict waterfall mode can probably be done proxy-side, but I want to be careful that we a) don't assert that other proxies must implement it the same way (or at all); and b) don't preclude evolution if/when we have dynamic feedback.

keps/sig-network/2433-topology-aware-hints/README.md Outdated Show resolved Hide resolved
available for each zone based on the proportion of CPU cores in each zone. If
it is not possible to determine the number CPU cores, 1 core per node will be
assumed for calculations.
In the future, additional heuristics may be added. Until that point, "Auto" will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User should be able to say Auto or whatever-we-name-the-current thing, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it would be safer to leave this unchanged until we have more than one heuristic available. Adding a second heuristic may make naming decisions a bit clearer.

@wojtek-t wojtek-t self-assigned this Feb 8, 2023
@thockin
Copy link
Member

thockin commented Feb 8, 2023

Rob, is this going for 1.27? Or is it too much too late?

@robscott
Copy link
Member Author

robscott commented Feb 8, 2023

@thockin I'd like to try to squeeze this in if possible. I think the proposed user-facing changes have become fairly small - a new annotation name and maybe a new annotation value. I think the bigger value for this cycle would be shifting how we think about hints into something more generic with room for more than one heuristic.

@robscott robscott force-pushed the topology-hints-1-27-updates branch 2 times, most recently from 65f55d1 to 568369a Compare February 8, 2023 23:51
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@robscott robscott force-pushed the topology-hints-1-27-updates branch from 568369a to 1b4fccd Compare February 9, 2023 00:00
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023
@robscott
Copy link
Member Author

robscott commented Feb 9, 2023

Last push was just to clean up PRR YAML file. This is a bit of a backwards PRR approval in that we were previously approved to go to stable, but now that the KEP has changed a bit and we're targeting 1.28 for GA, I've removed the stable approval and now need PRR approval just for that backwards transition.

/cc @johnbelamaric @wojtek-t

@johnbelamaric
Copy link
Member

/approve

It's just a paperwork issue so I'll approve the PR, but keep @wojtek-t for future PRR approvals.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2023
Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/controller/endpointslice/reconciler_test.go#L1641-L1907)
* Hints field is dropped when feature gate is off. [(Strategy Unit
Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/registry/discovery/endpointslice/strategy_test.go)
* TODO before GA: Test coverage in EndpointSlice controller for the transition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robscott - FWIW - it should have happened for beta, as this is when most users generally enable it (beta on by default).
Please keep that in mind for the future (and I learnt that over time and now kind-of require it as beta criteria in many cases).

@wojtek-t
Copy link
Member

wojtek-t commented Feb 9, 2023

/lgtm
/approve PRR

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, robscott, thockin, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants