KEP-3015: PreferSameZone/PreferSameNode traffic distribution #5140

danwinship · 2025-02-06T14:59:07Z

One-line PR description: Adds PreferSameZone and PreferSameNode as values for Service's TrafficDistribution field, deprecates PreferClose.

Issue link: PreferSameNode Traffic Distribution (formerly PreferLocal traffic policy / Node-level topology) #3015

Other comments: This was originally proposed as a modification to KEP-4444 (KEP-4444: add "prefer same node" semantics to PreferClose #4931), but it requires introducing a new field in EndpointSlice, and we don't want the rest of KEP-4444 to be delayed going to GA, so I'm splitting this out.

MikeZappa87 · 2025-02-06T15:28:15Z

keps/sig-network/3015-prefer-same-node/README.md

+#### DNS
+
+As a cluster administrator, I plan to run a DNS pod on each node, and
+would like DNS requests from other pods to always go to the local DNS


Above you state “preferentially”, and here you state “always”. This is probably just wording, but does this flag guarantee that traffic will always stay on the same node?

There's some discussion in #4931 about exactly what "Prefer" implies... if it means "definitely unless there are no preferred endpoints" or if it means "probably unless there's a good reason not to". It is not totally clear. At any rate, the "Prefer" here means the same thing as the "Prefer" in "PreferClose".

I think the wiggle room is justified. In the past, we've made the mistake of being too prescriptive without a good corpus of implementations.

thockin

LGTM overall. Tell me if I misunderstood the fallback comments

thockin · 2025-02-06T23:04:05Z

keps/sig-network/3015-prefer-same-node/README.md

+`PreferSameNode`, indicating traffic for a service should
+preferentially be routed to endpoints on the same node as the client.
+
+(This is the third attempt at this feature, which was previously


Demonstrating both our willingness to admit we were wrong, and our relentless pursuit of excellence! :)

thockin · 2025-02-06T23:16:19Z

keps/sig-network/3015-prefer-same-node/README.md

+#### DNS
+
+As a cluster administrator, I plan to run a DNS pod on each node, and
+would like DNS requests from other pods to always go to the local DNS


I think the wiggle room is justified. In the past, we've made the mistake of being too prescriptive without a good corpus of implementations.

thockin · 2025-02-07T01:45:59Z

Maybe I am losing my mind, but why do you need three releases?

You might have nodes which are three releases back-rev, but in this specific case, those fall back on the zone hints. I am racking my brain trying to figure out a failure mode that would require three releases, but I'm not seeing it.

aojea · 2025-02-07T07:57:47Z

keps/sig-network/3015-prefer-same-node/README.md

+
+By checking if any Service has `TrafficDistribution: PreferSameNode`.
+
+###### How can someone using this feature know that it is working for their instance?


If the feature is working, you'll see the new ForNodes field populated in the EndpointSlices. In addition, they need to do the manual inspection you indicate to validate that traffic directed to those endpoints only goes to those nodes.

That only tells you that half of the feature is working. Notably, that would happen even if you still had an old kube-proxy, but the old kube-proxy would ignore the ForNodes hint.
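As a rough illustration of the two checks discussed here, the sketch below uses simplified, hypothetical stand-in types (not the real `k8s.io/api` structs) to report which Services use `PreferSameNode` and whether the controller has written a `ForNodes` hint naming each endpoint's own node. As the reply points out, a populated hint only shows the controller half is working; it cannot tell you whether the service proxy honors it.

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for the real API type;
// only the fields this check needs are modeled.
type Service struct {
	Name                string
	TrafficDistribution *string
}

// Endpoint flattens hints.forNodes into ForNodes for brevity (an assumption
// of this sketch, not the real field layout).
type Endpoint struct {
	NodeName *string
	ForNodes []string
}

type EndpointSlice struct {
	ServiceName string
	Endpoints   []Endpoint
}

const PreferSameNode = "PreferSameNode"

// servicesUsingPreferSameNode returns the names of Services whose
// TrafficDistribution is PreferSameNode.
func servicesUsingPreferSameNode(svcs []Service) []string {
	var names []string
	for _, s := range svcs {
		if s.TrafficDistribution != nil && *s.TrafficDistribution == PreferSameNode {
			names = append(names, s.Name)
		}
	}
	return names
}

// hintsPopulated reports whether every endpoint that has a node name also
// carries a ForNodes hint including that node. It only checks controller
// output; it says nothing about whether kube-proxy uses the hint.
func hintsPopulated(slice EndpointSlice) bool {
	for _, ep := range slice.Endpoints {
		if ep.NodeName == nil {
			continue
		}
		found := false
		for _, n := range ep.ForNodes {
			if n == *ep.NodeName {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	psn := PreferSameNode
	svcs := []Service{{Name: "dns", TrafficDistribution: &psn}, {Name: "web"}}
	fmt.Println(servicesUsingPreferSameNode(svcs)) // [dns]

	node := "node-1"
	slice := EndpointSlice{ServiceName: "dns", Endpoints: []Endpoint{{NodeName: &node, ForNodes: []string{"node-1"}}}}
	fmt.Println(hintsPopulated(slice)) // true
}
```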

aojea · 2025-02-07T08:05:18Z

This LGTM modulo some minor comments.
@thockin, are we opting for this in 1.33?

adrianmoisey · 2025-02-07T15:05:00Z

keps/sig-network/3015-prefer-same-node/README.md

+### Goals
+
+- Allow configuring a service so that connections will be delivered to
+  a local endpoint when possible, and a remote endpoint if not.


Is there a possibility of allowing PreferSameNode to fall back to PreferSameZone, and PreferSameZone to fall back to any?

I can see that as useful in my environment, but it also increases the complexity of how a user needs to define what they want.

I think that is exactly what is implied
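A minimal sketch of the fallback order discussed in this thread (same node, then same zone, then any endpoint). It assumes the controller writes both node and zone hints for `PreferSameNode` Services (consistent with older proxies "falling back on the zone hints") and that the proxy treats each preference as a preference rather than a hard requirement. The `endpoint` type and `pickEndpoints` function are hypothetical; real proxies apply further rules such as readiness and traffic policies.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified view of an endpoint as a proxy
// might see it after reading EndpointSlice hints.
type endpoint struct {
	Addr     string
	ForNodes []string
	ForZones []string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// pickEndpoints returns the endpoints a client on (node, zone) should use:
// endpoints hinted for its node if any exist, otherwise endpoints hinted for
// its zone, otherwise every endpoint.
func pickEndpoints(eps []endpoint, node, zone string) []endpoint {
	var sameNode, sameZone []endpoint
	for _, ep := range eps {
		if contains(ep.ForNodes, node) {
			sameNode = append(sameNode, ep)
		}
		if contains(ep.ForZones, zone) {
			sameZone = append(sameZone, ep)
		}
	}
	switch {
	case len(sameNode) > 0:
		return sameNode
	case len(sameZone) > 0:
		return sameZone
	default:
		return eps
	}
}

func main() {
	eps := []endpoint{
		{Addr: "10.0.1.5", ForNodes: []string{"node-a"}, ForZones: []string{"zone-1"}},
		{Addr: "10.0.2.7", ForNodes: []string{"node-b"}, ForZones: []string{"zone-1"}},
		{Addr: "10.0.3.9", ForNodes: []string{"node-c"}, ForZones: []string{"zone-2"}},
	}
	fmt.Println(len(pickEndpoints(eps, "node-a", "zone-1"))) // 1: local endpoint
	fmt.Println(len(pickEndpoints(eps, "node-d", "zone-1"))) // 2: same-zone fallback
	fmt.Println(len(pickEndpoints(eps, "node-e", "zone-3"))) // 3: cluster-wide fallback
}
```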

thockin · 2025-02-07T16:03:15Z

It's up to Dan whether he wants to go for 1.33, but we have to decide soon, and we have to decide between this and the PreferClose GA. They all have a clock on them, so getting it to GA starts the clock.

danwinship · 2025-02-07T20:00:27Z

Maybe I am losing my mind, but why do you need three releases?

I dunno. I thought that was just a rule.

OK. So:

I assume PreferSameNode needs to start as Alpha, in which case, it needs its own feature gate, since ServiceTrafficDistribution is already Beta. Right?
I assume PreferSameZone doesn't need to start as Alpha, since we were ready to declare PreferClose GA, and this is just a rename of it. Also, if we know we want to deprecate PreferClose, then we should make its replacement fully available sooner rather than later, right?

So I feel like the right approach is:

1.33: Update KEP-4444 to include PreferSameZone and deprecate PreferClose, but leave it as Beta. Implement this KEP with PreferSameNode as Alpha behind its own feature gate.
1.34: PreferSameNode goes to Beta. TrafficDistribution+PreferSameZone can go to GA, if we want.
1.35: PreferSameNode goes to GA. TrafficDistribution as a whole goes to GA if it didn't in 1.34.

thockin · 2025-02-07T21:34:35Z

I assume PreferSameNode needs to start as Alpha, in which case, it needs its own feature gate, since ServiceTrafficDistribution is already Beta. Right?

Correct.

I assume PreferSameZone doesn't need to start as Alpha, since we were ready to declare PreferClose GA, and this is just a rename of it. Also, if we know we want to deprecate PreferClose, then we should make its replacement fully-available sooner rather than later, right?

Unfortunately no. If someone sets "PreferSameZone" and then we roll back to 32, their Service is invalid. It has to start behind an off-by-default gate. I suggested to @gauravkghildiyal that he make that "rename" (alias, really) under your same gate. It's unlikely that "PreferClose" would actually ever go away (risk >> reward), so let's just embrace it. If "PreferClose" goes GA in 33, then you don't have to handle the corner case of PreferClose being gated-off but PreferSameZone being on.

From the POV of the PreferSameZone gate, PreferClose is locked to default.

1.33: Update KEP-4444 to include PreferSameZone and deprecate PreferClose, but leave it as Beta. Implement this KEP with PreferSameNode as Alpha behind its own feature gate.

1.34: PreferSameNode goes to Beta. TrafficDistribution+PreferSameZone can go to GA, if we want.

1.35: PreferSameNode goes to GA. TrafficDistribution as a whole goes to GA if it didn't in 1.34.

I think it has to be:

1.33: GA TrafficDistribution+PreferClose, introduce PreferSameZone and PreferSameNode behind an alpha gate
1.34: PreferSame* goes to beta
1.35+: PreferSame* goes GA
1.36: Remove the TrafficDistribution gate
1.38+: Remove the PreferSame* gate

It's simpler and really, I am just pre-NAKing the PR to remove "PreferClose". I would probably NAK it anyway.
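The rollback concern above is why the new values start behind an off-by-default gate. A minimal sketch of what the corresponding validation might look like, assuming one gate (modeled here as a hypothetical `preferSameEnabled` flag) covers both `PreferSameZone` and `PreferSameNode`, while `PreferClose` stays ungated. It also assumes the common Kubernetes pattern of letting an update keep a gated value the object already had. The function name and signature are illustrative, not the actual apiserver code.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	PreferClose    = "PreferClose"
	PreferSameZone = "PreferSameZone"
	PreferSameNode = "PreferSameNode"
)

// validateTrafficDistribution is a hypothetical validator. newVal is the
// value being written, oldVal is the value on the existing object ("" on
// create; "" also stands in for an unset field), and preferSameEnabled
// models an off-by-default feature gate.
func validateTrafficDistribution(newVal, oldVal string, preferSameEnabled bool) error {
	switch newVal {
	case "", PreferClose:
		// PreferClose is not covered by the new gate, so it stays valid.
		return nil
	case PreferSameZone, PreferSameNode:
		if preferSameEnabled || oldVal == newVal {
			// Allowed when the gate is on, or when an update merely keeps a
			// value the object already had.
			return nil
		}
		return fmt.Errorf("trafficDistribution %q requires the feature gate", newVal)
	default:
		return errors.New("unsupported trafficDistribution value")
	}
}

func main() {
	fmt.Println(validateTrafficDistribution(PreferClose, "", false))    // <nil>
	fmt.Println(validateTrafficDistribution(PreferSameNode, "", false)) // non-nil error
	fmt.Println(validateTrafficDistribution(PreferSameNode, "", true))  // <nil>
}
```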

danwinship · 2025-02-07T21:55:12Z

Maybe I am losing my mind, but why do you need three releases?

You might have nodes which are three releases back-rev, but in this specific case, those fall back on the zone hints.

Ah, I was thinking about past problems with apiserver/kubelet skew, but kube-controller-manager can only be 1 release older than kube-apiserver, so a single release as Alpha addresses that. OK.

adrianmoisey · 2025-02-08T14:43:41Z

In principle: LGTM

I'm actually super keen to get this feature into our clusters!

toVersus · 2025-02-09T03:14:44Z

keps/sig-network/3015-prefer-same-node/README.md

+
+	// forNodes indicates the node(s) this endpoint should be targeted by.
+	// +listType=atomic
+	ForNodes []string `json:"forNodes,omitempty" protobuf:"bytes,2,name=forNodes"`


Why is the ForNodes field just a slice of strings, whereas the ForZones field is defined as a slice of ForZone?

The ForZone struct was apparently added to support future ideas like having multiple zones with weight values... (https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2433-topology-aware-hints/README.md#future-api-expansion)

I guess we could do the same for ForNodes... ? I think the idea of weighted hints is based more on the "smart controllers / ambiguous API" model of TopologyAwareHints, while TrafficDistribution has moved more toward "simple controllers / explicit API" so that sort of future API expansion seems unlikely.

As mentioned in #5140 (comment), even in the "smart controllers / ambiguous API" model, there's actually no need to include weight information in ForZones hints: the weight can be determined within the service proxy implementation.

However, looking at this wording, it might be difficult to definitively say that a string slice is the best choice...🤷‍♂️ For example, if fields other than trafficDistribution are added, allowing users to specify the behavior in more detail, is there a possibility that Hints would need to include additional information?

Although it is unclear if either expansion will be necessary, the API is designed in a way to make expansions straightforward.

toVersus · 2025-02-09T03:17:23Z

keps/sig-network/3015-prefer-same-node/README.md

+When updating EndpointSlices, if the EndpointSlice controller sees a
+service with `PreferSameNode` traffic distribution, then for each
+endpoint in the slice, it will add a `ForNodes` hint including the
+name of the endpoint's node. (The field is an array for future


Does 'future extensibility' here mean leaving room to implement safeguards against overload later on (e.g., CPU core-based correction in the Topology Aware Routing)? If the safeguard is intended to distribute the load to other nodes, should we add not only the node where the endpoint resides but also the node names that became candidates as a result of the correction?

No, "future extensibility" means other use cases besides simple "PreferSameNode".

For example, you could theoretically implement "prefer same rack" by manually filling in multiple ForNodes values.

(Our assumption has been that if you were going to have some sort of "smart" backoff/fallback behavior, the behavior would be entirely within the proxy, so nobody would be writing out hints about it. The proxy would just determine that it should avoid Node X and start using Nodes Y and Z instead, and then do that.)

thockin · 2025-02-09T23:41:44Z

Thanks!

/lgtm
/approve

keps/prod-readiness/sig-network/3015.yaml

wojtek-t

A few minor comments; other than that, LGTM from a PRR point of view.

wojtek-t · 2025-02-10T10:05:07Z

keps/sig-network/3015-prefer-same-node/README.md

+
+An initial rollout cannot fail and won't impact already-running
+workloads, because at the time of the initial rollout, there cannot
+already be any `PreferSameZone` or `PreferSameNode` services.


Well, in some catastrophic scenario you may break the EndpointSlice controller (fairly unlikely, but not technically impossible).
It still won't impact running workloads, but would affect propagating changes to these.

I feel like there's an implied "if there are bugs in the code then arbitrary bad things might happen"? Like, we could crash all of kcm too, but it doesn't seem right to say "an initial rollout might cause the DaemonSet controller to fail".

adrianmoisey · 2025-02-10T10:15:51Z

keps/sig-network/3015-prefer-same-node/README.md

+* `PreferSameNode`: Indicates a preference for routing traffic to
+  endpoints that are on the same node as the client. In general, the
+  proxy should always route to a same-node endpoint if any is
+  available.


How does this field interact with the iTP (internalTrafficPolicy) field?

I assume that what is written here will also apply to PreferSameNode: https://kubernetes.io/docs/reference/networking/virtual-ips/#interaction-with-traffic-policies

Yes. All topology-ish features should behave the same way with respect to other features; the only difference is which endpoints they pick as "topologically available".

Local traffic policy ignores topology, because its own routing concerns render topology irrelevant. But if we added other traffic policy types in the future, they might work differently and we'd have to define that then.
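A rough sketch of the precedence described here, assuming the behavior documented for traffic policies: when `internalTrafficPolicy` is `Local`, only endpoints on the client's own node are used, topology hints are ignored, and there is no fallback; otherwise the topology preference applies. The `endpoint` type and `internalEndpoints` function are hypothetical, and the non-Local branch models only `PreferSameNode` for brevity.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified endpoint view.
type endpoint struct {
	Addr     string
	NodeName string
	ForNodes []string
}

// internalEndpoints picks endpoints for a client on clientNode.
// localPolicy models internalTrafficPolicy: Local; preferSameNode models
// trafficDistribution: PreferSameNode.
func internalEndpoints(eps []endpoint, clientNode string, localPolicy, preferSameNode bool) []endpoint {
	if localPolicy {
		// Local policy: only endpoints actually on this node; hints are
		// ignored and there is no fallback (an empty result drops traffic).
		var local []endpoint
		for _, ep := range eps {
			if ep.NodeName == clientNode {
				local = append(local, ep)
			}
		}
		return local
	}
	if preferSameNode {
		var hinted []endpoint
		for _, ep := range eps {
			for _, n := range ep.ForNodes {
				if n == clientNode {
					hinted = append(hinted, ep)
					break
				}
			}
		}
		if len(hinted) > 0 {
			return hinted
		}
	}
	// No preference, or no hinted endpoints: use every endpoint.
	return eps
}

func main() {
	eps := []endpoint{
		{Addr: "10.0.0.1", NodeName: "node-a", ForNodes: []string{"node-a"}},
		{Addr: "10.0.0.2", NodeName: "node-b", ForNodes: []string{"node-b"}},
	}
	fmt.Println(len(internalEndpoints(eps, "node-c", true, true)))  // 0: Local wins, traffic dropped
	fmt.Println(len(internalEndpoints(eps, "node-c", false, true))) // 2: PreferSameNode falls back
	fmt.Println(len(internalEndpoints(eps, "node-a", false, true))) // 1: same-node endpoint
}
```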

sftim

A query about ForNodes, and some style nits.

sftim · 2025-02-10T16:21:05Z

keps/sig-network/3015-prefer-same-node/README.md

+
+### Goals
+
+- Make `TrafficDistribution` less ambiguous.


(nit)

Suggested change

- Make `TrafficDistribution` less ambiguous.

- Make `trafficDistribution` less ambiguous.

In the API, we most commonly see this field name in camelCase. So, use that here.

But in code we see it capitalized. I have no data, but I feel like we tend to use field names both ways regularly in KEPs...

sftim · 2025-02-10T16:21:18Z

keps/sig-network/3015-prefer-same-node/README.md

+
+#### DNS
+
+As a cluster administrator, I plan to run a DNS pod on each node, and


Isn't this a user story?

Yes... it's under the User Stories heading. (I just didn't number them.)

sftim · 2025-02-10T16:22:46Z

keps/sig-network/3015-prefer-same-node/README.md

+
+When updating EndpointSlices, if the EndpointSlice controller sees a
+service with `PreferSameNode` traffic distribution, then for each
+endpoint in the slice, it will add a `ForNodes` hint including the


Suggested change

endpoint in the slice, it will add a `ForNodes` hint including the

endpoint in the slice, it computes an internal `ForNodes` hint including the

If this isn't an internal detail of kube-controller-manager, I'm puzzled about why https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/ doesn't mention either hints or ForNodes.

It doesn't mention ForNodes because that field is new in this KEP.

It doesn't mention hints because the "Topology information" subsection of that document apparently never got updated to explain the TopologyAwareHints feature... but it's not any more "internal" than anything else in EndpointSlice.

wojtek-t · 2025-02-11T08:42:29Z

/lgtm
/approve PRR

k8s-ci-robot · 2025-02-11T08:42:38Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin, wojtek-t


Needs approval from an approver in each of these files:

~~keps/prod-readiness/OWNERS~~ [wojtek-t]
~~keps/sig-network/OWNERS~~ [danwinship,thockin]


k8s-ci-robot added the cncf-cla: yes label Feb 6, 2025

k8s-ci-robot requested review from johnbelamaric and shaneutt February 6, 2025 14:59

k8s-ci-robot added the kind/kep, sig/network, and size/L labels Feb 6, 2025

danwinship mentioned this pull request Feb 6, 2025

PreferSameNode Traffic Distribution (formerly PreferLocal traffic policy / Node-level topology) #3015 (open)

danwinship self-assigned this Feb 6, 2025

shaneutt requested review from aojea and MikeZappa87 February 6, 2025 15:14

danwinship mentioned this pull request Feb 6, 2025

KEP-4444: add "prefer same node" semantics to PreferClose #4931 (closed)

shaneutt requested review from robscott and gauravkghildiyal February 6, 2025 15:17

MikeZappa87 reviewed Feb 6, 2025

danwinship force-pushed the prefer-same-node branch from 2d70367 to 1185959 February 6, 2025 17:19

thockin self-assigned this Feb 6, 2025

thockin reviewed Feb 6, 2025

aojea reviewed Feb 7, 2025

keps/sig-network/3015-prefer-same-node/README.md (outdated, resolved)

aojea reviewed Feb 7, 2025

keps/sig-network/3015-prefer-same-node/README.md (resolved)

adrianmoisey reviewed Feb 7, 2025

keps/sig-network/3015-prefer-same-node/README.md (resolved)

danwinship force-pushed the prefer-same-node branch from 1185959 to b87dea8 February 8, 2025 15:58

k8s-ci-robot added the size/XL label and removed the size/L label Feb 8, 2025

danwinship force-pushed the prefer-same-node branch from b87dea8 to 0758e62 February 8, 2025 17:36

danwinship changed the title from "KEP-3015: PreferSameNode traffic distribution" to "KEP-3015: PreferSameZone/PreferSameNode traffic distribution" Feb 8, 2025

danwinship mentioned this pull request Feb 8, 2025

KEP-4444: Graduate trafficDistribution:PreferClose to GA #5152 (merged)

toVersus reviewed Feb 9, 2025

k8s-ci-robot added the lgtm label Feb 9, 2025

wojtek-t reviewed Feb 10, 2025

keps/prod-readiness/sig-network/3015.yaml (outdated, resolved)

wojtek-t reviewed Feb 10, 2025

wojtek-t self-assigned this Feb 10, 2025

adrianmoisey reviewed Feb 10, 2025

danwinship force-pushed the prefer-same-node branch from 0758e62 to 38a6d41 February 10, 2025 15:09

k8s-ci-robot removed the lgtm label Feb 10, 2025

sftim reviewed Feb 10, 2025

Commit f0283c6: KEP-3015: PreferSameNode traffic distribution

danwinship force-pushed the prefer-same-node branch from 38a6d41 to f0283c6 February 10, 2025 17:39

k8s-ci-robot added the lgtm label Feb 11, 2025

k8s-ci-robot added the approved label Feb 11, 2025

k8s-ci-robot merged commit 598575c into kubernetes:master Feb 11, 2025 (4 checks passed)

k8s-ci-robot added this to the v1.33 milestone Feb 11, 2025

danwinship deleted the prefer-same-node branch February 11, 2025 13:34

