WIP: Support EndpointSlice in sdn and test handling terminating endpoints #271
smarterclayton wants to merge 3 commits into openshift:master
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: smarterclayton.
|
@Miciah this is where I'm testing it
|
EDIT: Feature gate enablement was in the wrong place
16d1b86 to c918927
|
Well, the tests went green except for an unidling test (I may have broken it just by using the slice for annotations instead of endpoints). I will kick off some upgrade jobs and see how they play out, as well as a network stress test.
|
OK, got one run of e2e and two runs of upgrade with endpoint slices on but the termination gate off, and behavior was similar to endpoints except for the idling bug. Now testing with the termination gate on in https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1368251013236002816 as an upgrade and https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1368251449502339072 as an e2e.
|
@smarterclayton: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
|
Passes on those (for the factors that matter) without EndpointSliceTerminatingCondition set on the server side, so we're safe to roll this out to kube-proxy first.
// detect when a service become unidling
klog.V(6).Infof("hybrid proxy: (always) add ep %s/%s in unidling proxy", endpoints.Namespace, endpoints.Name)
p.unidlingProxy.OnEndpointsAdd(endpoints)
p.unidlingProxy.OnEndpointSliceAdd(endpoints)
I'm not familiar at all with the sdn/proxy implementation, so maybe this information is redundant, but there can be multiple slices for the same service, and the slices can contain duplicate endpoints. kube-proxy uses a cache to handle this:
https://github.com/kubernetes/kubernetes/blob/2bcbc527a760106ec89647fcf6852f37c804f4ed/pkg/proxy/endpointslicecache.go#L43-L49
The limit of endpoints per slice is 100, so if you have more than 100 endpoints, say 110 for service X, you'll receive two slices Y1 and Y2, maybe with 100 and 10 endpoints each.
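To make the point about multiple slices concrete, here is a minimal sketch of a per-service cache that merges every slice of a service and drops duplicates. It is not the kube-proxy `EndpointSliceCache` linked above; the `EndpointSlice` struct, `SliceCache`, and its methods are hypothetical simplifications (endpoints are plain `ip:port` strings) used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// EndpointSlice is a hypothetical, simplified stand-in for the real
// discovery.k8s.io type: a slice name, the owning service, and a list
// of "ip:port" strings.
type EndpointSlice struct {
	Name      string
	Service   string
	Endpoints []string
}

// SliceCache keeps the latest version of every slice, grouped by service.
type SliceCache struct {
	byService map[string]map[string]EndpointSlice // service -> slice name -> slice
}

func NewSliceCache() *SliceCache {
	return &SliceCache{byService: map[string]map[string]EndpointSlice{}}
}

// Update records an added or modified slice.
func (c *SliceCache) Update(s EndpointSlice) {
	if c.byService[s.Service] == nil {
		c.byService[s.Service] = map[string]EndpointSlice{}
	}
	c.byService[s.Service][s.Name] = s
}

// Delete forgets a slice; deleting an unknown slice is a no-op.
func (c *SliceCache) Delete(s EndpointSlice) {
	delete(c.byService[s.Service], s.Name)
}

// Endpoints returns the sorted, deduplicated union of all slices for a service.
func (c *SliceCache) Endpoints(service string) []string {
	seen := map[string]bool{}
	for _, s := range c.byService[service] {
		for _, ep := range s.Endpoints {
			seen[ep] = true
		}
	}
	out := make([]string, 0, len(seen))
	for ep := range seen {
		out = append(out, ep)
	}
	sort.Strings(out)
	return out
}

func main() {
	c := NewSliceCache()
	// Two slices for the same service, with one endpoint present in both.
	c.Update(EndpointSlice{Name: "y1", Service: "x", Endpoints: []string{"10.0.0.1:80", "10.0.0.2:80"}})
	c.Update(EndpointSlice{Name: "y2", Service: "x", Endpoints: []string{"10.0.0.2:80", "10.0.0.3:80"}})
	fmt.Println(c.Endpoints("x"))
}
```

A handler that treats each slice event as the full endpoint set for the service would lose endpoints as soon as a second slice shows up, which is why some merge step like this is needed when consuming slices directly.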
We really should have an e2e test upstream that creates a service with > 100 endpoints to exercise this, then. Is there one you know of that I can crib?
Actually, for idling, I'm wondering whether we even need to support more than three or four endpoints. The only time the userspace proxy should be in play is on an idle service, which has no endpoints.
> whether we even need to support > three or four endpoints.
As I said, I'm not familiar with this code; I'm just raising some points that I think may need to be taken into consideration. If that is the case, it seems we shouldn't worry about this.
> We really should have an e2e test in upstream that creates a service with > 100 endpoints
This is well tested upstream.
The unidling proxy can just ignore the endpointslices and work with the endpoints like it always did. Endpoints objects always contain the full set of endpoints, even in cases where the EndpointSlice controller would start splitting things up; it just means that code working with the Endpoints objects doesn't get the efficiency wins that code working with the EndpointSlice objects would get.
danwinship left a comment
Idling is broken because this flips us to only using EndpointSlice, but the userspace proxy (which is used by idling) doesn't support EndpointSlice.
return err
}

// DO NOT MERGE: hack endpoint slice on
You need to revert 12b21f8 from #227.
(That will result in EndpointSlice and EndpointSliceProxying being enabled since they're enabled by default.) For EndpointSliceTerminatingCondition we should eventually handle that like other feature gates, but CNO doesn't watch the FeatureGate resource yet (https://issues.redhat.com/browse/SDN-1325).
Oh, wait, no, Aniket already reverted that a while back; we'll need to revert the relevant part of openshift/cluster-network-operator#905.
proxyconfig.NoopEndpointSliceHandler
// TODO implement https://github.com/kubernetes/enhancements/pull/640
proxyconfig.NoopNodeHandler
NoopEndpointsHandler
The HybridProxier can't no-op Endpoints handling; it has to pass EndpointSlice events down to the iptables proxier and Endpoints events to the userspace proxier. And since OsdnProxy acts as a filter on top of HybridProxier, it needs to also pass both sets of events down to the proxier it's wrapping.
I updated the userspace proxier to use EndpointSlice; I thought we were already going to have to switch to using Service instead of Endpoints for idling.
Oops, I see what you mean. Why wasn't the userspace proxier updated? Just no one signed up for it?
> Why wasn't userspace proxier updated? Just no one signed up for it?
Upstream doesn't care about the userspace proxier any more (Tim would probably have already deleted it if OCP wasn't using it for unidling) and Red Hat had thought we weren't going to have to use EndpointSlice in openshift-sdn, so we didn't care about updating it either.
At any rate, I think we don't actually need to update userspace to use EndpointSlice; we just need to make HybridProxier and OsdnProxy pass both Endpoints events and EndpointSlice events down to the proxiers they wrap, and then the iptables proxy will act on the EndpointSlice events and the userspace proxy will act on the Endpoints events.
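A rough sketch of that fan-out, under simplifying assumptions: the `Endpoints` and `EndpointSlice` structs, the `Proxier` interface, `fakeProxier`, and the `mainProxy` field are all hypothetical stand-ins, not the real openshift-sdn or kube-proxy types. The wrapper forwards every event to both wrapped proxiers and leaves it to each one to ignore the event kind it does not consume.

```go
package main

import "fmt"

// Hypothetical, simplified event payloads; the real handlers receive
// Kubernetes API objects.
type Endpoints struct{ Namespace, Name string }
type EndpointSlice struct{ Namespace, Name string }

// Proxier is a cut-down stand-in for the handler interfaces a proxier implements.
type Proxier interface {
	OnEndpointsAdd(ep *Endpoints)
	OnEndpointSliceAdd(slice *EndpointSlice)
}

// fakeProxier records the events it acts on. A slice-based proxier
// (like iptables here) ignores Endpoints; an Endpoints-based proxier
// (like userspace here) ignores EndpointSlice.
type fakeProxier struct {
	usesSlices bool
	Handled    []string
}

func (f *fakeProxier) OnEndpointsAdd(ep *Endpoints) {
	if !f.usesSlices {
		f.Handled = append(f.Handled, "endpoints "+ep.Namespace+"/"+ep.Name)
	}
}

func (f *fakeProxier) OnEndpointSliceAdd(s *EndpointSlice) {
	if f.usesSlices {
		f.Handled = append(f.Handled, "slice "+s.Namespace+"/"+s.Name)
	}
}

// HybridProxier passes both kinds of event down to both wrapped proxiers.
type HybridProxier struct {
	mainProxy     Proxier // assumed name for the iptables-style proxier
	unidlingProxy Proxier // userspace-style proxier used for unidling
}

func (p *HybridProxier) OnEndpointsAdd(ep *Endpoints) {
	p.mainProxy.OnEndpointsAdd(ep)
	p.unidlingProxy.OnEndpointsAdd(ep)
}

func (p *HybridProxier) OnEndpointSliceAdd(s *EndpointSlice) {
	p.mainProxy.OnEndpointSliceAdd(s)
	p.unidlingProxy.OnEndpointSliceAdd(s)
}

func main() {
	iptables := &fakeProxier{usesSlices: true}
	userspace := &fakeProxier{}
	hybrid := &HybridProxier{mainProxy: iptables, unidlingProxy: userspace}

	hybrid.OnEndpointsAdd(&Endpoints{Namespace: "ns", Name: "svc"})
	hybrid.OnEndpointSliceAdd(&EndpointSlice{Namespace: "ns", Name: "svc-abc12"})

	fmt.Println("iptables: ", iptables.Handled)
	fmt.Println("userspace:", userspace.Handled)
}
```

A filtering layer such as OsdnProxy would follow the same shape: it implements both handler sets and forwards both to the proxier it wraps, rather than embedding a no-op handler for either one.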
|
FYI #296 is a more complete EndpointSlice PR
|
@smarterclayton: PR needs rebase.
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
/close |
|
@danwinship: Closed this PR.
86ceaca (Clayton Coleman, 5 minutes ago)
DO NOT MERGE: Force EndpointSliceProxying on
bf1fd9b (Clayton Coleman, 12 minutes ago)
DO NOT MERGE: UPSTREAM: 97238: Handle terminating endpoints
This is a test cherry-pick based on the current vendor state containing
upstream 97238, which allows the proxier to handle terminating endpoints.
This is not sufficient by itself because we need to test endpoint slices,
but ensures the right code is in place.
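The commit message does not spell out what "handle terminating endpoints" means in terms of endpoint selection. As an illustration only, the sketch below assumes one possible policy: prefer ready endpoints, and fall back to endpoints that are terminating but still serving when no ready ones remain. The `Endpoint` struct and `SelectEndpoints` function are hypothetical; only the three condition names (ready, serving, terminating) come from the EndpointSlice API, and the fallback rule is an assumption, not something taken from this PR.

```go
package main

import "fmt"

// Endpoint is a hypothetical, flattened view of one EndpointSlice endpoint
// with its three conditions.
type Endpoint struct {
	Addr        string
	Ready       bool
	Serving     bool
	Terminating bool
}

// SelectEndpoints returns the ready endpoints if there are any. Otherwise it
// falls back to endpoints that are terminating but still serving.
// ASSUMPTION: this fallback policy is illustrative only.
func SelectEndpoints(eps []Endpoint) []Endpoint {
	var ready, fallback []Endpoint
	for _, ep := range eps {
		switch {
		case ep.Ready:
			ready = append(ready, ep)
		case ep.Serving && ep.Terminating:
			fallback = append(fallback, ep)
		}
	}
	if len(ready) > 0 {
		return ready
	}
	return fallback
}

func main() {
	eps := []Endpoint{
		{Addr: "10.0.0.1:80", Serving: true, Terminating: true},
		{Addr: "10.0.0.2:80", Ready: true, Serving: true},
	}
	fmt.Println(SelectEndpoints(eps))     // ready endpoint wins
	fmt.Println(SelectEndpoints(eps[:1])) // only a terminating endpoint is left
}
```

Without the terminating condition exposed on the slice, the proxier cannot tell a terminating-but-serving endpoint from one that is simply not ready, which is why this needs EndpointSlice support and not only the cherry-pick.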