Skip to content

[WIP][TEST] New service controller using endpoint slices#392

Closed
aojea wants to merge 8 commits intoopenshift:masterfrom
aojea:slices
Closed

[WIP][TEST] New service controller using endpoint slices#392
aojea wants to merge 8 commits intoopenshift:masterfrom
aojea:slices

Conversation

@aojea
Copy link
Contributor

@aojea aojea commented Dec 21, 2020

test only

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 21, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please assign dcbw after the PR has been reviewed.
You can assign the PR to them by writing /assign @dcbw in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Copy link
Contributor Author

aojea commented Dec 22, 2020

/retest

@aojea
Copy link
Contributor Author

aojea commented Dec 22, 2020

this doesn't make any sense, now the gcp job works without any change, it was failing to install before

@aojea
Copy link
Contributor Author

aojea commented Dec 22, 2020

/retest

@aojea
Copy link
Contributor Author

aojea commented Dec 23, 2020

this doesn't make any sense, now the gcp job works without any change, it was failing to install before

ok, I think that this may explain it, the kubernetes version has to be bumped independently in each component
openshift/cluster-dns-operator#222
and then accepted in the release
https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/releasetag/4.7.0-0.okd-2020-12-21-154920?from=4.7.0-0.okd-2020-12-17-131741

@aojea
Copy link
Contributor Author

aojea commented Jan 5, 2021

/retest

1 similar comment
@aojea
Copy link
Contributor Author

aojea commented Jan 11, 2021

/retest

@aojea
Copy link
Contributor Author

aojea commented Jan 12, 2021

Ok, this time it went through the 3 cloud providers,

the reject packet for service with endpoints test keeps failing though on the 3 jobs

@aojea
Copy link
Contributor Author

aojea commented Jan 13, 2021

/retest

3 similar comments
@aojea
Copy link
Contributor Author

aojea commented Jan 14, 2021

/retest

@aojea
Copy link
Contributor Author

aojea commented Jan 14, 2021

/retest

@aojea
Copy link
Contributor Author

aojea commented Jan 14, 2021

/retest

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 15, 2021
Antonio Ojea and others added 8 commits January 16, 2021 00:41
Signed-off-by: Antonio Ojea <aojea@redhat.com>
We were looking for the kubernetes version to detect if the cluster
had enabled the DualStack API, however, this detection fails if
the apiserver version is built from a "dirty" git branch.

Instead, we can detect if the services has enabled the ClusterIPs
field, that is much more reliable.

Signed-off-by: Antonio Ojea <aojea@redhat.com>
The new service controller for endpoint slices has a repair
loop that runs once, during the controller startup process,
so we can delete out of sync data between OVN and Kubernetes,
specially to deal with upgrades and OVN new features.

The repair loop was using the controller service tracker as a
source of true, but this is racy because the service tracker
is updated just after the OVN Load Balancer has been configured,
so it can happen that the Service has an OVN loadbalancer, but
doesn´t exist on the service tracker. As a consequence the repair
loops things that is a stale loadbalancer and deletes it.

The repair loop must use the informer cache as source of true, also
it must run before starting the informers workers, so we avoid any
races.

Signed-off-by: Antonio Ojea <aojea@redhat.com>
The new service controller was removing the reject ACL
independently that the service had or not endpoints.

It must remove the ACL only if the service doesn´t exist or
if it has endpoints.

Signed-off-by: Antonio Ojea <aojea@redhat.com>
We were detecting the default gateway based in the assumption that
the netlink library was populating the de Gw field with it, however,
it turns out that when multipath is enabled that field is empty and
the corresponding information is in the MultiPath field.

Also, in order to obtain the default gateway, we were dumping all
routes filtering by FAMILY_ALL, and iterating on the results to get
the ones we are interested. We can filter directly by the IP Family instead,
to avoid future possibles problems with the FAMILY_ALL interpretion.

Signed-off-by: Antonio Ojea <aojea@redhat.com>
OVN automatically rejects packets to loadbalancer vips without
backends, so we don´t need to handle that logic anymore in OVN-Kube.

Signed-off-by: Antonio Ojea <aojea@redhat.com>
Signed-off-by: Antonio Ojea <aojea@redhat.com>
@aojea
Copy link
Contributor Author

aojea commented Jan 16, 2021

/test all

@aojea
Copy link
Contributor Author

aojea commented Jan 16, 2021

this is pretty good @dcbw 2/2 passes with the new ovn version and the new feature in OVN that reject the services without endpoints
8175142
a15efbd

What do you think if instead of trying to fix the ACLs with the new service endpoint slice controller I wait for this version?

@aojea
Copy link
Contributor Author

aojea commented Jan 17, 2021

/test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 17, 2021

@aojea: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 a15efbd link /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-ovn-hybrid-step-registry a15efbd link /test e2e-ovn-hybrid-step-registry

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@aojea aojea closed this Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants