
Remove reject ACLs regardless of cache #1738

Merged

trozet merged 1 commit into ovn-kubernetes:master from trozet:fix_reject_acl_removal on Sep 30, 2020

Conversation

@trozet (Contributor) commented Sep 28, 2020

Previously we were just checking whether a known ACL existed
in the cache. If the cache is ever out of sync, we could accidentally
leave stale reject ACLs configured in OVN.

This patch adds a check during service sync for any known stale ACLs
with reject actions belonging to a service. During regular reject ACL
deletion, a new check is added to see if the cache is in an invalid state,
and if so we attempt to remove the ACL by querying OVN.

Signed-off-by: Tim Rozet <trozet@redhat.com>
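
For context, a rough Go sketch of the fallback path described above, assuming the util.RunOVNNbctl helper from go-controller/pkg/util; the function name removeRejectACLByQuery and the exact find/remove arguments are illustrative, not the PR's literal diff:

```go
package ovn

import (
	"fmt"

	"github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util"
)

// removeRejectACLByQuery is an illustrative stand-in for the "last resort"
// path: when the in-memory cache can't be trusted, look the reject ACL up
// directly in the OVN northbound DB by name and detach it from its logical
// switch, rather than relying on a cached UUID.
func removeRejectACLByQuery(aclName, logicalSwitch string) error {
	// Ask OVN for the ACL's UUID instead of trusting the cache. Names
	// containing special characters may need quoting in the name=... match.
	uuid, stderr, err := util.RunOVNNbctl("--data=bare", "--no-heading",
		"--columns=_uuid", "find", "acl",
		fmt.Sprintf("name=%s", aclName), "action=reject")
	if err != nil {
		return fmt.Errorf("error querying OVN for ACL %s: %s, %v", aclName, stderr, err)
	}
	if uuid == "" {
		// The ACL is not in OVN either, so there is nothing stale to remove.
		return nil
	}
	// Detaching the ACL from the switch's acls column leaves the row
	// unreferenced, so OVSDB garbage-collects it.
	_, stderr, err = util.RunOVNNbctl("--if-exists", "remove", "logical_switch",
		logicalSwitch, "acls", uuid)
	if err != nil {
		return fmt.Errorf("error removing ACL %s from %s: %s, %v", uuid, logicalSwitch, stderr, err)
	}
	return nil
}
```

Matching on both name and action=reject keeps the fallback narrow: only rows that are provably reject ACLs for the given name get touched, even when the cache is wrong.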

@trozet (Contributor, Author) commented Sep 28, 2020

/hold

Want to test this more and maybe also add another commit for handling stale ACL removal in syncServices.

@trozet trozet force-pushed the fix_reject_acl_removal branch 2 times, most recently from bfd27a4 to 16fa58b on September 28, 2020 at 15:21
@dcbw (Contributor) commented Sep 28, 2020

If we have to do some kind of "sync OVN database to Kube API state" when ovnkube restarts, we should do that in the sync functions if possible.

@trozet trozet force-pushed the fix_reject_acl_removal branch from 16fa58b to 608adcd on September 29, 2020 at 03:36
@trozet (Contributor, Author) commented Sep 29, 2020

> If we have to do some kind of "sync OVN database to Kube API state" when ovnkube restarts, we should do that in the sync functions if possible.

@dcbw I added some handling in the services sync function. This is a little tricky because these are ACLs, so we can't be 100% sure they belong to services unless the name matches. Therefore I only blow them away if I can tell they have the expected name of a service and are stale (i.e., the service now has endpoints). I also fixed the regular delete operation to use the cache; however, when we can tell the cache is in an invalid state, the last resort is still to query OVN directly.
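
To make that sync-side cleanup concrete, a hypothetical sketch under the same assumptions as the snippet above (same package, plus strings and k8s.io/klog); staleACLNames stands in for a set built from the Kube API during syncServices and is not the PR's actual helper:

```go
// syncRejectACLs lists every reject ACL in the NB DB and destroys only those
// whose name can be positively attributed to a service VIP that now has
// endpoints. Anything we can't attribute to a stale service is left alone.
func syncRejectACLs(staleACLNames map[string]bool) {
	// One row per reject ACL, "name,_uuid" in bare CSV form.
	out, stderr, err := util.RunOVNNbctl("--data=bare", "--no-heading",
		"--format=csv", "--columns=name,_uuid", "find", "acl", "action=reject")
	if err != nil {
		klog.Errorf("Unable to list reject ACLs: %s, %v", stderr, err)
		return
	}
	for _, row := range strings.Split(strings.TrimSpace(out), "\n") {
		fields := strings.Split(row, ",")
		if len(fields) != 2 {
			continue
		}
		name, uuid := fields[0], fields[1]
		if !staleACLNames[name] {
			// Can't prove this ACL belongs to a stale service; leave it alone.
			continue
		}
		if _, stderr, err := util.RunOVNNbctl("destroy", "acl", uuid); err != nil {
			klog.Errorf("Failed to destroy stale reject ACL %s (%s): %s, %v",
				name, uuid, stderr, err)
		}
	}
}
```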

@trozet (Contributor, Author) commented Sep 29, 2020

Running this downstream on openshift/ovn-kubernetes#295.

@coveralls commented Sep 29, 2020

Coverage increased (+0.1%) to 56.35% when pulling 3e00ff5 on trozet:fix_reject_acl_removal into 273400a on ovn-org:master.

@trozet trozet force-pushed the fix_reject_acl_removal branch from 608adcd to cee6292 on September 29, 2020 at 19:17
@trozet trozet force-pushed the fix_reject_acl_removal branch from cee6292 to 79223d6 on September 29, 2020 at 22:11
@dcbw (Contributor) commented Sep 30, 2020

/lgtm
but the testcase failures both seem service-related; do you think they are relevant?

@trozet trozet force-pushed the fix_reject_acl_removal branch from 79223d6 to 3e00ff5 on September 30, 2020 at 03:22
@trozet (Contributor, Author) commented Sep 30, 2020

> /lgtm
> but the testcase failures both seem service-related; do you think they are relevant?

I didn't look closely. I saw there were a few flakes, and that's about what we usually get out of our CI these days 🥇

Let's see what the next round of CI results bring.

@trozet (Contributor, Author) commented Sep 30, 2020

I think CI looks OK. These changes aren't IPv6- or shared-gateway-specific.

