[wip] June 1st downstream merge part 2 by tssurya · Pull Request #1693 · openshift/ovn-kubernetes

tssurya · 2023-06-01T07:04:45Z

- What this PR does and why is it needed

- Special notes for reviewers

- How to verify it

- Description for the changelog

which mean reset, not ignore. When acls are updated on initial sync (by handling add events) if namespace log levels were reset, acls should also be updated with nil Severity. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

intention from implementation. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Add new BuildACLFromDbIDs function, that should be the only one used in the end of acls indexes update. Add 2 owner Types for global and namespaced multicast resources. Generate ACL name based on dbIDs, don't include owner information to save some symbols. Add (objectIDs *DbObjectIDs) StringNoOwner() for this purpose. Add cleanup function for disabled multicast support and namespaced multicast cleanup. Move noneMatch to the gress_policy.go, where it is used. Add tests for multicast acl sync and cleanup. Rename address_set_syncer package to external_ids_syncer, add acl syncer sub-package. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

1. Use dbIDs with new NetpolNodeOwnerType 2. Add aclSync for allow from node ACL 3. Rework AllowFromNode ACL unit tests to run everything for ipv4 and ipv6 Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Move gress-related constants from policy.go to gress_policy.go. Use old externalIDs for syncs only. policy test "correctly retries recreating a network policy with the same name" last expected data was updated, since newly generated ACLs Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Add NetpolNamespaceOwnerType for namespace-wide default deny acls. Add acl_sync PrimaryID check to make sure in case more than 1 ACLs have the same primaryID, only of them will be updated. This also eliminates the need for "deleting a network policy that failed half-way through creation succeeds" test, which is based on a "found multiple results for provided predicate" condition. Replace tests "stale ACLs should be cleaned up or updated at startup via syncNetworkPolicies" and "ACLs with long names and run syncNetworkPolicies" with the new "reconciles an existing networkPolicy updating stale ACLs with long names", and a part that deletes stale ARP ACLs is now updated and tested as a part of acl_sync. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Add EgressFirewallOwnerType for egress firewall ACLs. Update syncEgressFirewall to use passed egressFirewalls objects for cleanup, create updateEgressFirewallACLsDbIndex function that updates old formatted ACLs. Update egressfirewall_test.go, move a part of setup to BeforeEach. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

1. rename BuildACLFromDbIndex to BuildACL, since it is the only method that should be used now. 2. Rename getACLMatchFromACLDir to getACLMatch, remove stale version. 3. update isEquivalentACL to checkACLPrimaryID, that matches on primaryID until we get client indexes. 4. Add acls.md doc explaining all acls that are used by ovn-k and their dependencies with examples 5. Update multicast docs Add missing links to README.md Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Now Egress firewall ACLs are named "EF:<namespace>:<priority>" default deny netpol ACLs "NP:<namespace>:<direction>" gress ACLs "NP:<policyNamespace>:<policyName>:<direction>:<gressIdx>" Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

The current util.NextIP() function returns the wrong IP for IPv6 address with leading zeros. We can directly use the NextIP() function in vendor containernetworking/plugins instead. Signed-off-by: Yun Zhou <yunz@nvidia.com>

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Acls new indexes

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

…once Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

…t that. Spotted in upstream ovn-org/ovn CI when running against ovn versions <=22.09 which don't support component templates: https://github.com/ovn-org/ovn/actions/runs/4628617882 Reported error: failed to sync chassis: error: failed to get template var list: error: Wrong parameter type (*nbdb.ChassisTemplateVar): Model not found in Database Model Fixes: 4b3475a ("services: Use OVN template load balancers for NodePort services.") Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Asserting on `netexec` response to not containing errors does not allow using retries, as the test would fail during the first attempt. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

services: Don't try to list/cleanup templates when OVN doesn't suppor…

Minor performance fixes

Use PrimaryID as a client index for ACL.

For unknown reasons, during the kube rebase, we just decided to flip from sorting the list of endpoints to not sorting it? : 3850450#diff-d3aa58d9b58a0a09264f072df46ab01d0501eb508c4656411ae2dc1ac68fb3c4L579 The allocateLBNodePorts feature depends on the list being sorted, else our logic is skewed and the random probability model we use to generate iptables gets skewed up as well. So during endpoints update and such we will behave badly and have left over DNATs towards pods that don't exist anymore which will lead to black-holing the traffic. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Fix the formatting of the log messages (%w is only valid when creating new errors, not when logging) and update the master test suite name since it hasn't been only gateway init operations for a long time. Signed-off-by: Dan Williams <dcbw@redhat.com>

the fakeAddresset code was originally created so that during testing we could directly access and figure out what ip addresses where added to addressSets because we could not access an ovn northbound database. Since the introduction of libovsdb we can use the fake database that is provided for us instread of the the fakeAddressSetFactory and fakeAddressSet Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>

Fix e2e etp=local flake happening in LGW mode lanes

The compact mode means running ovnkube master and node in the same process. The OVN-K runs in this configuration in Microshift. Signed-off-by: Peng Liu <pliu@redhat.com>

Use a single node cluster with compact-mode enabled to mimic the ovnkube setup in Microshift. Signed-off-by: Peng Liu <pliu@redhat.com>

Signed-off-by: Peng Liu <pliu@redhat.com>

NextIP returns the wrong IP for IPv6 address with leading zeros

…tion Remove the no-longer-used node host subnet annotation to allow the default network controller's node code to proceed. Also redo how the default network controller counts host subnets to make sure we actually have the right number of each IP family, rather than just counting numbers. Because the allocation handling was split out into the ClusterManager, the network controller's addNode() no longer has the guarantees of a correct set of annotations before it runs, as it and the cluster manager run in parallel. Signed-off-by: Dan Williams <dcbw@redhat.com>

Users can specify IPV4 and IPv6 gateway nexthop for dual-stack cluster with '--gateway-nexthop=<ipv4_addr>,<ipv6_addr>' Signed-off-by: Peng Liu <pliu@redhat.com>

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

When a node gets a new IP addresses, node loadbalancers should be updated to serve NodePort services on the new address. Update node_tracker.go to track both `host-addresses` and `l3-gateway-config` IP addresses, as they are used in different ways for building load balancers. Add End2End test to control-plane suite. Update load balancer unit tests to include a multi address node. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

During the kube 1.26 bump the sets[type].List() call was deprecated and replaced with unsortedList(). This could cause some flakes as we depend on the order of the set in some cases. This fix updates the unsortedList calls to use the sets List function Signed-off-by: Ben Pickard <bpickard@redhat.com> s s

clustermanager: update node annotations on dual->single stack conversion

Make --gateway-nexthop support dual-stack

Revert predicate search optimization 3369d38 as predicate search should not bee needed anymore Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Use loadbalancer.Name as client index

Convert unsortedList() to sets.List()

Start to remove the fakeAddressSet

If the interface specified with '--gateway-interface' is already a bridge port, use the bridge name as the gateway interface. Signed-off-by: Peng Liu <pliu@redhat.com>

Scalable Functions (SFs) are "lightweight" Virtual Functions (VFs), meaning they are netdevices that implement representor model and can be used as container network interfaces. Unlike of VFs, SFs are Auxiliary devices and located on the respective system bus. Change the current code base to also accomodate the use of SF Auxiliary devices instead of only VF PCI devices for containers. Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Hareesh Puthalath <hareeshp@nvidia.com>

For NodePort Services with ExternalTrafficPolicy = Local, incoming connections shouldn't be DNATted to masqueradeIP. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

Serve NodePort services on secondary IP addresses

Detect bridge name when 'gateway-interface' is specified

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Temporary metallb fix

…nto june-1st-downstream-merge-part-2 Conflicts: go-controller/pkg/libovsdb/libovsdb.go

In CI we are observing: E0531 19:52:52.309098 1 obj_retry.go:627] Failed to update *v1.Node, old=ip-10-0-133-36.ec2.internal, new=ip-10-0-133-36.ec2.internal, error: error creating gateway for node ip-10-0-133-36.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-133-36.ec2.internal: unable to fetch podIPs for pod openshift-multus/network-metrics-daemon-cr75v: pod openshift-multus/network-metrics-daemon-cr75v: no pod IPs found I0531 19:52:52.309185 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-133-36.ec2.internal", UID:"65797716-b2a8-43fc-a216-8154fecee781", APIVersion:"v1", ResourceVersion:"203498", FieldPath:""}): type: 'Warning' reason: 'ErrorUpdatingResource' error creating gateway for node ip-10-0-133-36.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-133-36.ec2.internal: unable to fetch podIPs for pod openshift-multus/network-metrics-daemon-cr75v: pod openshift-multus/network-metrics-daemon-cr75v: no pod IPs found which is happening because pod is scheduled but IP allocation has not happened and since this sync is called from watch nodes, node add does not succeed, so we have a chicken-egg problem where watch pods can't start. In reality we don't care about pods that don't have an IP because there is nothing to cleanup for that pod in that case. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit 91e8a3b) (cherry picked from commit fe63418)

openshift-ci · 2023-06-01T07:05:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tssurya
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com> (cherry picked from commit d556bde)

tssurya · 2023-06-01T08:20:23Z

/test e2e-aws-ovn-shared-to-local-gateway-mode-migration

tssurya · 2023-06-01T08:41:27Z

/test e2e-aws-ovn-windows

openshift-ci · 2023-06-01T10:43:21Z

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-hypershift	`2390d0f`	link	false	`/test e2e-aws-ovn-hypershift`
ci/prow/e2e-vsphere-windows	`2390d0f`	link	true	`/test e2e-vsphere-windows`
ci/prow/e2e-azure-ovn	`2390d0f`	link	false	`/test e2e-azure-ovn`
ci/prow/unit	`2390d0f`	link	true	`/test unit`
ci/prow/e2e-aws-ovn-upgrade-local-gateway	`2390d0f`	link	true	`/test e2e-aws-ovn-upgrade-local-gateway`
ci/prow/e2e-aws-ovn-windows	`2390d0f`	link	true	`/test e2e-aws-ovn-windows`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

tssurya · 2023-06-01T19:56:19Z

closed in favour of #1692

npinaeva and others added 30 commits April 3, 2023 12:54

rename aclType to aclPipelineType, add aclDirection type to separate

18dcd72

intention from implementation. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Allow from node ACL:

cb4c4b3

1. Use dbIDs with new NetpolNodeOwnerType 2. Add aclSync for allow from node ACL 3. Rework AllowFromNode ACL unit tests to run everything for ipv4 and ipv6 Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

vendor: bump containernetworking/plugins to v1.2.0

ab51224

The current util.NextIP() function returns the wrong IP for IPv6 address with leading zeros. We can directly use the NextIP() function in vendor containernetworking/plugins instead. Signed-off-by: Yun Zhou <yunz@nvidia.com>

Separate make modelgen from make codegen

fc17f7e

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Merge pull request #3334 from npinaeva/acls-new-indexes

0254d0f

Acls new indexes

Use PrimaryID as a client index for ACL.

9d216fc

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

optimize deleteStaleNetpolPeerAddrSets to iterate over all acls only …

b7afb8a

…once Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

delete getNewLocalPolicyPorts log

ffdcf1b

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Make pokeEndpoint not failing in case of error

c1cf2db

Asserting on `netexec` response to not containing errors does not allow using retries, as the test would fail during the first attempt. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

Merge pull request #3523 from dceara/fix-no-template-support

0b3196b

services: Don't try to list/cleanup templates when OVN doesn't suppor…

Merge pull request #3522 from npinaeva/netpol-perf-fixes

87b62db

Minor performance fixes

Merge pull request #3521 from npinaeva/acl-use-client-index

9e21b3a

Use PrimaryID as a client index for ACL.

Merge pull request #3505 from tssurya/fix-etp-local-e2e-flake

8e696e5

Fix e2e etp=local flake happening in LGW mode lanes

Add compact mode support to kind.sh

2d6b870

The compact mode means running ovnkube master and node in the same process. The OVN-K runs in this configuration in Microshift. Signed-off-by: Peng Liu <pliu@redhat.com>

Add compact-mode test to CI

fd69b62

Use a single node cluster with compact-mode enabled to mimic the ovnkube setup in Microshift. Signed-off-by: Peng Liu <pliu@redhat.com>

Remove the local gw node-ip migration lane from CI

e069c2b

Signed-off-by: Peng Liu <pliu@redhat.com>

Add -scm to the output of kind.sh --help

71232c9

Signed-off-by: Peng Liu <pliu@redhat.com>

Merge pull request #3511 from cathy-zhou/nextip

c9ef83b

NextIP returns the wrong IP for IPv6 address with leading zeros

pliurh and others added 19 commits April 14, 2023 22:43

Make --gateway-nexthop support dual-stack

978e4d9

Users can specify IPV4 and IPv6 gateway nexthop for dual-stack cluster with '--gateway-nexthop=<ipv4_addr>,<ipv6_addr>' Signed-off-by: Peng Liu <pliu@redhat.com>

Improve e2e test assertion message

09f7548

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

Merge pull request #3524 from dcbw/dual-single-downgrade

f46c944

clustermanager: update node annotations on dual->single stack conversion

Merge pull request #3532 from pliurh/single_node

b7c5ff7

Make --gateway-nexthop support dual-stack

Use loadbalancer.Name as client index

ad052c1

Revert predicate search optimization 3369d38 as predicate search should not bee needed anymore Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

Merge pull request #3533 from npinaeva/client-index-lb

faae59a

Use loadbalancer.Name as client index

Merge pull request #3529 from bpickard22/unsorted-lists-fix

6a119d1

Convert unsortedList() to sets.List()

Merge pull request #3525 from JacobTanenbaum/startRemoveFakeAddressSet

a053d08

Start to remove the fakeAddressSet

Detect bridge name when 'gateway-interface' is specified

c6dc8e2

If the interface specified with '--gateway-interface' is already a bridge port, use the bridge name as the gateway interface. Signed-off-by: Peng Liu <pliu@redhat.com>

Fix ETP iptables for shared gateway mode

8bd9b3e

For NodePort Services with ExternalTrafficPolicy = Local, incoming connections shouldn't be DNATted to masqueradeIP. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

Merge pull request #3328 from zeeke/ocpbugs-4485

7bc2e16

Serve NodePort services on secondary IP addresses

Merge pull request #3537 from pliurh/single_node

f3f0f75

Detect bridge name when 'gateway-interface' is specified

Temporary fix for metallb issue

c55c557

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Merge pull request #3545 from jcaamano/metallb-fix

92f45bd

Temporary metallb fix

Merge remote-tracking branch 'ovnorg-origin/44-commit-metalb-merge' i…

8f3bf5a

…nto june-1st-downstream-merge-part-2 Conflicts: go-controller/pkg/libovsdb/libovsdb.go

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2023

openshift-ci bot requested review from dcbw and jcaamano June 1, 2023 07:05

Fix unit test for localWithFallback

2390d0f

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com> (cherry picked from commit d556bde)

kyrtapz mentioned this pull request Jun 1, 2023

[wip][poc] CNO with ovn-ic openshift/cluster-network-operator#1825

Closed

tssurya closed this Jun 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wip] June 1st downstream merge part 2#1693

[wip] June 1st downstream merge part 2#1693
tssurya wants to merge 59 commits intoopenshift:masterfrom
tssurya:june-1st-downstream-merge-part-2

tssurya commented Jun 1, 2023

Uh oh!

openshift-ci bot commented Jun 1, 2023

Uh oh!

tssurya commented Jun 1, 2023

Uh oh!

tssurya commented Jun 1, 2023

Uh oh!

openshift-ci bot commented Jun 1, 2023

Uh oh!

tssurya commented Jun 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants

Conversation

tssurya commented Jun 1, 2023

Uh oh!

openshift-ci bot commented Jun 1, 2023

Uh oh!

tssurya commented Jun 1, 2023

Uh oh!

tssurya commented Jun 1, 2023

Uh oh!

openshift-ci bot commented Jun 1, 2023

Uh oh!

tssurya commented Jun 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants