Skip to content

[wip] June 1st downstream merge part 2#1693

Closed
tssurya wants to merge 59 commits intoopenshift:masterfrom
tssurya:june-1st-downstream-merge-part-2
Closed

[wip] June 1st downstream merge part 2#1693
tssurya wants to merge 59 commits intoopenshift:masterfrom
tssurya:june-1st-downstream-merge-part-2

Conversation

@tssurya
Copy link
Contributor

@tssurya tssurya commented Jun 1, 2023

- What this PR does and why is it needed

- Special notes for reviewers

- How to verify it

- Description for the changelog

npinaeva and others added 30 commits April 3, 2023 12:54
which mean reset, not ignore.
When acls are updated on initial sync (by handling add events) if
namespace log levels were reset, acls should also be updated with nil
Severity.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
intention from implementation.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Add new BuildACLFromDbIDs function, that should be the only one used in
the end of acls indexes update.

Add 2 owner Types for global and namespaced multicast resources.
Generate ACL name based on dbIDs, don't include owner information to
save some symbols. Add (objectIDs *DbObjectIDs) StringNoOwner() for
this purpose.

Add cleanup function for disabled multicast support and namespaced
multicast cleanup.

Move noneMatch to the gress_policy.go, where it is used.

Add tests for multicast acl sync and cleanup.
Rename address_set_syncer package to external_ids_syncer, add acl syncer
sub-package.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
1. Use dbIDs with new NetpolNodeOwnerType
2. Add aclSync for allow from node ACL
3. Rework AllowFromNode ACL unit tests to run everything for ipv4 and
ipv6

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Move gress-related constants from policy.go to gress_policy.go.
Use old externalIDs for syncs only.

policy test "correctly retries recreating a network policy with the
same name" last expected data was updated, since newly generated ACLs

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Add NetpolNamespaceOwnerType for namespace-wide default deny acls.

Add acl_sync PrimaryID check to make sure in case more than 1 ACLs have
the same primaryID, only of them will be updated.
This also eliminates the need for "deleting a network policy that
failed half-way through creation succeeds" test, which is based on a
"found multiple results for provided predicate" condition.

Replace tests
"stale ACLs should be cleaned up or updated at startup via
syncNetworkPolicies" and
"ACLs with long names and run syncNetworkPolicies"
with the new "reconciles an existing networkPolicy updating stale
ACLs with long names", and a part that deletes stale ARP ACLs is now
updated and tested as a part of acl_sync.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Add EgressFirewallOwnerType for egress firewall ACLs.

Update syncEgressFirewall to use passed egressFirewalls objects for
cleanup, create updateEgressFirewallACLsDbIndex function that updates
old formatted ACLs.
Update egressfirewall_test.go, move a part of setup to BeforeEach.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
1. rename BuildACLFromDbIndex to BuildACL, since it is the only method
that should be used now.
2. Rename getACLMatchFromACLDir to getACLMatch, remove stale version.
3. update isEquivalentACL to checkACLPrimaryID, that matches on primaryID
until we get client indexes.
4. Add acls.md doc explaining all acls that are used by ovn-k and their
dependencies with examples
5. Update multicast docs

Add missing links to README.md

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Now Egress firewall ACLs are named "EF:<namespace>:<priority>"
default deny netpol ACLs "NP:<namespace>:<direction>"
gress ACLs "NP:<policyNamespace>:<policyName>:<direction>:<gressIdx>"

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
The current util.NextIP() function returns the wrong IP for IPv6 address
with leading zeros. We can directly use the NextIP() function in vendor
containernetworking/plugins instead.

Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
…once

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
…t that.

Spotted in upstream ovn-org/ovn CI when running against ovn versions
<=22.09 which don't support component templates:

https://github.com/ovn-org/ovn/actions/runs/4628617882

Reported error:
  failed to sync chassis: error: failed to get template var list: error:
  Wrong parameter type (*nbdb.ChassisTemplateVar): Model not found in
  Database Model

Fixes: 4b3475a ("services: Use OVN template load balancers for NodePort services.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Asserting on `netexec` response to not containing errors
does not allow using retries, as the test would fail during
the first attempt.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
services: Don't try to list/cleanup templates when OVN doesn't suppor…
Use PrimaryID as a client index for ACL.
For unknown reasons, during the kube rebase, we just
decided to flip from sorting the list of endpoints to
not sorting it? : 3850450#diff-d3aa58d9b58a0a09264f072df46ab01d0501eb508c4656411ae2dc1ac68fb3c4L579
The allocateLBNodePorts feature depends on the list
being sorted, else our logic is skewed and the random
probability model we use to generate iptables gets skewed up
as well. So during endpoints update and such we will behave
badly and have left over DNATs towards pods that don't
exist anymore which will lead to black-holing the traffic.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Fix the formatting of the log messages (%w is only valid when
creating new errors, not when logging) and update the master
test suite name since it hasn't been only gateway init operations
for a long time.

Signed-off-by: Dan Williams <dcbw@redhat.com>
the fakeAddresset code was originally created so that during testing we
could directly access and figure out what ip addresses where added to
addressSets because we could not access an ovn northbound database.
Since the introduction of libovsdb we can use the fake database that is
provided for us instread of the the fakeAddressSetFactory and
fakeAddressSet

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
Fix e2e etp=local flake happening in LGW mode lanes
The compact mode means running ovnkube master and node in the same
process. The OVN-K runs in this configuration in Microshift.

Signed-off-by: Peng Liu <pliu@redhat.com>
Use a single node cluster with compact-mode enabled to mimic the
ovnkube setup in Microshift.

Signed-off-by: Peng Liu <pliu@redhat.com>
Signed-off-by: Peng Liu <pliu@redhat.com>
Signed-off-by: Peng Liu <pliu@redhat.com>
NextIP returns the wrong IP for IPv6 address with leading zeros
…tion

Remove the no-longer-used node host subnet annotation to allow the
default network controller's node code to proceed.

Also redo how the default network controller counts host subnets
to make sure we actually have the right number of each IP family,
rather than just counting numbers. Because the allocation handling
was split out into the ClusterManager, the network controller's
addNode() no longer has the guarantees of a correct set of annotations
before it runs, as it and the cluster manager run in parallel.

Signed-off-by: Dan Williams <dcbw@redhat.com>
pliurh and others added 19 commits April 14, 2023 22:43
Users can specify IPV4 and IPv6 gateway nexthop for dual-stack cluster
with '--gateway-nexthop=<ipv4_addr>,<ipv6_addr>'

Signed-off-by: Peng Liu <pliu@redhat.com>
Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
When a node gets a new IP addresses, node loadbalancers
should be updated to serve NodePort services on the new
address.

Update node_tracker.go to track both `host-addresses` and
`l3-gateway-config` IP addresses, as they are used in different
ways for building load balancers.

Add End2End test to control-plane suite.

Update load balancer unit tests to include a multi address node.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
During the kube 1.26 bump the sets[type].List() call was deprecated and
replaced with unsortedList(). This could cause some flakes as we
depend on the order of the set in some cases. This fix updates the
unsortedList calls to use the sets List function

Signed-off-by: Ben Pickard <bpickard@redhat.com>

s

s
clustermanager: update node annotations on dual->single stack conversion
Make --gateway-nexthop support dual-stack
Revert predicate search optimization
3369d38 as predicate search should not
bee needed anymore

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Use loadbalancer.Name as client index
If the interface specified with '--gateway-interface' is already
a bridge port, use the bridge name as the gateway interface.

Signed-off-by: Peng Liu <pliu@redhat.com>
Scalable Functions (SFs) are "lightweight" Virtual Functions (VFs),
meaning they are netdevices that implement representor model and can be
used as container network interfaces. Unlike of VFs, SFs are Auxiliary
devices and located on the respective system bus.

Change the current code base to also accomodate the use of SF Auxiliary
devices instead of only VF PCI devices for containers.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Hareesh Puthalath <hareeshp@nvidia.com>
For NodePort Services with ExternalTrafficPolicy = Local,
incoming connections shouldn't be DNATted to masqueradeIP.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
Serve NodePort services on secondary IP addresses
Detect bridge name when 'gateway-interface' is specified
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
…nto june-1st-downstream-merge-part-2

Conflicts:
	go-controller/pkg/libovsdb/libovsdb.go
In CI we are observing:
E0531 19:52:52.309098       1 obj_retry.go:627] Failed to update *v1.Node, old=ip-10-0-133-36.ec2.internal, new=ip-10-0-133-36.ec2.internal, error: error creating gateway for node ip-10-0-133-36.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-133-36.ec2.internal: unable to fetch podIPs for pod openshift-multus/network-metrics-daemon-cr75v: pod openshift-multus/network-metrics-daemon-cr75v: no pod IPs found
I0531 19:52:52.309185       1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-133-36.ec2.internal", UID:"65797716-b2a8-43fc-a216-8154fecee781", APIVersion:"v1", ResourceVersion:"203498", FieldPath:""}): type: 'Warning' reason: 'ErrorUpdatingResource' error creating gateway for node ip-10-0-133-36.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-133-36.ec2.internal: unable to fetch podIPs for pod openshift-multus/network-metrics-daemon-cr75v: pod openshift-multus/network-metrics-daemon-cr75v: no pod IPs found

which is happening because pod is scheduled but IP allocation
has not happened and since this sync is called from watch nodes,
node add does not succeed, so we have a chicken-egg problem
where watch pods can't start.

In reality we don't care about pods that don't have
an IP because there is nothing to cleanup for that pod in
that case.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
(cherry picked from commit 91e8a3b)
(cherry picked from commit fe63418)
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2023
@openshift-ci openshift-ci bot requested review from dcbw and jcaamano June 1, 2023 07:05
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 1, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tssurya
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit d556bde)
@tssurya
Copy link
Contributor Author

tssurya commented Jun 1, 2023

/test e2e-aws-ovn-shared-to-local-gateway-mode-migration

@tssurya
Copy link
Contributor Author

tssurya commented Jun 1, 2023

/test e2e-aws-ovn-windows

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 1, 2023

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-hypershift 2390d0f link false /test e2e-aws-ovn-hypershift
ci/prow/e2e-vsphere-windows 2390d0f link true /test e2e-vsphere-windows
ci/prow/e2e-azure-ovn 2390d0f link false /test e2e-azure-ovn
ci/prow/unit 2390d0f link true /test unit
ci/prow/e2e-aws-ovn-upgrade-local-gateway 2390d0f link true /test e2e-aws-ovn-upgrade-local-gateway
ci/prow/e2e-aws-ovn-windows 2390d0f link true /test e2e-aws-ovn-windows

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@tssurya
Copy link
Contributor Author

tssurya commented Jun 1, 2023

closed in favour of #1692

@tssurya tssurya closed this Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.