Bug 2092579: [DownstreamMerge] 6-1-22 by flavio-fernandes · Pull Request #1121 · openshift/ovn-kubernetes

flavio-fernandes · 2022-06-01T20:32:31Z

@trozet @jcaamano @andreaskaris @dcbw @oribon @tssurya @ricky-rav @kyrtapz PTAL

Transactions is an all or nothing thing so we just need one wait per transaction to check against duplicates. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Running the LB predicate that matches on name takes a long time if the LB table has many LBs. For example, looking up ~40 LBs in a table with ~200k rows took aproximately 3s. The service controller has a second level cache and knows which LBs need to be added and which need to be updated. Avoid this lookup for LBs that are to be added. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

We've seen instances where the migration initially failed because the db wasn't ready yet. This change adds some retrying to the migration to handle that situation more gracefully and efficient, as failing this a couple of times currently results in getting the pod into crashlooop backoff. A sample transient error for initial failures looks like e.G. this: ``` F0511 04:31:57.928725 1 ovndbmanager.go:44] NBDB Upgrade failed: %!w(*fmt.wrapError=&{failed to get schema version for NBDB, stderr: "ovsdb-client: transaction returned error: {\"details\":\"get_schema request specifies database OVN_Northbound which is not yet available because it has not completed joining its cluster\",\"error\":\"database not available\"}\n", error: OVN command '/usr/bin/ovsdb-client -t 10 get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound' failed: exit status 1 0xc00042a160}) ``` Signed-off-by: Alvaro Aleman <alvaroaleman@users.noreply.github.com>

Update libovsdb to commit: 50ec17900991efba29734b2d11afe3dfc9ee02e0 Signed-off-by: Andreas Karis <ak.karis@gmail.com>

Extend ACL logging to EgressFirewall objects. Signed-off-by: Andreas Karis <ak.karis@gmail.com>

`exportloopref` golangci linter finds pointers exported from within for-range loop, see https://github.com/kyoh86/exportloopref for details. The linter does expose 2 violations, the one in `pkg/ovn/pods.go` is a real bug. Signed-off-by: xqu <xqu@nvidia.com>

Concurrent create transactions can result in the cache being stale for a very short period. Adding a wait for logical router policies create transactions should prevent duplicate policies in these cases where the predicate does not find the policy in time. Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>

libovsdbops: only one wait per txn

Don't lookup LBs that don't exist in cache

Add wait for logical router policies

…cl-3 EgressFirewall: Enable OVN-Kubernetes logging for egress firewall

Signed-off-by: Yun Zhou <yunz@nvidia.com>

This commit ensures node name is added into EgressIPStatusItem so that executeCloudPrivateIPConfigOps removes the egress ip from cloud config store even though assignment didn't actually happen on the node. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

fix make check error

addLogicalPort() looks up the LSP early and then looks it up again if the pod had no IP/MAC annotations. This is kinda pointless and just costs time and lock contention in libovsdb since if the port was found early we can just use that one to get the MAC/IPs. Since the only other user of GetPortAddresses() was the hybrid overlay code move the port lookup into the hybrid overlay code and just keep the address extraction code common. Signed-off-by: Dan Williams <dcbw@redhat.com>

Add node name into egress ip status for the removal

Direction of the ACLs built for multicast egress policy was always "from-lport", so we cannot determine if we need to update egress ACL direction based on the first egress ACL that found. Signed-off-by: Yun Zhou <yunz@nvidia.com>

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

pods: don't look up LSP twice

…into-pod-creation Perf: Combine perPodSNAT add and LSP add into same transact

Remove time.Tick usages since "without a way to shut it down the underlying Ticker cannot be recovered by the garbage collector; it "leaks"." [https://pkg.go.dev/time#Tick] Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

1. change time.After() usage in loops to once-created NewTicker. time.After() will create a new timer instance on every iteration and is not efficient: "The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed." [https://pkg.go.dev/time#After] 2. make sure to Stop() all tickers to release associated resources. [https://pkg.go.dev/time#NewTicker] Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

OVNDBManager: Retry migrations

KIND 0.13.0 or later is required for k8s 1.24.0 bump, as explained in https://github.com/kubernetes-sigs/kind/releases. Taking latest stable KIND release: 0.14.0 Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>

Bump KIND version to 0.14.0 to prepare for k8s 1.24.0 bump

update all egress ACLs' direction to "from-lport"

Timers fixes

Signed-off-by: Zenghui Shi <zshi@redhat.com>

Fix lflow-cache-limit-kb ovs external-id

Using ovs-dpctl can cause severe issues when there is a version mismatch between ovs-dpctl and ovs-vswitchd. To address that use ovs-appctl dpctl/* since it provides the same data that is in the same format. Signed-off-by: Patryk Diak <pdiak@redhat.com>

Use ovs-appctl dpctl/* instead of ovs-dpctl

Recent retry logic was a little too harsh on deletion when the port is not present in the annotation. Closes: openshift#2973 Signed-off-by: Flavio Fernandes <flaviof@redhat.com>

pods: deleteLogicalPort should not fail if port is already deleted

openshift-ci · 2022-06-01T20:33:10Z

@flavio-fernandes: This pull request references Bugzilla bug 2092579, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.11.0) matches configured target release for branch (4.11.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @anuragthehatter

Details

In response to this:

Bug 2092579: [DownstreamMerge] 6-1-22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2022-06-01T20:34:11Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: flavio-fernandes
To complete the pull request process, please assign jacobtanenbaum after the PR has been reviewed.
You can assign the PR to them by writing /assign @jacobtanenbaum in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>

openshift-ci · 2022-06-02T01:17:14Z

@flavio-fernandes: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-openstack-ovn	`44f4bb6`	link	false	`/test e2e-openstack-ovn`
ci/prow/e2e-metal-ipi-ovn-ipv6	`44f4bb6`	link	true	`/test e2e-metal-ipi-ovn-ipv6`
ci/prow/e2e-vsphere-windows	`44f4bb6`	link	false	`/test e2e-vsphere-windows`
ci/prow/e2e-aws-ovn-upgrade	`44f4bb6`	link	true	`/test e2e-aws-ovn-upgrade`
ci/prow/okd-e2e-gcp-ovn	`44f4bb6`	link	false	`/test okd-e2e-gcp-ovn`
ci/prow/e2e-vsphere-ovn	`44f4bb6`	link	false	`/test e2e-vsphere-ovn`
ci/prow/e2e-metal-ipi-ovn-dualstack	`44f4bb6`	link	true	`/test e2e-metal-ipi-ovn-dualstack`
ci/prow/e2e-aws-ovn-windows	`44f4bb6`	link	true	`/test e2e-aws-ovn-windows`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

andreaskaris · 2022-06-02T10:26:43Z

I thought that after FF, we only did downstream cherry-picks? I've got a 4.12 feature in there (the egressfw logging)

Release Cycle Timeline
OpenShift SDN’s release cycle operates with a feature freeze window. During initial release cycle development, all changes are pulled downstream via a git merge of upstream. This includes features as well as fixes. Post feature freeze, only fixes will be pulled downstream. This is done via git cherry-picking rather than merge.

https://docs.google.com/document/d/11Ax17QT0FB_OMV9_ifwWX3ppoLlc3OujjAikZP3J1v4/edit#heading=h.jp6iz3u39y1l

openshift-ci · 2022-06-02T10:26:51Z

@flavio-fernandes: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

flavio-fernandes · 2022-06-02T14:24:38Z

Please ignore. We are not doing any downstream merges until 4.12 branch is created.
Closing.

openshift-ci · 2022-06-02T14:24:45Z

@flavio-fernandes: This pull request references Bugzilla bug 2092579. The bug has been updated to no longer refer to the pull request using the external bug tracker.

Details

In response to this:

Bug 2092579: [DownstreamMerge] 6-1-22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

flavio-fernandes · 2022-06-02T14:34:23Z

I thought that after FF, we only did downstream cherry-picks? I've got a 4.12 feature in there (the egressfw logging)
Release Cycle Timeline
OpenShift SDN’s release cycle operates with a feature freeze window. During initial release cycle development, all changes are pulled downstream via a git merge of upstream. This includes features as well as fixes. Post feature freeze, only fixes will be pulled downstream. This is done via git cherry-picking rather than merge.
https://docs.google.com/document/d/11Ax17QT0FB_OMV9_ifwWX3ppoLlc3OujjAikZP3J1v4/edit#heading=h.jp6iz3u39y1l

yup; my bad! 🤦

…cceeded Scalability: Delete logical ports for completed pods

jcaamano and others added 30 commits May 3, 2022 15:05

libovsdbops: only one wait per txn

1aaad50

Transactions is an all or nothing thing so we just need one wait per transaction to check against duplicates. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Update libovsdb to latest 50ec1790099

9d7678a

Update libovsdb to commit: 50ec17900991efba29734b2d11afe3dfc9ee02e0 Signed-off-by: Andreas Karis <ak.karis@gmail.com>

EgressFirewall: Enable OVN-Kubernetes logging for egress firewall

e1f7da0

Extend ACL logging to EgressFirewall objects. Signed-off-by: Andreas Karis <ak.karis@gmail.com>

Merge pull request openshift#2963 from jcaamano/one-wait-per-txn

3b52e57

libovsdbops: only one wait per txn

Merge pull request openshift#2978 from jcaamano/dont-lookup-created-lbs

4d79168

Don't lookup LBs that don't exist in cache

Merge pull request openshift#2822 from oribon/lrp_wait

322b1b6

Add wait for logical router policies

Merge pull request openshift#2927 from andreaskaris/egress-firewall-a…

32a69e2

…cl-3 EgressFirewall: Enable OVN-Kubernetes logging for egress firewall

fix make check error

4fc8ac5

Signed-off-by: Yun Zhou <yunz@nvidia.com>

Merge pull request #2995 from cathy-zhou/checkerror

7f91f93

fix make check error

Merge pull request #2998 from pperiyasamy/delete-egress-ip-fix

fc66d17

Add node name into egress ip status for the removal

update all egress ACLs' direction to "from-lport"

bc941c2

Direction of the ACLs built for multicast egress policy was always "from-lport", so we cannot determine if we need to update egress ACL direction based on the first egress ACL that found. Signed-off-by: Yun Zhou <yunz@nvidia.com>

Combine SNAT add into LSP add

d984217

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Merge pull request #2999 from dcbw/dont-lookup-lsp-twice

8e661f4

pods: don't look up LSP twice

Merge pull request openshift#2769 from tssurya/combine-snat-creation-…

1d976e4

…into-pod-creation Perf: Combine perPodSNAT add and LSP add into same transact

Merge pull request openshift#2983 from alvaroaleman/retries

b4c8535

OVNDBManager: Retry migrations

Merge pull request #3003 from ricky-rav/kind0140

02c43fd

Bump KIND version to 0.14.0 to prepare for k8s 1.24.0 bump

Merge pull request #2997 from cathy-zhou/egressAclDir

4cb2e82

update all egress ACLs' direction to "from-lport"

Merge pull request openshift#2979 from npinaeva/timer-fixes

62f85dd

Timers fixes

Fix lflow-cache-limit-kb ovs external-id

bb01489

Signed-off-by: Zenghui Shi <zshi@redhat.com>

Merge pull request #3005 from zshi-redhat/fix-lflow-cache

dd571cc

Fix lflow-cache-limit-kb ovs external-id

dcbw and others added 3 commits June 1, 2022 09:56

Merge pull request #3007 from kyrtapz/remove_ovs_dpctl_usage

f5e06fc

Use ovs-appctl dpctl/* instead of ovs-dpctl

pods: deleteLogicalPort should not fail if port is already deleted

871c1fb

Recent retry logic was a little too harsh on deletion when the port is not present in the annotation. Closes: openshift#2973 Signed-off-by: Flavio Fernandes <flaviof@redhat.com>

Merge pull request openshift#2974 from flavio-fernandes/fix_delete_lsp

28b5e7b

pods: deleteLogicalPort should not fail if port is already deleted

openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jun 1, 2022

openshift-ci bot requested review from JacobTanenbaum, abhat and anuragthehatter June 1, 2022 20:33

Merge remote-tracking branch 'upstream/master' into merge-6-1-22

44f4bb6

Signed-off-by: Flavio Fernandes <flaviof@redhat.com>

flavio-fernandes force-pushed the merge-6-1-22 branch from d8cd08d to 44f4bb6 Compare June 1, 2022 20:50

openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 2, 2022

flavio-fernandes closed this Jun 2, 2022

flavio-fernandes deleted the merge-6-1-22 branch June 2, 2022 14:24

danwinship pushed a commit to danwinship/ovn-kubernetes-1 that referenced this pull request Jun 6, 2022

Merge pull request openshift#1121 from Billy99/billy99-del-logport-su…

f293c6a

…cceeded Scalability: Delete logical ports for completed pods

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Bug 2092579: [DownstreamMerge] 6-1-22#1121

Bug 2092579: [DownstreamMerge] 6-1-22#1121
flavio-fernandes wants to merge 34 commits intoopenshift:masterfrom
flavio-fernandes:merge-6-1-22

flavio-fernandes commented Jun 1, 2022

Uh oh!

openshift-ci bot commented Jun 1, 2022

Uh oh!

openshift-ci bot commented Jun 1, 2022

Uh oh!

openshift-ci bot commented Jun 2, 2022

Uh oh!

andreaskaris commented Jun 2, 2022

Uh oh!

openshift-ci bot commented Jun 2, 2022

Uh oh!

flavio-fernandes commented Jun 2, 2022

Uh oh!

openshift-ci bot commented Jun 2, 2022

Uh oh!

flavio-fernandes commented Jun 2, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants

Comments

Conversation

flavio-fernandes commented Jun 1, 2022

Uh oh!

openshift-ci bot commented Jun 1, 2022

Uh oh!

openshift-ci bot commented Jun 1, 2022

Uh oh!

openshift-ci bot commented Jun 2, 2022

Uh oh!

andreaskaris commented Jun 2, 2022

Uh oh!

openshift-ci bot commented Jun 2, 2022

Uh oh!

flavio-fernandes commented Jun 2, 2022

Uh oh!

openshift-ci bot commented Jun 2, 2022

Uh oh!

flavio-fernandes commented Jun 2, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants