Bug 2092579: [DownstreamMerge] 6-1-22#1121
Bug 2092579: [DownstreamMerge] 6-1-22#1121flavio-fernandes wants to merge 34 commits intoopenshift:masterfrom
Conversation
Transactions is an all or nothing thing so we just need one wait per transaction to check against duplicates. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Running the LB predicate that matches on name takes a long time if the LB table has many LBs. For example, looking up ~40 LBs in a table with ~200k rows took aproximately 3s. The service controller has a second level cache and knows which LBs need to be added and which need to be updated. Avoid this lookup for LBs that are to be added. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
We've seen instances where the migration initially failed because the db
wasn't ready yet. This change adds some retrying to the migration to
handle that situation more gracefully and efficient, as failing this a
couple of times currently results in getting the pod into crashlooop
backoff.
A sample transient error for initial failures looks like e.G. this:
```
F0511 04:31:57.928725 1 ovndbmanager.go:44] NBDB Upgrade failed: %!w(*fmt.wrapError=&{failed to get schema version for NBDB, stderr: "ovsdb-client: transaction returned error: {\"details\":\"get_schema request specifies database OVN_Northbound which is not yet available because it has not completed joining its cluster\",\"error\":\"database not available\"}\n", error: OVN command '/usr/bin/ovsdb-client -t 10 get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound' failed: exit status 1 0xc00042a160})
```
Signed-off-by: Alvaro Aleman <alvaroaleman@users.noreply.github.com>
Update libovsdb to commit: 50ec17900991efba29734b2d11afe3dfc9ee02e0 Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Extend ACL logging to EgressFirewall objects. Signed-off-by: Andreas Karis <ak.karis@gmail.com>
`exportloopref` golangci linter finds pointers exported from within for-range loop, see https://github.com/kyoh86/exportloopref for details. The linter does expose 2 violations, the one in `pkg/ovn/pods.go` is a real bug. Signed-off-by: xqu <xqu@nvidia.com>
Concurrent create transactions can result in the cache being stale for a very short period. Adding a wait for logical router policies create transactions should prevent duplicate policies in these cases where the predicate does not find the policy in time. Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>
libovsdbops: only one wait per txn
Don't lookup LBs that don't exist in cache
Add wait for logical router policies
…cl-3 EgressFirewall: Enable OVN-Kubernetes logging for egress firewall
Signed-off-by: Yun Zhou <yunz@nvidia.com>
This commit ensures node name is added into EgressIPStatusItem so that executeCloudPrivateIPConfigOps removes the egress ip from cloud config store even though assignment didn't actually happen on the node. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
fix make check error
addLogicalPort() looks up the LSP early and then looks it up again if the pod had no IP/MAC annotations. This is kinda pointless and just costs time and lock contention in libovsdb since if the port was found early we can just use that one to get the MAC/IPs. Since the only other user of GetPortAddresses() was the hybrid overlay code move the port lookup into the hybrid overlay code and just keep the address extraction code common. Signed-off-by: Dan Williams <dcbw@redhat.com>
Add node name into egress ip status for the removal
Direction of the ACLs built for multicast egress policy was always "from-lport", so we cannot determine if we need to update egress ACL direction based on the first egress ACL that found. Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
pods: don't look up LSP twice
…into-pod-creation Perf: Combine perPodSNAT add and LSP add into same transact
Remove time.Tick usages since "without a way to shut it down the underlying Ticker cannot be recovered by the garbage collector; it "leaks"." [https://pkg.go.dev/time#Tick] Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
1. change time.After() usage in loops to once-created NewTicker. time.After() will create a new timer instance on every iteration and is not efficient: "The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed." [https://pkg.go.dev/time#After] 2. make sure to Stop() all tickers to release associated resources. [https://pkg.go.dev/time#NewTicker] Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
OVNDBManager: Retry migrations
KIND 0.13.0 or later is required for k8s 1.24.0 bump, as explained in https://github.com/kubernetes-sigs/kind/releases. Taking latest stable KIND release: 0.14.0 Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Bump KIND version to 0.14.0 to prepare for k8s 1.24.0 bump
update all egress ACLs' direction to "from-lport"
Timers fixes
Signed-off-by: Zenghui Shi <zshi@redhat.com>
Fix lflow-cache-limit-kb ovs external-id
Using ovs-dpctl can cause severe issues when there is a version mismatch between ovs-dpctl and ovs-vswitchd. To address that use ovs-appctl dpctl/* since it provides the same data that is in the same format. Signed-off-by: Patryk Diak <pdiak@redhat.com>
Use ovs-appctl dpctl/* instead of ovs-dpctl
Recent retry logic was a little too harsh on deletion when the port is not present in the annotation. Closes: openshift#2973 Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
pods: deleteLogicalPort should not fail if port is already deleted
|
@flavio-fernandes: This pull request references Bugzilla bug 2092579, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: flavio-fernandes The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
d8cd08d to
44f4bb6
Compare
|
@flavio-fernandes: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
I thought that after FF, we only did downstream cherry-picks? I've got a 4.12 feature in there (the egressfw logging) |
|
@flavio-fernandes: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Please ignore. We are not doing any downstream merges until 4.12 branch is created. |
|
@flavio-fernandes: This pull request references Bugzilla bug 2092579. The bug has been updated to no longer refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
yup; my bad! 🤦 |
…cceeded Scalability: Delete logical ports for completed pods
@trozet @jcaamano @andreaskaris @dcbw @oribon @tssurya @ricky-rav @kyrtapz PTAL