[release-4.18] OCPBUGS-48710: DownStream Merge Sync from 4.19 [04-03-2025] #2507
jluhrsen wants to merge 237 commits into
The document is refactored to meet the ovn-org feature template [0]. [0] https://github.com/ovn-org/ovn-kubernetes/blob/master/docs/features/template.md Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
This picks up the following relevant bug fixes:

https://issues.redhat.com/browse/FDP-906 "ovn-controller: lib/ovsdb-idl.c:3596: assertion row->new_datum != NULL failed in ovsdb_idl_txn_write__()"
6448f5e364 pinctrl: Skip non-local mac bindings in run_buffered_binding().
ea35347320 pinctrl: Skip deleted mac bindings in run_buffered_binding().
33a6ae53f4 pinctrl: Use correct map size in pinctrl_handle_put_fdb().
8eaa7d5991 controller: Fix "use after free" issue in statctrl_run().
8579859f51 mac-cache: Properly handle deletion of SB mac_bindings.

https://issues.redhat.com/browse/FDP-752 "ovn-northd IPAM incorrectly reports duplicate IP when part of excluded_ips"
2a24b03f7f ipam: Do not report error for static assigned IPs.

https://issues.redhat.com/browse/FDP-786 "When an ECMP symmetric route is removed, northd removes all logical flows from SBDB for ECMP"
7b00627433 northd: Respect --ecmp-symmetric-reply for single routes.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: arkadeepsen <arsen@redhat.com>
OCPBUGS-42707: Bump OVN to ovn24.09-24.09.1-10.el9fdp
Signed-off-by: Flavio Fernandes <ffernandes@nvidia.com>
Signed-off-by: Flavio Fernandes <ffernandes@nvidia.com>
This reverts commit 1243011.
OCPBUGS-48330,OCPBUGS-42609,OCPBUGS-46585,SDN-4930: Downstream Merge [01-23-2025]
Fixes a null pointer exception when network policy port has no protocol. If the protocol is missing in the network policy port definition, it should be assumed to be TCP. Signed-off-by: Tim Rozet <trozet@redhat.com>
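The defaulting rule described above can be sketched as follows. This is a minimal illustration, not the actual fix: `Protocol`, `PolicyPort`, and `protocolOrDefault` are hypothetical stand-ins for the Kubernetes network policy port types and whatever helper the real change uses.

```go
package main

import "fmt"

// Protocol and PolicyPort are simplified, hypothetical stand-ins for the
// Kubernetes network policy port types; only the nil handling matters here.
type Protocol string

const (
	ProtocolTCP Protocol = "TCP"
	ProtocolUDP Protocol = "UDP"
)

type PolicyPort struct {
	Protocol *Protocol // optional; nil when the policy omits the protocol
	Port     int
}

// protocolOrDefault returns the port's protocol, falling back to TCP when
// the field is unset, so callers never dereference a nil pointer.
func protocolOrDefault(p PolicyPort) Protocol {
	if p.Protocol == nil {
		return ProtocolTCP
	}
	return *p.Protocol
}

func main() {
	udp := ProtocolUDP
	fmt.Println(protocolOrDefault(PolicyPort{Port: 80}))
	fmt.Println(protocolOrDefault(PolicyPort{Protocol: &udp, Port: 53}))
}
```

Resolving the default in one helper keeps the nil check out of every call site that reads the protocol.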
ShallowClone has to copy all factories. Signed-off-by: Patryk Diak <pdiak@redhat.com>
Commit 6dda0b5 ("factory: Bump the event queue size to 1K.") increased the event queue size to 1K events. However, in combination with fe17136 ("factory: Reduce contention on informer locks.") which configures 201 internal informers this might end up using too much memory in cases when controllers cannot consume events as fast as they're queued by the kube API.

For each kubernetes API object type we consume: N_internal_informers x N_queues x N_events x sizeof(event) memory.

That currently translates to:
N_internal_informers = 201
N_queues = 15
N_events = 1000
sizeof(event) = 32B
=> ~92MB of memory per object type

Given that ovn-kubernetes processes need to be informed about multiple object types this can grow to a significantly large number when controllers that are supposed to consume events from the internal informer queues are slow.

Reduce the queue size, making it 100, in order to lower the worst case scenario memory usage:
N_internal_informers = 201
N_queues = 15
N_events = 100
sizeof(event) = 32B
=> ~9.2MB of memory per object type

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
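The arithmetic in this commit message can be checked with a short sketch. The four factors are taken from the message; the function name `worstCaseBytes` is made up for illustration. With 1000 events the product is 96,480,000 bytes, which is about 92 MiB, so the "MB" figures quoted above appear to be binary megabytes.

```go
package main

import "fmt"

// worstCaseBytes multiplies the four factors from the commit message:
// internal informers x queues x queued events x bytes per event.
func worstCaseBytes(informers, queues, events, eventSize int64) int64 {
	return informers * queues * events * eventSize
}

func main() {
	const mib = 1024 * 1024
	// Compare the old queue size (1000) with the new default (100).
	for _, events := range []int64{1000, 100} {
		b := worstCaseBytes(201, 15, events, 32)
		fmt.Printf("queue size %4d: %d bytes (%.1f MiB)\n", events, b, float64(b)/mib)
	}
}
```

The figure is per object type, so the total worst case scales with the number of object types being watched.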
Clone raFactory in ShallowClone
Signed-off-by: Patryk Diak <pdiak@redhat.com>
Clone frrFactory in ShallowClone
factory: Set default event queue size to 100.
Previously, if a new NAD was added to an existing network after a pod referencing it, the pod would never start. This is fixed by reconciling pending pods when the secondary network controller reconciles a new NAD. Signed-off-by: Patryk Diak <pdiak@redhat.com>
Fix doc: Replace ovn-org with ovn-kubernetes to reflect repo move
Reconcile pending pods when a NAD is added to an existing network
Fixes NPE seen at: openshift#2427 (comment) Certain network types may not have a pod handler or retry framework for cluster manager. Signed-off-by: Tim Rozet <trozet@redhat.com>
SDN-4930: Downstream Merge [01-28-2025]
Compare annotations directly if possible. For network specific map entries only compare raw json entries without parsing the map in full. Co-authored-by: Tim Rozet <trozet@redhat.com> Signed-off-by: Patryk Diak <pdiak@redhat.com>
Instead of always parsing all node/join subnets parse the raw json map and only compute the results for the affected network. Signed-off-by: Patryk Diak <pdiak@redhat.com>
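The idea in the two commits above (compare the annotation strings directly, and otherwise look only at the raw JSON entry of the affected network) can be sketched roughly like this. It is an illustration under assumptions, not the code from the PR: `networkEntryChanged` and `parseRaw` are hypothetical names, and the sample annotation values are made up.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// parseRaw splits a JSON-encoded annotation into per-network raw entries
// without decoding the entries themselves. An empty annotation yields a
// nil map.
func parseRaw(anno string) (map[string]json.RawMessage, error) {
	if anno == "" {
		return nil, nil
	}
	var m map[string]json.RawMessage
	err := json.Unmarshal([]byte(anno), &m)
	return m, err
}

// networkEntryChanged reports whether the entry for one network differs
// between two annotation values.
func networkEntryChanged(oldAnno, newAnno, network string) (bool, error) {
	if oldAnno == newAnno {
		return false, nil // identical strings: nothing to parse
	}
	oldMap, err := parseRaw(oldAnno)
	if err != nil {
		return false, err
	}
	newMap, err := parseRaw(newAnno)
	if err != nil {
		return false, err
	}
	// Byte-wise comparison of the raw entries only.
	return !bytes.Equal(oldMap[network], newMap[network]), nil
}

func main() {
	oldAnno := `{"default":["10.128.0.0/23"],"blue":["10.200.0.0/24"]}`
	newAnno := `{"default":["10.128.0.0/23"],"blue":["10.200.1.0/24"]}`
	for _, n := range []string{"default", "blue"} {
		changed, err := networkEntryChanged(oldAnno, newAnno, n)
		fmt.Println(n, changed, err)
	}
}
```

One caveat of comparing raw bytes: a whitespace-only difference inside an entry counts as a change, which errs on the side of extra work rather than a missed update.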
Signed-off-by: Patryk Diak <pdiak@redhat.com>

/hold

[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jluhrsen

/payload 4.18 ci blocking

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/98520a70-10d4-11f0-8675-e7786abf1ec6-0
trigger 11 job(s) of type blocking for the nightly release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/98520a70-10d4-11f0-8675-e7786abf1ec6-1
…rom-4.19-04-03-2025-has-dups

3 fixes needed:
in both pkg/clustermanager/network_cluster_controller.go and pkg/clustermanager/zone_cluster_controller.go, a function FilterOutResource() was duplicated. the duplicate was removed.
in pkg/ovn/base_network_controller.go, several import statements were duplicated and one more was not needed. dups and unnecessary imports were removed.
bc87a6f to f7f731f

/retest

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/076bdd10-11ce-11f0-92e5-f93319f4b8e0-0
trigger 11 job(s) of type blocking for the nightly release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/076bdd10-11ce-11f0-92e5-f93319f4b8e0-1

e2e looks good. 3 payload jobs with failures I want to double check is all:
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-upgrade

@jluhrsen: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/aea15470-13cc-11f0-89df-aacf2a9dfa41-0

FTR I don't mind whether we do duplicate commits OR do a rebase. I want us back to the old state asap. @jcaamano looks like you want to do it the duplicate commits way? Does this PR satisfy what you wanted and if so can you lgtm?

yes, @jcaamano, this is good now. the payloads were clear. the only job that didn't pass the 2nd time through was 4.18-e2e-aws-upgrade-ovn-single-node, which failed for two different reasons now and really only passes like 50% of the time anyway. let's get this in.

/hold cancel
this looks like the route we want to take. I will close the other idea.

/test e2e-gcp-ovn

For

/test e2e-metal-ipi-ovn-dualstack

@jluhrsen: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

good 👀. that happened with the revert commits to try to skip the ART bumps that went into 4.19. I have a plan to get this right and will update this PR soon. for now...

let's use #2512 now

@jluhrsen: This pull request references Jira Issue OCPBUGS-48710. The bug has been updated to no longer refer to the pull request using the external bug tracker.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

THIS HAS DUPLICATE COMMITS
this is a git merge -X theirs release-4.19 which contains 53 duplicate commits that came in with
the cherry-pick bot during the GA time crunch for 4.18 as well as any one-off commits that were let
in.
as a proof of concept, you can see this same PR merged in my fork. and a new commit was made
on the 4.19 branch (in my fork) which you can see was cleanly added in a new git merge in this
PR.
here are the 53 dups: