SDN-4930: Downstream Merge [01-28-2025]#2427
openshift-merge-bot[bot] merged 14 commits into openshift:master
Signed-off-by: Flavio Fernandes <ffernandes@nvidia.com>
ShallowClone has to copy all factories. Signed-off-by: Patryk Diak <pdiak@redhat.com>
Commit 6dda0b5 ("factory: Bump the event queue size to 1K.") increased the event queue size to 1K events. However, in combination with fe17136 ("factory: Reduce contention on informer locks.") which configures 201 internal informers this might end up using too much memory in cases when controllers cannot consume events as fast as they're queued by the kube API.

For each kubernetes API object type we consume:
  N_internal_informers x N_queues x N_events x sizeof(event) memory.

That currently translates to:
  N_internal_informers = 201
  N_queues = 15
  N_events = 1000
  sizeof(event) = 32B
  => ~92MB of memory per object type

Given that ovn-kubernetes processes need to be informed about multiple object types this can grow to a significantly large number when controllers that are supposed to consume events from the internal informer queues are slow.

Reduce the queue size, making it 100, in order to lower the worst case scenario memory usage:
  N_internal_informers = 201
  N_queues = 15
  N_events = 100
  sizeof(event) = 32B
  => ~9.2MB of memory per object type

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Clone raFactory in ShallowClone
|
@kyrtapz: This pull request references SDN-4930 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.19." or "openshift-4.19.", but it targets "openshift-4.18" instead. |
|
/test ? |
|
Signed-off-by: Patryk Diak <pdiak@redhat.com>
|
/hold |
Clone frrFactory in ShallowClone
b2565a4 to 42dc7f2
|
/test e2e-metal-ipi-ovn-ipv4-bgp-techpreview |
|
/retest |
factory: Set default event queue size to 100.
Previously, if a new NAD was added to an existing network after a pod referencing it, the pod would never start. This is fixed by reconciling pending pods when the secondary network controller reconciles a new NAD. Signed-off-by: Patryk Diak <pdiak@redhat.com>
|
/retest |
|
extra tests...
/test e2e-metal-ipi-ovn-ipv6-techpreview
also, let's get a look at payload jobs:
/payload 4.19 ci blocking |
|
@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b852b640-dda7-11ef-97d7-67b21c37cb57-0
trigger 14 job(s) of type blocking for the nightly release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b852b640-dda7-11ef-97d7-67b21c37cb57-1 |
Fix doc: Replace ovn-org with ovn-kubernetes to reflect repo move
Reconcile pending pods when a NAD is added to an existing network
|
let's do these again now that this got updated with a few more commits:
/test e2e-metal-ipi-ovn-ipv6-techpreview
also, let's get a look at payload jobs:
/payload 4.19 ci blocking |
|
@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/933cc5d0-ddd9-11ef-9127-a9e908a896c4-0
trigger 14 job(s) of type blocking for the nightly release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/933cc5d0-ddd9-11ef-9127-a9e908a896c4-1 |
|
/label acknowledge-critical-fixes-only |
Fixes NPE seen at: openshift/ovn-kubernetes#2427 (comment) Certain network types may not have a pod handler or retry framework for cluster manager. Signed-off-by: Tim Rozet <trozet@redhat.com>
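A nil guard of the kind this commit message implies might look like the sketch below. The types `retryQueue` and `networkController` and the method `requeuePods` are hypothetical stand-ins, not the actual cluster manager code; the only assumption taken from the commit message is that some network types have no pod handler or retry framework.

```go
package main

import "fmt"

// retryQueue stands in for a retry framework (hypothetical type).
type retryQueue struct {
	queued []string
}

// add dereferences q, so calling it through a nil pointer would panic.
func (q *retryQueue) add(key string) { q.queued = append(q.queued, key) }

// networkController is a hypothetical controller. podRetry is nil for
// network types that have no pod handler or retry framework.
type networkController struct {
	name     string
	podRetry *retryQueue
}

// requeuePods re-adds the given pods for processing. It reports false and
// does nothing when the controller has no retry framework, instead of
// dereferencing a nil pointer.
func (c *networkController) requeuePods(keys []string) bool {
	if c == nil || c.podRetry == nil {
		return false
	}
	for _, k := range keys {
		c.podRetry.add(k)
	}
	return true
}

func main() {
	withPods := &networkController{name: "net-with-pods", podRetry: &retryQueue{}}
	withoutPods := &networkController{name: "net-without-pods"} // podRetry is nil

	for _, c := range []*networkController{withPods, withoutPods} {
		ok := c.requeuePods([]string{"ns1/pod-a"})
		fmt.Printf("%s: requeued=%v\n", c.name, ok)
	}
}
```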
|
/test e2e-metal-ipi-ovn-ipv6-techpreview |
|
/test e2e-metal-ipi-ovn-ipv4-bgp-techpreview |
|
/hold cancel |
|
/approve |
|
/approve |
|
/test e2e-aws-ovn-hypershift-conformance-techpreview |
|
/test e2e-vsphere-ovn |
|
I didn't see any UDN test case failures or flakes in any of the techpreview jobs. The BGP job still has another hour or two before we know if that panic is fixed.
I'm not sure what to think of these 1, 2 jobs that had disruption test case failures, which are the same in both, but according to this sippy link I'm not sure that test is very stable. Re-running:
/test 4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
This job had a networking-related failure, and that test case seems very stable. Although it has failed a couple of times in the past month, I didn't see this specific NURP with a failure. I'm re-running it now just to make sure it comes back clean; I don't want to sneak a regression in at this point.
/test e2e-azure-ovn-upgrade
I would think that if all of the above re-runs come back clean, it is likely safe to add the lgtm here and let this in. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jcaamano, knobunc, kyrtapz, trozet |
|
/retest-required |
|
/override ci/prow/e2e-aws-ovn-windows |
|
@trozet: Overrode contexts on behalf of trozet: ci/prow/e2e-aws-ovn-windows, ci/prow/e2e-gcp-ovn |
Merged dcb1b19 into openshift:master
|
/cherry-pick release-4.18 |
|
|
@trozet: new pull request created: #2428 |
|
[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-base |
|
[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-microshift |
|
[ART PR BUILD NOTIFIER] Distgit: ose-ovn-kubernetes |
No description provided.