CORENET-5625, OCPBUGS-54245, SDN-5772: Downstream merge 2025-03-21#2501
For host networking, the external bridge acts as the input/output port,
with the Node IP configured on the bridge itself as a local port.
When hardware-acceleration-capable devices, such as ConnectX or
BlueField-2 cards, are used, pods can use hardware-accelerated
Virtual Functions (VFs) or SubFunctions (SFs) as interfaces
and fully offload all Kubernetes traffic flows.
However, for host-networked pods, or when the host itself is the traffic
endpoint, not all Kubernetes flows are accelerated, because the current
CT infrastructure cannot offload CT flows where the external bridge
is the in/out port.
To allow accelerated traffic flows for host networking, this patch
allows specifying a gateway accelerated interface via the
`--gateway-accelerated-interface` flag. This can be either a
switchdev VF or SF that is connected to the external bridge and holds
the Node IP.
                     ┌──────────┐
                     │  br-ext  │
               ┌─────┴──┐       │     ┌──────────┐
               │  eth0  │       │     │  br-int  │
               └─────┬──┘       │     │          │
                     │          X─────X          │
┌────────┐     ┌─────┴──┐       │     │          │
│ eth0v0 ├─────┤ eth0_0 │       │     │          │
└────────┘     └─────┬──┘       │     └──────────┘
 NODE_IP             │          │
                     └──────────┘
where eth0v0 and eth0_0 are, for example, a VF and its VF representor on the eth0 uplink.
Note that the netdevice used must be excluded from the device plugin pools,
so that it is not assigned to workload pods.
This flag is mutually exclusive with the existing
`--gateway-interface` gateway option.
Signed-off-by: Hareesh Puthalath <hareeshp@nvidia.com>
Use accelerated device as Gateway interface
If MultiProtocol is enabled (the default), a BGP session carries prefixes of both the IPv4 and IPv6 families. The problem is that, with an IPv4 session, FRR can incorrectly pick the masquerade IPv6 address (instead of the real address) as the next hop for IPv6 prefixes, and that won't work. Note that this can't happen with a dedicated IPv6 session, since FRR will use the same address that was used to establish the session. Let's enforce the use of DisableMP for now. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
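The "fail if DisableMP is unset" rule amounts to rejecting any neighbor that would carry both address families on one session. Here is a minimal sketch. The `neighbor` type is a hypothetical, trimmed-down view of a BGP neighbor, and its `DisableMP` field is only assumed to mirror the one the controller inspects.

```go
package main

import "fmt"

// neighbor is a hypothetical, trimmed-down view of a BGP neighbor as the
// RouteAdvertisements controller might see it; only the fields used here.
type neighbor struct {
	Address   string
	DisableMP bool // true: one session per address family, no multiprotocol
}

// requireDisableMP returns an error for the first neighbor that would carry
// both IPv4 and IPv6 prefixes on a single session.
func requireDisableMP(neighbors []neighbor) error {
	for _, n := range neighbors {
		if !n.DisableMP {
			return fmt.Errorf("neighbor %s: DisableMP must be set", n.Address)
		}
	}
	return nil
}

func main() {
	ns := []neighbor{
		{Address: "172.18.0.5", DisableMP: true},
		{Address: "fd00::5", DisableMP: false},
	}
	fmt.Println(requireDisableMP(ns))
}
```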
On every node update we were syncing the node in cluster manager. While there were checks in place to limit updating the node annotation, there were no checks in place to limit the other functionality (like marking subnets allocated). This code would execute every time, which would spam the logs with messages like:
2025-02-14T01:25:53.598240753Z I0214 01:25:53.598230 1 node_allocator.go:510] Allowed existing subnets [10.132.5.0/24] on node ip-10-0-58-12.us-west-2.compute.internal
2025-02-14T01:25:53.598305025Z I0214 01:25:53.598279 1 node_allocator.go:510] Allowed existing subnets [10.132.8.0/24] on node ip-10-0-114-225.us-west-2.compute.internal
2025-02-14T01:25:53.598305025Z I0214 01:25:53.594125 1 node_allocator.go:488] Valid subnet 10.132.21.0/24 allocated on node ip-10-0-58-12.us-west-2.compute.internal
2025-02-14T01:25:53.598305025Z I0214 01:25:53.594137 1 node_allocator.go:488] Valid subnet 10.132.28.0/24 allocated on node ip-10-0-58-12.us-west-2.compute.internal
2025-02-14T01:25:53.598305025Z I0214 01:25:53.594143 1 node_allocator.go:488] Valid subnet 10.132.4.0/24 allocated on node ip-10-0-58-12.us-west-2.compute.internal
2025-02-14T01:25:53.598305025Z I0214 01:25:53.594148 1 node_allocator.go:488] Valid subnet 10.132.6.0/24 allocated on node ip-10-0-58-12.us-west-2.compute.internal
This floods the log. The "Valid subnet" message is emitted whenever a subnet is marked as allocated; it doesn't mean anything new was allocated, so this log was removed. The "Allowed existing subnets" message just means the existing subnets on the node were already allocated. These log messages also don't reference the network name, so they are pretty useless. The logs that remain indicate whether new subnets were allocated, and for which network.
Additionally, we don't need to run the update logic if the node was already synced on node add. Once the node is allocated, nothing changes on the node that would force us to allocate again (other than a node going from hybrid overlay -> ovn). Added a sync map to track whether a node needs to be updated again.
Finally, simplified some of the logic in the sync node network annotations. There is no need to annotate the network ID on the node unless it already existed and is somehow incorrect. Also, only release the tunnel ID if it was allocated and failed to be annotated. Signed-off-by: Tim Rozet <trozet@redhat.com>
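The sync map idea can be sketched as follows. The node allocation logic runs until a node has been synced once, and the only exception is the hybrid overlay -> ovn transition. The `nodeSyncTracker` type and its methods are hypothetical illustrations, not the names used in cluster manager.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeSyncTracker is a hypothetical sketch of a sync map that records which
// nodes no longer need the allocation logic on update events.
type nodeSyncTracker struct {
	synced sync.Map // node name -> struct{}, present once allocation succeeded
}

// needsSync reports whether the allocation logic should run for this event.
// hybridToOVN models the one transition that still requires re-allocation.
func (t *nodeSyncTracker) needsSync(node string, hybridToOVN bool) bool {
	if hybridToOVN {
		return true
	}
	_, done := t.synced.Load(node)
	return !done
}

// markSynced records a successful sync; forget drops the entry so the next
// event runs the allocation logic again (e.g. after a failure or node delete).
func (t *nodeSyncTracker) markSynced(node string) { t.synced.Store(node, struct{}{}) }
func (t *nodeSyncTracker) forget(node string)     { t.synced.Delete(node) }

func main() {
	var t nodeSyncTracker
	fmt.Println(t.needsSync("node1", false)) // add event: not synced yet
	t.markSynced("node1")
	fmt.Println(t.needsSync("node1", false)) // later update: skipped
	fmt.Println(t.needsSync("node1", true))  // hybrid overlay -> ovn: resync
}
```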
On every node event, ZCC will call kube patch multiple times. Reduce this to a single call.
Before patch:
trozet@fedora:~/go/src/github.com/ovn-org/ovn-kubernetes/go-controller$ go test -mod=vendor -v ./pkg/clustermanager -ginkgo.v -ginkgo.focus=".*Node subnet allocations.*Linux nodes$" | grep -i "setting annotations"
I0218 11:18:14.168191 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:3] on node node1
I0218 11:18:14.168187 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:2] on node node2
I0218 11:18:14.168200 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:4] on node node3
I0218 11:18:14.168964 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:3] on node node1
I0218 11:18:14.169120 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:4] on node node3
I0218 11:18:14.169152 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:2] on node node2
I0218 11:18:14.169395 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:3] on node node1
I0218 11:18:14.169430 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:2] on node node2
I0218 11:18:14.169492 310203 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:4] on node node3
After patch:
trozet@fedora:~/go/src/github.com/ovn-org/ovn-kubernetes/go-controller$ go test -mod=vendor -v ./pkg/clustermanager -ginkgo.v -ginkgo.focus=".*Node subnet allocations.*Linux nodes$" | grep -i "setting annotations"
I0218 11:28:16.991114 338949 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:2] on node node2
I0218 11:28:16.991133 338949 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:4] on node node3
I0218 11:28:16.991130 338949 kube.go:130] Setting annotations map[k8s.ovn.org/node-id:3] on node node1
Signed-off-by: Tim Rozet <trozet@redhat.com>
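The general technique behind going from three "Setting annotations" lines per node to one is to compute the full set of annotation changes first and then issue at most one patch. The sketch below shows that pattern. The `annotationPatcher` interface, `countingPatcher`, and `syncNodeAnnotations` are hypothetical stand-ins for the kube client call and the ZCC code.

```go
package main

import "fmt"

// annotationPatcher stands in for the kube client call that logs
// "Setting annotations ... on node ..."; it is a hypothetical interface.
type annotationPatcher interface {
	SetAnnotationsOnNode(node string, annotations map[string]string) error
}

// countingPatcher only counts patch calls, for illustration.
type countingPatcher struct{ calls int }

func (p *countingPatcher) SetAnnotationsOnNode(string, map[string]string) error {
	p.calls++
	return nil
}

// syncNodeAnnotations computes every changed annotation first and issues at
// most one patch, instead of patching once per annotation or per helper.
func syncNodeAnnotations(p annotationPatcher, node string, current, desired map[string]string) error {
	changes := map[string]string{}
	for k, v := range desired {
		if current[k] != v {
			changes[k] = v
		}
	}
	if len(changes) == 0 {
		return nil // already up to date: no API call at all
	}
	return p.SetAnnotationsOnNode(node, changes)
}

func main() {
	p := &countingPatcher{}
	desired := map[string]string{"k8s.ovn.org/node-id": "3"}
	_ = syncNodeAnnotations(p, "node1", nil, desired)     // first event: one patch
	_ = syncNodeAnnotations(p, "node1", desired, desired) // repeat event: none
	fmt.Println(p.calls)
}
```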
Out of an abundance of caution, check that a node has annotations before skipping it during the update event. The only reasons I can think of for this being necessary are:
1. we missed an add event (kapi informer bug)
2. someone deleted the annotation on the node
For context: I don't think our other handlers on the ovnkube-controller side handle the above scenarios correctly. For example, we check whether gatewayInit failed only through the sync map in the node event handler, and if it did not fail, we ignore the update. If we had missed the add event, we would never have processed the node, which could result in a permanent failure. Signed-off-by: Tim Rozet <trozet@redhat.com>
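The extra safety check can be expressed as a small predicate. An update is skipped only if the node was already synced and it still carries the expected annotations. The `requiredAnnotations` list and `canSkipUpdate` helper are hypothetical. The real set of annotations depends on what cluster manager owns for a node.

```go
package main

import "fmt"

// requiredAnnotations is a hypothetical list; the real set depends on which
// annotations the cluster manager owns for a node.
var requiredAnnotations = []string{"k8s.ovn.org/node-id"}

// canSkipUpdate returns true only when the node was already synced AND still
// carries every expected annotation, which covers a missed add event and a
// user deleting an annotation.
func canSkipUpdate(alreadySynced bool, annotations map[string]string) bool {
	if !alreadySynced {
		return false
	}
	for _, key := range requiredAnnotations {
		if _, ok := annotations[key]; !ok {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canSkipUpdate(true, map[string]string{"k8s.ovn.org/node-id": "3"}))  // synced and annotated
	fmt.Println(canSkipUpdate(true, map[string]string{}))                             // annotation deleted
	fmt.Println(canSkipUpdate(false, map[string]string{"k8s.ovn.org/node-id": "3"})) // never synced
}
```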
The function was updated for node network controllers but not for zone network controllers. It needs to try to find the networkID from the NAD first, instead of from the nodes. Signed-off-by: Tim Rozet <trozet@redhat.com>
RouteAdvertisements: fail if DisableMP is unset
The networkID is stored in the NAD itself, and the network manager code in OVNK will refuse to start the network controller if the NAD does not have the networkID. For backwards compatibility, when the NAD syncAll happens, it checks for the networkID on a node and copies it to the NAD in case it was missing previously. There were stale functions in these network controllers that relied on setting a cached struct value of networkID, derived at runtime either from the NAD or from the annotation on nodes. This is duplicate information, as the controllers all hold a reference to the NAD itself, which is updated through network controller reconciliation. This commit removes the controller struct variables that store networkID and instead relies on the embedded NAD to get it. It also removes the network controllers' lookups of networkID from nodes. The controllers should all have the networkID on startup, derived from the associated NAD. Signed-off-by: Tim Rozet <trozet@redhat.com>
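Here is a minimal sketch of the two pieces described above. The first is the syncAll backwards-compatibility step, which copies a node-recorded networkID onto a NAD that lacks one. The second is a NAD-only lookup that controllers use afterwards. The annotation key `k8s.ovn.org/network-id`, the `nad` type, and both helpers are assumptions for illustration. The node-side IDs are taken as already parsed into a map keyed by network name.

```go
package main

import (
	"fmt"
	"strconv"
)

// networkIDAnnotation is an assumed key for illustration only.
const networkIDAnnotation = "k8s.ovn.org/network-id"

// nad is a minimal, hypothetical stand-in for a NetworkAttachmentDefinition.
type nad struct {
	Network     string
	Annotations map[string]string
}

// backfillNetworkID copies a networkID previously recorded on a node onto a
// NAD that lacks one. It reports whether it changed the NAD.
func backfillNetworkID(n *nad, nodeIDs map[string]int) bool {
	if _, ok := n.Annotations[networkIDAnnotation]; ok {
		return false // the NAD is already the source of truth
	}
	id, ok := nodeIDs[n.Network]
	if !ok {
		return false // nothing legacy to copy
	}
	if n.Annotations == nil {
		n.Annotations = map[string]string{}
	}
	n.Annotations[networkIDAnnotation] = strconv.Itoa(id)
	return true
}

// networkIDFromNAD reads the ID straight from the NAD the controller already
// holds: no cached struct field and no node lookups.
func networkIDFromNAD(n *nad) (int, error) {
	v, ok := n.Annotations[networkIDAnnotation]
	if !ok {
		return 0, fmt.Errorf("network %s: NAD has no networkID", n.Network)
	}
	return strconv.Atoi(v)
}

func main() {
	n := &nad{Network: "blue"}
	fmt.Println(backfillNetworkID(n, map[string]int{"blue": 7}))
	fmt.Println(networkIDFromNAD(n))
}
```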
Just make them consistent. Signed-off-by: Tim Rozet <trozet@redhat.com>
InvalidID was being used for both networkID and tunnelID. noID was previously used only for tunnelID, and I overloaded it to be used for networkID as well. This was not a great choice, as it causes even more confusion: noID (value 0) has the same value as DefaultNetworkID. This commit refactors these variables and moves them into our global constants file. It renames noID to noTunnelID and declares DefaultNetworkID there, in a single place. It also creates a noNetworkID with a value that doesn't collide with DefaultNetworkID. The code should now be much easier to follow logically. It also removes a function and a unit test that are no longer needed. Signed-off-by: Tim Rozet <trozet@redhat.com>
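The point of a non-colliding noNetworkID is that "unset" and "default network" become distinguishable. Here is a sketch of such a constants block. The text only establishes DefaultNetworkID = 0 and noTunnelID = 0 (the old noID value). The values InvalidID = -1 and noNetworkID = -2, and the `describeNetworkID` helper, are assumptions chosen purely for illustration.

```go
package main

import "fmt"

// DefaultNetworkID and noTunnelID are 0 per the text; InvalidID = -1 and
// noNetworkID = -2 are assumed values that only need to differ from 0.
const (
	InvalidID        = -1
	DefaultNetworkID = 0
	noTunnelID       = 0
	noNetworkID      = -2
)

// describeNetworkID is a hypothetical helper showing that each case is now
// reachable; with the old overloaded noID, "unset" equaled "default network".
func describeNetworkID(id int) string {
	switch id {
	case noNetworkID:
		return "unset"
	case DefaultNetworkID:
		return "default network"
	case InvalidID:
		return "invalid"
	default:
		return fmt.Sprintf("network %d", id)
	}
}

func main() {
	for _, id := range []int{noNetworkID, DefaultNetworkID, 3} {
		fmt.Println(describeNetworkID(id))
	}
}
```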
Limit cluster manager node allocator updates/logs
We add the current host as a printer column, to make it easier to see which node is hosting the service:
```
$ kubectl get egressservice
NAME              ASSIGNED HOST
example-service   ovn-worker
```
Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>
EgressService: add additionalPrinterColumn for .status.host
GetActiveNetworkForNamespaceFast returns the primary network for the namespace, if any, or the default network otherwise. It is faster than GetActiveNetworkForNamespace because it neither copies the network nor verifies it against UDNs. It is meant to be used by controllers capable of reconciling primary network changes. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
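The fast-versus-copy trade-off can be illustrated with a small registry. The fast path hands back the shared value, which callers must treat as read-only. The other path returns a private copy. All names here (`networkRegistry`, `activeNetworkFast`, `activeNetworkCopy`) are hypothetical. The copy path omits the UDN verification that the real GetActiveNetworkForNamespace performs.

```go
package main

import (
	"fmt"
	"sync"
)

// netInfo is a hypothetical, minimal network descriptor.
type netInfo struct{ Name string }

var defaultNetwork = &netInfo{Name: "default"}

// networkRegistry maps namespaces to their primary network (illustrative).
type networkRegistry struct {
	mu      sync.RWMutex
	primary map[string]*netInfo
}

// activeNetworkFast returns the primary network for the namespace if any, or
// the default network otherwise, without copying. Callers must treat the
// result as read-only and be able to reconcile later primary network changes.
func (r *networkRegistry) activeNetworkFast(ns string) *netInfo {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if n, ok := r.primary[ns]; ok {
		return n
	}
	return defaultNetwork
}

// activeNetworkCopy mirrors only the copying part of the slower variant; the
// UDN verification it also performs is omitted here.
func (r *networkRegistry) activeNetworkCopy(ns string) *netInfo {
	c := *r.activeNetworkFast(ns)
	return &c
}

func main() {
	r := &networkRegistry{primary: map[string]*netInfo{"tenant-a": {Name: "blue"}}}
	fmt.Println(r.activeNetworkFast("tenant-a").Name, r.activeNetworkFast("other").Name)
}
```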
Add support to advertise EIPs for UDNs in the cluster manager RouteAdvertisements controller. Selected Egress IPs are those that:
- are served on the same namespaces where the selected networks are serving,
- are assigned to a selected node, and
- are on the default network subnet for that node.
Egress IPs, just as with Pod IPs, will be advertised on routers on the target VRF on the selected nodes. `auto` is not supported as a target VRF for Egress IPs.
Better support for Egress IPs on subnets other than the default network node subnet, including any support for VRF-Lite interface subnets, is left for a future exercise. We would need cluster manager to be able to:
- map non VRF-Lite interface subnets to the proper BGP sessions
- tell apart VRF-Lite interface subnets from other secondary interface subnets
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
In 1.0.1 `endPort` support was added. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
For example, to focus on a given test, here is what I had to write (exact string):
Multi Homing multiple pods connected to the same OVN-K secondary network multi-network policies multi-network policies configure traffic allow lists for a pure L2 overlay when the multi-net policy describes the allow-list using pod selectors
Now it will be:
Multi Homing multiple pods connected to the same OVN-K secondary network with multi-network policies that configure traffic allow lists using pod selectors for a pure L2 overlay
Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Add support to advertise EIPs for UDNs
Update community meeting timing and platform details
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Split the NAD spec generation from the NAD generation. This will be useful in future commits, when only NAD.spec needs to be patched. Signed-off-by: Ram Lavi <ralavi@redhat.com>
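Splitting spec rendering from object rendering lets a later patch path reuse the spec step on its own. The sketch below shows that shape. The `nadSpec` fields are a hypothetical subset of an OVN-K CNI config, and `renderNADSpec`/`renderNAD` are illustrative names, not the functions in the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nadSpec is a hypothetical subset of the CNI config stored in spec.config.
type nadSpec struct {
	CNIVersion string `json:"cniVersion"`
	Type       string `json:"type"`
	Name       string `json:"name"`
	Topology   string `json:"topology"`
	MTU        int    `json:"mtu,omitempty"`
}

// nad is a minimal stand-in for a NetworkAttachmentDefinition object.
type nad struct {
	Namespace, Name string
	SpecConfig      string // what ends up in spec.config
}

// renderNADSpec produces only spec.config, so a caller that needs to patch
// the spec alone can use it without building a whole NAD.
func renderNADSpec(s nadSpec) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// renderNAD builds the full object on top of renderNADSpec.
func renderNAD(ns, name string, s nadSpec) (*nad, error) {
	cfg, err := renderNADSpec(s)
	if err != nil {
		return nil, err
	}
	return &nad{Namespace: ns, Name: name, SpecConfig: cfg}, nil
}

func main() {
	s := nadSpec{CNIVersion: "1.0.0", Type: "ovn-k8s-cni-overlay", Name: "tenant-blue", Topology: "layer2", MTU: 1400}
	n, _ := renderNAD("tenant-a", "blue", s)
	fmt.Println(n.SpecConfig)
}
```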
Add a test that checks the MTU on the pod both before and after the NAD reconcile, to make sure it is updated. Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add a test that changes the available IP allocation to a specific range, then makes sure a new pod follows the new restrictions. Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add tests that make sure that:
- the N/S connectivity is broken after updating the VLAN ID in the NAD.
- the N/S connectivity is restored after the server networking is reconfigured to the new VLAN ID.
Signed-off-by: Ram Lavi <ralavi@redhat.com>
awesome-pages became awesome-nav
KubeVirt v1.5.0 breaks TCP connections during live migration in our e2e tests; this change pins KubeVirt to the last known good version, v1.4.0. https://github.com/kubevirt/kubevirt/releases/tag/v1.5.0 Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
When there are no available IP addresses in the IP pool, no indication is sent to the pod, and it ends up hanging with the generic warning event: failed to get pod annotation. Add an event indicating that the lack of available IPs in the pool is the cause of the failure. Signed-off-by: Ram Lavi <ralavi@redhat.com>

/retest

/retitle OCPBUGS-54245, SDN-5772: Downstream merge 2025-03-21

@jcaamano: This pull request references Jira Issue OCPBUGS-54245, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. This pull request references SDN-5772, which is a valid Jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

/retest

/jira refresh

@maiqueb: This pull request references Jira Issue OCPBUGS-54245, which is invalid:
This pull request references SDN-5772, which is a valid Jira issue.

/jira refresh

@maiqueb: This pull request references Jira Issue OCPBUGS-54245, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (ysegev@redhat.com), skipping review request. This pull request references SDN-5772, which is a valid Jira issue.
@jcaamano: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jcaamano: Jira Issue OCPBUGS-54245: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-54245 has been moved to the MODIFIED state.

[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-base

[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-microshift

[ART PR BUILD NOTIFIER] Distgit: ose-ovn-kubernetes

Fix included in accepted release 4.19.0-0.nightly-2025-04-02-065200

Fix included in accepted release 4.19.0-0.nightly-2025-04-02-170034

Fix included in accepted release 4.19.0-0.nightly-2025-04-04-023411

Fix included in accepted release 4.19.0-0.nightly-2025-04-04-170728
Before openshift#2501 Signed-off-by: Tim Rozet <trozet@redhat.com>

/retitle CORENET-5625, OCPBUGS-54245, SDN-5772: Downstream merge 2025-03-21

@jcaamano: Jira Issue OCPBUGS-54245 is in an unrecognized state (Closed) and will not be moved to the MODIFIED state.

/jira refresh

@maiqueb: Jira Issue OCPBUGS-54245 is in an unrecognized state (Closed) and will not be moved to the MODIFIED state.
cc @trozet @tssurya @hareeshpc @oribon @npinaeva @RamLavi