Conversation
Adds a check to ensure that the status on an EIP matches the IP we have allocated in our cache. If it doesn't, consider the status invalid and attempt reallocation (see the sketch below).

There is a race exposed by the unit test "should update invalid assignments on duplicated node", where an EIP has 2 IPs assigned to node1 and a bogus IP assigned to node2. The test expects that EIP should rebalance, with one egress IP on node1 and one on node2. However, without validation of the status, a race can corrupt the EIP node allocator cache. This can happen as follows:

1. EgressIP has node1: EIP1, EIP2.
2. Test starts WatchEgressIP.
3. EIP->Sync->Annotates 50000 mark on the EIP.
4. This triggers an update event with 50000 mark, node1: EIP1, EIP2.
5. EIP initial add starts with the original event of node1: EIP1, EIP2.
6. EIP rebalances one egress IP per node and patches the EIP object, creating an update event with 50000 mark, node1: EIP1, node2: EIP2.
7. The next update event is processed -> RACE happens: the retry framework grabs the "latest" object, and if the informer was lagging, the informer cache still holds the object from step 4.
8. EIP validation logic considers this a pass, because the cache has both nodes, each with 1 egress IP. No patch is done to the EIP object, but once validStatus is achieved the eNode cache is updated.
9. The eNode cache is updated, so node1 now has 2 allocations for the EIP.
10. The update event from step 6 is processed. Validation fails because node1 has 2 EIPs. During rebalancing no suitable node can be found for the EIP due to the cache corruption, so nothing is updated.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
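A minimal sketch of the added validation, assuming hypothetical names and plain-map types (the real ovn-kubernetes allocator cache and EgressIP status types differ): a status is accepted only when every status entry is mirrored by the allocator cache.

```go
package main

import "fmt"

// statusItem is a stand-in for the real EgressIP status entry.
type statusItem struct {
	Node     string
	EgressIP string
}

// statusMatchesCache treats the EgressIP status as valid only if the
// allocator cache records the same IP on the same node for every entry.
// cache maps node name -> set of IPs allocated to that node.
func statusMatchesCache(status []statusItem, cache map[string]map[string]bool) bool {
	for _, item := range status {
		if !cache[item.Node][item.EgressIP] {
			return false // stale or bogus status: consider invalid and reallocate
		}
	}
	return true
}

func main() {
	cache := map[string]map[string]bool{"node1": {"1.1.1.1": true}}
	fmt.Println(statusMatchesCache([]statusItem{
		{Node: "node1", EgressIP: "1.1.1.1"},
		{Node: "node2", EgressIP: "9.9.9.9"}, // not in the cache -> invalid
	}, cache))
}
```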
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Add OKEP for `ovn-kubernetes-mcp` repo
Enhances EIP status validation
Fixes: #5552
Signed-off-by: Tim Rozet <trozet@nvidia.com>
For backwards compatibility reasons, we cannot update the definition of all the attributes that use the CIDR type - only of the new, unreleased objects. Hence, this check is only performed for the `reservedSubnets` and `infrastructureSubnets` attributes. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
The infrastructure IPv4 subnet of the L2 UDN was mistakenly not masked. This commit changes that, effectively masking the provided infrastructure CIDR. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
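A hedged sketch covering both masking changes above, using only the standard library; `isMasked` and `masked` are hypothetical helper names, not the repository's actual validation code. `net.ParseCIDR` returns both the literal address and the masked network, which serves the unmasked-subnet check and the masking fix for IPv4 and IPv6 alike.

```go
package main

import (
	"fmt"
	"net"
)

// isMasked reports whether a CIDR has no host bits set: comparing the
// parsed address against the masked network detects unmasked input.
func isMasked(cidr string) (bool, error) {
	ip, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	return ip.Equal(ipNet.IP), nil
}

// masked returns the CIDR with its host bits cleared, analogous to the
// fix applied to the L2 UDN infrastructure subnet.
func masked(cidr string) (string, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	return ipNet.String(), nil
}

func main() {
	ok, _ := isMasked("10.0.0.1/24")
	m, _ := masked("10.0.0.1/24")
	fmt.Println(ok, m) // false 10.0.0.0/24
}
```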
OKEP-5552: Add support for dynamic UDN allocation per node
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Since commit fa282e7, we test that the allocator cache matches the status of an EgressIP. This change updates both the IPv4 and dual-stack test variants to properly validate cache-based stability. Each test now initializes the allocator cache with assignments matching the EgressIP status, then verifies that neither the cache nor the status changes after running WatchEgressIP. A new egressIPsMatch helper compares EgressIP specs and status items. Signed-off-by: Andreas Karis <ak.karis@gmail.com>
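A sketch of what an egressIPsMatch-style comparison could look like, with stand-in types instead of the real EgressIP API structs; the actual helper in the commit may compare more fields. It treats spec IPs and status items as order-independent sets.

```go
package main

import (
	"fmt"
	"sort"
)

// Stand-ins for the relevant parts of the EgressIP API object.
type statusItem struct{ Node, EgressIP string }
type egressIP struct {
	SpecEgressIPs []string
	Status        []statusItem
}

// egressIPsMatch compares the spec egress IPs and status items of two
// EgressIP snapshots irrespective of ordering.
func egressIPsMatch(a, b egressIP) bool {
	keysOf := func(eip egressIP) []string {
		keys := append([]string(nil), eip.SpecEgressIPs...)
		for _, s := range eip.Status {
			keys = append(keys, "status:"+s.Node+"|"+s.EgressIP)
		}
		sort.Strings(keys)
		return keys
	}
	ka, kb := keysOf(a), keysOf(b)
	if len(ka) != len(kb) {
		return false
	}
	for i := range ka {
		if ka[i] != kb[i] {
			return false
		}
	}
	return true
}

func main() {
	a := egressIP{SpecEgressIPs: []string{"1.1.1.1"}, Status: []statusItem{{"node1", "1.1.1.1"}}}
	fmt.Println(egressIPsMatch(a, a)) // true
}
```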
These tests assert that unmasked reserved / infrastructure subnets are caught (both IPv4 and IPv6). We need to skip the test when it tries to run on an unsupported platform; otherwise, it will not fail as expected. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Configure IP/VRF rules only in full/dpu-host mode, and configure OpenFlow rules only in full/dpu mode. Signed-off-by: Yun Zhou <yunz@nvidia.com>
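A hedged illustration of the mode gating described above; the constant and function names are hypothetical, while the real code consults ovnkube-node's configured mode.

```go
package main

import "fmt"

// Illustrative mode names mirroring ovnkube-node's full, dpu and
// dpu-host deployment modes.
const (
	modeFull    = "full"
	modeDPU     = "dpu"
	modeDPUHost = "dpu-host"
)

// IP/VRF rules live in the host's network stack, so only modes that run
// on the host (full, dpu-host) configure them.
func configuresIPAndVRFRules(mode string) bool {
	return mode == modeFull || mode == modeDPUHost
}

// OpenFlow rules are programmed into OVS, which runs on the DPU in
// offloaded setups, so only full and dpu modes configure them.
func configuresOpenFlowRules(mode string) bool {
	return mode == modeFull || mode == modeDPU
}

func main() {
	for _, m := range []string{modeFull, modeDPU, modeDPUHost} {
		fmt.Printf("%s: ip/vrf=%v openflow=%v\n", m,
			configuresIPAndVRFRules(m), configuresOpenFlowRules(m))
	}
}
```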
EgressIP: Fix unit test "IPv4 should not update valid assignment"
api, udn: subnets must be masked
Fixes OF flows to be VLAN aware as well as add DPU checks for certain features
The k8s e2e utility functions AddOrUpdateLabelOnNode/RemoveLabelOffNode don't work for labels without a value. The incorrect handling of these labels caused a different sequence of nodes when migrating than what the tests intended to exercise. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
There are two circumstances in which IPs were being released incorrectly:

* When a live migratable pod completed with no migration ongoing, its IP was not being released, because IsMigratedSourcePodStale outright assumed a completed pod was stale.
* When a live migratable pod completed on a different node than the VM's original one as part of a migration, its IP was released when it shouldn't have been; we were simply not checking whether it was a migration.

This also improves the tests to check for IP release.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Don't attempt to release IPs that are not managed by the local zone, which can happen with live migratable pods; otherwise we would get distracting error logs on release. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
ConditionalIPRelease would always return false when checking IPs not tracked in the local zone, so in that case we were not correctly checking for colliding pods. This was hidden by the fact that, until a very recent fix, IsMigratedSourcePodStale was used just before instead of AllVMPodsAreCompleted, and it would always return false for a completed live migratable pod. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
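Taken together, the three fixes above amount to a release decision along these lines. This is a hedged sketch with hypothetical names, not the actual ovn-kubernetes control flow.

```go
package main

import "fmt"

// shouldReleaseIPs gates IP release for a live-migratable pod: skip IPs
// the local zone does not manage, and release only once every pod of the
// VM has completed (per AllVMPodsAreCompleted), rather than assuming any
// completed pod is a stale migration source.
func shouldReleaseIPs(localZoneManagesIPs, allVMPodsCompleted bool) bool {
	if !localZoneManagesIPs {
		return false // remote-zone IPs: releasing would only log errors
	}
	return allVMPodsCompleted
}

func main() {
	fmt.Println(shouldReleaseIPs(true, false)) // migration ongoing: keep the IP
	fmt.Println(shouldReleaseIPs(true, true))  // VM fully stopped: release
	fmt.Println(shouldReleaseIPs(false, true)) // not ours to release
}
```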
Or completion of a failed target pod Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
As it is the most complex scenario and a superset of testing without it Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
A new target, lint-run-natively, pulls the golangci-lint binary matching the version pinned in lint.sh and runs the lint check locally. This is useful when a system does not have a container runtime available. Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>