Skip to content

Comments

[wip] lint fix#2787

Closed
kyrtapz wants to merge 24 commits intoopenshift:masterfrom
kyrtapz:lint_fix_ds
Closed

[wip] lint fix#2787
kyrtapz wants to merge 24 commits intoopenshift:masterfrom
kyrtapz:lint_fix_ds

Conversation

@kyrtapz
Copy link
Contributor

@kyrtapz kyrtapz commented Oct 8, 2025

No description provided.

trozet and others added 24 commits September 15, 2025 15:35
Adds check to ensure that the status on an EIP matches the IP we have
allocated in our cache. If it doesn't consider the status invalid and
attempt reallocation.

There is a race exposed by the unit test "should update invalid
assignments on duplicated node", where an EIP has 2 IPs assigned to
node1, and a bogus IP assigned to node2. The test expects that EIP
should rebalance the EIPs with one on node 1 and one on node 2. However
a race can happen where EIP node allocator cache becomes corrupt due to
lack of validation of status. This can happen:

1. EgressIP has node1: EIP1, EIP2
2. Test starts WatchEgressIP
3. EIP->Sync->Annotates 50000 mark on EIP
4. This triggers an update event with 50000 mark, node1: EIP1, EIP2
5. EIP initial add starts with original event of node1: EIP1, EIP2
6. EIP rebalances one EIP per node, patches EIP object, creates update
   event with 50000 mark, node1: EIP1, node2: EIP2
7. Next update event processed -> RACE happens, retry framework will
   grab "latest" object. If the informer was lagging, informer cache has
   object from step 4
8. EIP validation logic considers this a pass, because cache has the
   both nodes, each with 1 egress IP. No patch is done to the EIP
   object, but when validStatus is achieved the eNode cache is updated
9. enode cache is updated so now node1 has 2 allocations for EIP
10. Update event from 6 is processed. Validation state fails because
   node1 has 2 EIPs. During rebalancing it wont be able to find a
   suitable node for the EIP due to cache corruption, so nothing will
   be updated.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Fixes: #5552

Signed-off-by: Tim Rozet <trozet@nvidia.com>
For backwards compatibility reasons, we cannot update the definition of
all the attributes that use the CIDR type - only of the new, unreleased
objects.

Hence, this check is only performed for the `reservedSubnets` and
`infrastructureSubnets` attributes.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
The infrastructure IPv4 subnet of the L2 UDN was mistakenly not masked.
This commit changes that, effectively masking the provided infrastructure CIDR.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
OKEP-5552: Add support for dynamic UDN allocation per node
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Since commit fa282e7, we test that the allocator cache matches
the status of an EgressIP.

This change updates both IPv4 and dual-stack test variants to
properly validate cache-based stability. Each test now initializes
the allocator cache with assignments matching the EgressIP status,
then verifies neither the cache nor status change after running
WatchEgressIP. A new egressIPsMatch helper performs comparison of
EgressIP specs and status items.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
These tests assert that unmasked reserved / infrastructure subnets are
caught (both IPv4 and IPv6).

We need to skip the test when it tries to run on an unsupported
platform, otherwise, it will not fail as expected.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
configure IP/VRF rules only on full/dpu-host mode, and confiugre
openflow rules only on full/dpu mode.

Signed-off-by: Yun Zhou <yunz@nvidia.com>
EgressIP: Unit test IPv4 should not update valid assignment fix
Fixes OF flows to be VLAN aware as well as add DPU checks for certain features
The k8s e2e utility functions AddOrUpdateLabelOnNode/RemoveLabelOffNode
don't work for labels without a value. The incorrect handling of these
labels caused an incorrect sequence of nodes whem migrating different
than what the tests intended to test.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
There's two circumstances when IPs were being released incorrectly:

* when a live migratable pod completed with no migration ongoing it was
  not being released due to IsMigratedSourcePodStale outright assuming a
  completed pod was stale.
* when a live migratable pod completed on a different node than the VM's
  original as part of a migration it was being released when it
  shouldn't, we were simply not checking if it was a migration.

It also improves the tests to check for IP release.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Don't attempt to release IPs that are not managed by the local zone
which can happen with live migratable pods, otherwise we would get
distracting error logs on release.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
ConditionalIPRelease would always return false when checking IPs not
tracked in the local zone so in that case we were not correctly checking
for colliding pods.

This was hidden by the fact that IsMigratedSourcePodStale was used just
before instead of AllVMPodsAreCompleted until a very recent fix and that
would always return false for a completed live migratable pod.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Or completion of a failed target pod

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
As it is the most complex scenario and a superset of testing without it

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
new target lint-run-natively will now pull the golangci-lint binary
based on the version in lint.sh locally and run the lint check. This
is useful in the case that some system does not have a container
runtime available.

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 8, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kyrtapz
Copy link
Contributor Author

kyrtapz commented Oct 8, 2025

/test lint

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kyrtapz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 8, 2025

@kyrtapz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lint 8e457b1 link true /test lint

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kyrtapz kyrtapz closed this Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants