Skip to content

OCPCLOUD-3215: Add support for dual networking stack services#135

Merged
openshift-merge-bot[bot] merged 6 commits intoopenshift:mainfrom
nrb:OCPCLOUD-3215-dual-stack
Mar 27, 2026
Merged

OCPCLOUD-3215: Add support for dual networking stack services#135
openshift-merge-bot[bot] merged 6 commits intoopenshift:mainfrom
nrb:OCPCLOUD-3215-dual-stack

Conversation

@nrb
Copy link
Copy Markdown

@nrb nrb commented Mar 18, 2026

Carry the upstream change kubernetes#1313 in our fork.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 18, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 18, 2026

@nrb: This pull request references OCPCLOUD-3215 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Carry the upstream change kubernetes#1313 in our fork.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 18, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58669a24-dae7-4fc5-860d-7b39054b1bae

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds IPv6/dual‑stack support: documentation for dual‑stack Services; IPv6-aware CIDR collection and security‑group rule creation; propagate and reconcile ELBv2/NLB IpAddressType through listeners/target groups; IP family validations; and updated fakes and unit tests.

Changes

Cohort / File(s) Summary
Documentation
docs/service_controller.md
New "Dual stack services with IPv6" section: requirements, limitations, notes about AWS Load Balancer Controller webhook, and two YAML examples for RequireDualStack and PreferDualStack.
Provider: core networking & SGs
pkg/providers/v1/aws.go
Collect IPv6 CIDRs from subnets, add separateIPv4AndIPv6CIDRs, extend createSecurityGroupRules/ensureNLBSecurityGroupRules signatures to accept IPv6 ranges, and ensure ::/0 is appended when service requests IPv6.
Load balancer (ELBv2/NLB) logic
pkg/providers/v1/aws_loadbalancer.go
Add helpers (getTargetGroupIPAddressTypeFromService, getLoadBalancerIPAddressTypeFromService, serviceRequestsIPv6, isIPv6CIDR); propagate ipAddressType through ensureLoadBalancerv2, createListenerV2, ensureTargetGroup; recreate target groups on IpAddressType change; include ipAddressType in target-group naming; IPv6-aware SG and MTU/ICMP rules.
Validations
pkg/providers/v1/aws_validations.go
Add validateIPFamilyInfo and canFallbackToIPv4 to validate/default Spec.IPFamilies and IPFamilyPolicy combinations; import k8s.io/utils/ptr.
Unit tests: load balancer
pkg/providers/v1/aws_loadbalancer_test.go
Update TestBuildTargetGroupName to include ipAddressType; add mock ELBV2 wrapper for target limits; add tests for target-group ipAddressType mapping, IPv6 detection, and target registration behavior.
Unit tests: provider core
pkg/providers/v1/aws_test.go
Mock ELBv2 now persists IpAddressType and implements SetIpAddressType; extend TestEnsureLoadBalancer to accept ipFamilies/ipFamilyPolicy; add TestSeparateIPv4AndIPv6CIDRs; update TestCreateSecurityGroupRules to include IPv6 ranges.
Validation tests
pkg/providers/v1/aws_validations_test.go
Add TestCanFallbackToIPv4 and TestValidateIPFamilyInfo table‑driven tests covering SingleStack/PreferDualStack/RequireDualStack and edge cases.
Test fakes
pkg/providers/v1/aws_fakes.go
Add IpAddressType field to FakeELBV2 and implement SetIpAddressType to record/set the value for tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/providers/v1/aws.go (1)

2013-2019: ⚠️ Potential issue | 🟠 Major

IPv6 ICMP rules require ICMPv6 protocol, not ICMP.

The ICMP fragmentation rule uses IPv4 ICMP protocol (icmp) with type 3 code 4. For IPv6, MTU discovery requires ICMPv6 (icmpv6) with type 2 (Packet Too Big). Adding Ipv6Ranges to an IPv4 ICMP permission will not work correctly for IPv6 path MTU discovery.

Consider creating separate security group rules for IPv4 (ICMP type 3/4) and IPv6 (ICMPv6 type 2).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws.go` around lines 2013 - 2019, The current
ec2types.IpPermission named permission mixes IPv4 ICMP ("icmp", FromPort 3,
ToPort 4) with Ipv6Ranges which is invalid for IPv6; split into two permissions:
keep one IPv4 permission using IpProtocol "icmp", FromPort 3, ToPort 4 and
IpRanges set to ec2SourceRanges (remove Ipv6Ranges), and add a separate IPv6
permission using IpProtocol "icmpv6" with FromPort set to the ICMPv6 type (2)
and ToPort set to the ICMPv6 code (0) and Ipv6Ranges set to ec2Ipv6SourceRanges
(no IpRanges).
🧹 Nitpick comments (3)
pkg/providers/v1/aws_test.go (2)

4335-4355: The dual-stack SG tests don't assert the IPv6 path yet.

These cases still pass if createSecurityGroupRules ignores ec2Ipv6SourceRanges, because the assertions only inspect IpRanges on the injected IPv4 ICMP rule. Please also assert Ipv6Ranges/the ICMPv6 rule, or the EC2 ingress request, so the new IPv6 branch is actually covered.

Also applies to: 4417-4476

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4335 - 4355, The test case
"successful security group dual stack rule creation" in
pkg/providers/v1/aws_test.go currently only asserts IPv4 IpRanges, so update the
test to also assert IPv6 behavior: when exercising createSecurityGroupRules (and
similar dual-stack cases around the other block at lines ~4417-4476), verify
that the generated ec2 ingress request includes the expected Ipv6Ranges (e.g.,
contains CidrIpv6 "::/128") and/or that an ICMPv6 IpPermission (ICMPv6
type/code) was added to the rules map; locate the test entry by name and add
assertions on Ipv6Ranges or the ICMPv6 rule in IPPermissionSet to ensure the
IPv6 branch is actually covered.

4098-4104: Pin this case to the IPv6 validation error.

This new test only asserts "some error". With this much mock setup, an unrelated failure also makes it pass, so it doesn't actually prove EnsureLoadBalancer rejected single-stack IPv6 for the intended reason.

Also applies to: 4289-4292

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4098 - 4104, The test case named
"reject single stack IPv6 on NLB" currently only checks that some error was
returned, which can hide unrelated failures; update the assertion to pin the
failure to the IPv6 validation by asserting the returned error from
EnsureLoadBalancer contains (or matches) the specific IPv6 validation message or
sentinel error used by the code (e.g., the IPv6 single-stack NLB validation
error) instead of only checking wantErr=true; locate the test case by its name
in pkg/providers/v1/aws_test.go and change the error assertion logic (and
similarly for the other case around lines 4289-4292) to verify the error text or
use errors.Is against the specific validation error exported/returned by
EnsureLoadBalancer.
pkg/providers/v1/aws_validations_test.go (1)

569-574: RequireDualStack and nil-policy defaulting still aren't covered here.

Line 619 and Line 625 use PreferDualStack, not RequireDualStack, and Line 641 always passes a non-nil IPFamilyPolicy. That leaves the new RequireDualStack success path and the nil-defaulting branch in validateIPFamilyInfo untested.

Also applies to: 618-642

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_validations_test.go` around lines 569 - 574, The tests
slice in pkg/providers/v1/aws_validations_test.go is missing coverage for the
RequireDualStack success path and the nil IPFamilyPolicy defaulting; add test
cases that (1) set ipFamilyPolicy to v1.RequireDualStack with ipFamilies
[]v1.IPFamily{v1.IPv4, v1.IPv6} and expectedError "" to cover the
RequireDualStack success, and (2) set ipFamilyPolicy to the zero value
(nil/default by omitting or using the typed zero) with ipFamilies covering
dual-stack to exercise the nil-defaulting branch in validateIPFamilyInfo; ensure
these new entries are included in the same tests slice used by the
TestValidateIPFamilyInfo (or equivalent) loop so validateIPFamilyInfo gets
exercised for both RequireDualStack and the nil-policy default case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/providers/v1/aws_validations.go`:
- Around line 171-199: The validation mutates service.Spec (setting
service.Spec.IPFamilies and service.Spec.IPFamilyPolicy); instead compute local
values instead (e.g., ipFamilies := service.Spec.IPFamilies; if ipFamilies ==
nil { ipFamilies = make([]v1.IPFamily,0) } and ipFamilyPolicy :=
service.Spec.IPFamilyPolicy; if ipFamilyPolicy == nil { ipFamilyPolicy =
ptr.To(v1.IPFamilyPolicySingleStack) }) and then use these local variables
(ipFamilies and ipFamilyPolicy) for all length checks and comparisons (the
checks currently referencing service.Spec.IPFamilies and dereferencing
service.Spec.IPFamilyPolicy) so no fields on the passed-in Service are modified
during validation.

---

Outside diff comments:
In `@pkg/providers/v1/aws.go`:
- Around line 2013-2019: The current ec2types.IpPermission named permission
mixes IPv4 ICMP ("icmp", FromPort 3, ToPort 4) with Ipv6Ranges which is invalid
for IPv6; split into two permissions: keep one IPv4 permission using IpProtocol
"icmp", FromPort 3, ToPort 4 and IpRanges set to ec2SourceRanges (remove
Ipv6Ranges), and add a separate IPv6 permission using IpProtocol "icmpv6" with
FromPort set to the ICMPv6 type (2) and ToPort set to the ICMPv6 code (0) and
Ipv6Ranges set to ec2Ipv6SourceRanges (no IpRanges).

---

Nitpick comments:
In `@pkg/providers/v1/aws_test.go`:
- Around line 4335-4355: The test case "successful security group dual stack
rule creation" in pkg/providers/v1/aws_test.go currently only asserts IPv4
IpRanges, so update the test to also assert IPv6 behavior: when exercising
createSecurityGroupRules (and similar dual-stack cases around the other block at
lines ~4417-4476), verify that the generated ec2 ingress request includes the
expected Ipv6Ranges (e.g., contains CidrIpv6 "::/128") and/or that an ICMPv6
IpPermission (ICMPv6 type/code) was added to the rules map; locate the test
entry by name and add assertions on Ipv6Ranges or the ICMPv6 rule in
IPPermissionSet to ensure the IPv6 branch is actually covered.
- Around line 4098-4104: The test case named "reject single stack IPv6 on NLB"
currently only checks that some error was returned, which can hide unrelated
failures; update the assertion to pin the failure to the IPv6 validation by
asserting the returned error from EnsureLoadBalancer contains (or matches) the
specific IPv6 validation message or sentinel error used by the code (e.g., the
IPv6 single-stack NLB validation error) instead of only checking wantErr=true;
locate the test case by its name in pkg/providers/v1/aws_test.go and change the
error assertion logic (and similarly for the other case around lines 4289-4292)
to verify the error text or use errors.Is against the specific validation error
exported/returned by EnsureLoadBalancer.

In `@pkg/providers/v1/aws_validations_test.go`:
- Around line 569-574: The tests slice in
pkg/providers/v1/aws_validations_test.go is missing coverage for the
RequireDualStack success path and the nil IPFamilyPolicy defaulting; add test
cases that (1) set ipFamilyPolicy to v1.RequireDualStack with ipFamilies
[]v1.IPFamily{v1.IPv4, v1.IPv6} and expectedError "" to cover the
RequireDualStack success, and (2) set ipFamilyPolicy to the zero value
(nil/default by omitting or using the typed zero) with ipFamilies covering
dual-stack to exercise the nil-defaulting branch in validateIPFamilyInfo; ensure
these new entries are included in the same tests slice used by the
TestValidateIPFamilyInfo (or equivalent) loop so validateIPFamilyInfo gets
exercised for both RequireDualStack and the nil-policy default case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 560d7cba-5046-44bc-b44b-11ad5011ac9e

📥 Commits

Reviewing files that changed from the base of the PR and between 24a5524 and 336c0a8.

📒 Files selected for processing (7)
  • docs/service_controller.md
  • pkg/providers/v1/aws.go
  • pkg/providers/v1/aws_loadbalancer.go
  • pkg/providers/v1/aws_loadbalancer_test.go
  • pkg/providers/v1/aws_test.go
  • pkg/providers/v1/aws_validations.go
  • pkg/providers/v1/aws_validations_test.go

@nrb nrb force-pushed the OCPCLOUD-3215-dual-stack branch from 336c0a8 to f6f5bae Compare March 18, 2026 21:18
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/providers/v1/aws.go (1)

2013-2019: ⚠️ Potential issue | 🟠 Major

Separate IPv4 ICMP and IPv6 ICMPv6 rules to fix dual-stack PMTU discovery.

Lines 2013-2019 create a single IpPermission with IPv4 ICMP protocol (icmp, type/code 3/4) but apply it to both IpRanges and Ipv6Ranges. AWS EC2 security group rules require protocol-specific definitions—IPv4 ICMP rules ignore IPv6 ranges, leaving IPv6 traffic without MTU discovery rules. This breaks PMTU discovery in dual-stack deployments.

Create separate rules: one for IPv4 ICMP with IpRanges, another for IPv6 ICMPv6 (protocol 58, type 2, code 0) with Ipv6Ranges.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws.go` around lines 2013 - 2019, The current single
ec2types.IpPermission mixes IPv4 and IPv6 ranges which leaves ICMPv6
unconfigured; replace it with two distinct IpPermission objects: one IPv4 rule
using IpProtocol "icmp" with FromPort 3 and ToPort 4 and only IpRanges (use
ec2SourceRanges), and a separate IPv6 rule using IpProtocol "58" (ICMPv6) with
FromPort 2 and ToPort 0 and only Ipv6Ranges (use ec2Ipv6SourceRanges); ensure
these two permissions are added where the original permission variable was used
so IPv4 and IPv6 PMTU discovery each get proper security group rules.
♻️ Duplicate comments (1)
pkg/providers/v1/aws_validations.go (1)

171-174: ⚠️ Potential issue | 🟠 Major

Do not mutate service.Spec inside validation.

Line 172 writes defaults back into the caller object. Validation should be side-effect free; otherwise state can leak into later reconciliation logic.

♻️ Suggested fix
-	// Make sure we have a usable zero value for IPFamilies
-	if service.Spec.IPFamilies == nil {
-		service.Spec.IPFamilies = make([]v1.IPFamily, 0)
-	}
+	// Make sure we have a usable zero value for IPFamilies without mutating input.
+	ipFamilies := service.Spec.IPFamilies
+	if ipFamilies == nil {
+		ipFamilies = []v1.IPFamily{}
+	}
@@
-	ipFamilies := service.Spec.IPFamilies
 	if len(ipFamilies) >= 3 {
 		return fmt.Errorf("ipFamilies requires 1 or 2 entries. got %d", len(ipFamilies))
 	}

Also applies to: 184-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_validations.go` around lines 171 - 174, Validation must
not mutate the caller's object; remove assignments that write back into
service.Spec (e.g., the lines setting service.Spec.IPFamilies = make(...)) and
instead operate on a local variable (for example: ipFamilies :=
service.Spec.IPFamilies; if ipFamilies == nil { ipFamilies = make([]v1.IPFamily,
0) }) when performing checks. Apply the same change for the other similar
mutation at the later block (lines that assign into service.Spec in the same
validation function), and update subsequent validations to reference the local
variables rather than service.Spec so no state is written back during
validation.
🧹 Nitpick comments (1)
pkg/providers/v1/aws_test.go (1)

4335-4355: The new dual-stack SG cases still don't validate the IPv6 branch.

These cases now pass ec2Ipv6SourceRanges, but the shared verification below still only asserts the IPv4-side rule data. A regression that drops the generated IPv6 permission would still pass.

Also applies to: 4417-4431, 4449-4449

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4335 - 4355, The test case
"successful security group dual stack rule creation" supplies
ec2Ipv6SourceRanges but the shared verification only asserts IPv4 data; update
the shared verification helper used by these cases to also validate the IPv6
branch: when ec2Ipv6SourceRanges (ec2types.Ipv6Range) is non-empty, assert that
a corresponding ec2types.IpPermission (or separate IPv6 permission) exists with
matching IpProtocol, FromPort/ToPort and that its Ipv6Ranges contain the same
CidrIpv6 values as provided; ensure this validation is added alongside the
existing checks for IPPermissionSet and ec2SourceRanges so regressions that drop
generated IPv6 permissions fail.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/providers/v1/aws_loadbalancer.go`:
- Around line 347-354: The recreate path uses ensureTargetGroup(nil, ...) but
buildTargetGroupName(...) doesn't include ipAddressType causing name collision
when IpAddressType changes; update calls that construct the target group name
(locations invoking buildTargetGroupName in the recreate/delete flow) to pass
the current targetGroupIPAddressType (and modify buildTargetGroupName signature
if needed) so the generated name includes ipAddressType, ensuring new target
groups get unique names before the old one is deleted (refer to
ensureTargetGroup, buildTargetGroupName, targetGroupIPAddressType,
IpAddressType).
- Around line 198-205: The code only sets IpAddressType at create time (using
getTargetGroupIPAddressTypeFromService and serviceRequestsIPv6) and never
reconciles an existing NLB when the Service changes; update the
reconciliation/update path to detect a mismatch between the desired address
family (computed by getTargetGroupIPAddressTypeFromService/serviceRequestsIPv6)
and the current NLB IpAddressType (retrieve the existing load balancer via the
AWS DescribeLoadBalancers API), call the appropriate AWS modify/set API to
change the load balancer IpAddressType when they differ, and keep
validateIPFamilyInfo checks in place so single-stack IPv6 remains rejected;
ensure the new logic runs both for create and update flows so NLBs converge to
the requested front-end address family.
- Around line 1131-1148: The fast-path that treats client rules as "all open"
must recognize dual-stack defaults too: in updateInstanceSecurityGroupsForNLB,
change the logic that checks clientRules (or existing client CIDR set) for an
all-open shortcut so it returns true when both 0.0.0.0/0 (IPv4) and ::/0 (IPv6)
are present, not only when a single-family all-open CIDR is seen; update the
detection code (where you currently branch before building perms and inserting
into desiredPerms) to check for presence of both families and skip adding
per-subnet health-check rules when both defaults exist. Ensure you use the same
helper isIPv6CIDR when classifying and reference clientRules/desiredPerms and
the function updateInstanceSecurityGroupsForNLB so the change is localized.

---

Outside diff comments:
In `@pkg/providers/v1/aws.go`:
- Around line 2013-2019: The current single ec2types.IpPermission mixes IPv4 and
IPv6 ranges which leaves ICMPv6 unconfigured; replace it with two distinct
IpPermission objects: one IPv4 rule using IpProtocol "icmp" with FromPort 3 and
ToPort 4 and only IpRanges (use ec2SourceRanges), and a separate IPv6 rule using
IpProtocol "58" (ICMPv6) with FromPort 2 and ToPort 0 and only Ipv6Ranges (use
ec2Ipv6SourceRanges); ensure these two permissions are added where the original
permission variable was used so IPv4 and IPv6 PMTU discovery each get proper
security group rules.

---

Duplicate comments:
In `@pkg/providers/v1/aws_validations.go`:
- Around line 171-174: Validation must not mutate the caller's object; remove
assignments that write back into service.Spec (e.g., the lines setting
service.Spec.IPFamilies = make(...)) and instead operate on a local variable
(for example: ipFamilies := service.Spec.IPFamilies; if ipFamilies == nil {
ipFamilies = make([]v1.IPFamily, 0) }) when performing checks. Apply the same
change for the other similar mutation at the later block (lines that assign into
service.Spec in the same validation function), and update subsequent validations
to reference the local variables rather than service.Spec so no state is written
back during validation.

---

Nitpick comments:
In `@pkg/providers/v1/aws_test.go`:
- Around line 4335-4355: The test case "successful security group dual stack
rule creation" supplies ec2Ipv6SourceRanges but the shared verification only
asserts IPv4 data; update the shared verification helper used by these cases to
also validate the IPv6 branch: when ec2Ipv6SourceRanges (ec2types.Ipv6Range) is
non-empty, assert that a corresponding ec2types.IpPermission (or separate IPv6
permission) exists with matching IpProtocol, FromPort/ToPort and that its
Ipv6Ranges contain the same CidrIpv6 values as provided; ensure this validation
is added alongside the existing checks for IPPermissionSet and ec2SourceRanges
so regressions that drop generated IPv6 permissions fail.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f285bc93-94f3-4a62-9c60-5aed986ee192

📥 Commits

Reviewing files that changed from the base of the PR and between 336c0a8 and f6f5bae.

📒 Files selected for processing (7)
  • docs/service_controller.md
  • pkg/providers/v1/aws.go
  • pkg/providers/v1/aws_loadbalancer.go
  • pkg/providers/v1/aws_loadbalancer_test.go
  • pkg/providers/v1/aws_test.go
  • pkg/providers/v1/aws_validations.go
  • pkg/providers/v1/aws_validations_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/providers/v1/aws_validations_test.go

@shellyyang1989
Copy link
Copy Markdown

@huali9 for awareness of the downstream change for the aws dual stack feature

nrb added 4 commits March 24, 2026 15:21
- Create load balancers according to the Kubernetes Service API

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Assisted-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Changing the IP address type would invalidate the target group name,
because you cannot have an IPv4 and IPv6 target for the same
port/protocol set.

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@nrb nrb force-pushed the OCPCLOUD-3215-dual-stack branch from f6f5bae to 1abf771 Compare March 24, 2026 19:22
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
pkg/providers/v1/aws_test.go (2)

4383-4404: Remove duplicated dual-stack empty-rule scenario.

The cases around Lines 4375-4389 and Lines 4391-4404 are effectively identical. Keeping one is enough and reduces maintenance noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4383 - 4404, Remove the duplicated
table-driven test case that has name "empty dual stack rule set" (the entry with
rules: IPPermissionSet{}, ec2SourceRanges and ec2Ipv6SourceRanges set to
0.0.0.0/0 and ::/128 and expectError: false); leave a single instance of that
scenario in the test cases slice in pkg/providers/v1/aws_test.go, ensuring the
surrounding test slice (the table variable and its iteration) still compiles and
any comments/indices are updated accordingly.

4098-4104: Strengthen the new negative case with an error-content check.

Line 4103 currently accepts any error, so this can pass on unrelated failures. Assert a meaningful error substring to validate the specific IPv6 single-stack NLB rejection path.

Suggested patch
 type testCase struct {
 	name           string
 	annotations    map[string]string
 	config         func() config.CloudConfig
 	ipFamilies     []v1.IPFamily
 	ipFamilyPolicy *v1.IPFamilyPolicy
 	want           *v1.LoadBalancerStatus
 	wantErr        bool
+	wantErrContains string
 	HookPostChecks func(*testCase, *Cloud, *v1.Service)
 }
@@
 		{
 			name:           "reject single stack IPv6 on NLB",
 			annotations:    map[string]string{ServiceAnnotationLoadBalancerType: "nlb"},
 			ipFamilies:     []v1.IPFamily{v1.IPv6Protocol},
 			ipFamilyPolicy: func() *v1.IPFamilyPolicy { p := v1.IPFamilyPolicySingleStack; return &p }(),
 			wantErr:        true,
+			wantErrContains: "ipv6",
 		},
@@
 			svcStatus, err := c.EnsureLoadBalancer(context.TODO(), TestClusterName, testService, nodes)
 			if test.wantErr {
-				assert.Error(t, err)
+				require.Error(t, err)
+				if test.wantErrContains != "" {
+					assert.Contains(t, strings.ToLower(err.Error()), test.wantErrContains)
+				}
 			} else {
 				assert.NoError(t, err)
 				assert.Equal(t, test.want, svcStatus)
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4098 - 4104, The test case named
"reject single stack IPv6 on NLB" currently only checks that an error occurred;
update its assertion to verify the error message contains a specific substring
indicating IPv6 single‑stack rejection for NLB (e.g., check error.Error()
contains "IPv6" and "NLB" or "single‑stack" text). Locate the table entry with
name "reject single stack IPv6 on NLB" and its run/assert logic and replace the
loose wantErr boolean check with a targeted assertion that the returned error is
non-nil and its message includes the expected IPv6/NLB rejection text; keep the
existing fields (annotations: ServiceAnnotationLoadBalancerType, ipFamilies,
ipFamilyPolicy) unchanged. Ensure the assertion fails if a different unrelated
error is returned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/providers/v1/aws_fakes.go`:
- Around line 718-721: The FakeELBV2.SetIpAddressType currently panics causing
tests to crash when ensureLoadBalancerv2 calls it; replace the panic with a
minimal, stateful implementation that accepts the elbv2.SetIpAddressTypeInput,
updates the fake's internal state (e.g., store the provided IpAddressType on the
FakeELBV2 instance so future DescribeLoadBalancers calls reflect the change),
and return a successful elbv2.SetIpAddressTypeOutput and nil error; also ensure
MockedFakeELBV2 (which embeds FakeELBV2) inherits this behavior so tests
exercising dual-stack reconciliation can assert the updated address type.

In `@pkg/providers/v1/aws_loadbalancer_test.go`:
- Around line 1985-2025: The test fails because ensureTargetGroupTargets calls
RegisterTargets before DeregisterTargets, which temporarily bumps a full target
group past its max and can surface TooManyTargets; update
ensureTargetGroupTargets to detect a replacement scenario and either call
DeregisterTargets before RegisterTargets or implement a capacity-aware fallback
(check current target count via DescribeTargetHealth and only Register after
ensuring capacity), ensuring the logic around RegisterTargets/DeregisterTargets
handles replacements without exceeding max targets.

In `@pkg/providers/v1/aws_test.go`:
- Around line 4311-4316: The test currently accepts IPv6 inputs via
ec2Ipv6SourceRanges but only asserts IPv4 via rule.IpRanges, so add assertions
that validate rule.Ipv6Ranges matches ec2Ipv6SourceRanges: check lengths are
equal and that each expected ec2Ipv6SourceRanges entry (the CidrIpv6 values)
appears in rule.Ipv6Ranges (and vice‑versa) in the same table of checks that
currently validates rule.IpRanges/IpProtocol/FromPort/ToPort for the
IPPermissionSet/rules; reference the test locals ec2Ipv6SourceRanges, rules, and
rule.Ipv6Ranges when adding these assertions.

---

Nitpick comments:
In `@pkg/providers/v1/aws_test.go`:
- Around line 4383-4404: Remove the duplicated table-driven test case that has
name "empty dual stack rule set" (the entry with rules: IPPermissionSet{},
ec2SourceRanges and ec2Ipv6SourceRanges set to 0.0.0.0/0 and ::/128 and
expectError: false); leave a single instance of that scenario in the test cases
slice in pkg/providers/v1/aws_test.go, ensuring the surrounding test slice (the
table variable and its iteration) still compiles and any comments/indices are
updated accordingly.
- Around line 4098-4104: The test case named "reject single stack IPv6 on NLB"
currently only checks that an error occurred; update its assertion to verify the
error message contains a specific substring indicating IPv6 single‑stack
rejection for NLB (e.g., check error.Error() contains "IPv6" and "NLB" or
"single‑stack" text). Locate the table entry with name "reject single stack IPv6
on NLB" and its run/assert logic and replace the loose wantErr boolean check
with a targeted assertion that the returned error is non-nil and its message
includes the expected IPv6/NLB rejection text; keep the existing fields
(annotations: ServiceAnnotationLoadBalancerType, ipFamilies, ipFamilyPolicy)
unchanged. Ensure the assertion fails if a different unrelated error is
returned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78c51d74-b060-4942-af9d-0e06a4e96526

📥 Commits

Reviewing files that changed from the base of the PR and between f6f5bae and 1abf771.

📒 Files selected for processing (8)
  • docs/service_controller.md
  • pkg/providers/v1/aws.go
  • pkg/providers/v1/aws_fakes.go
  • pkg/providers/v1/aws_loadbalancer.go
  • pkg/providers/v1/aws_loadbalancer_test.go
  • pkg/providers/v1/aws_test.go
  • pkg/providers/v1/aws_validations.go
  • pkg/providers/v1/aws_validations_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/providers/v1/aws_validations_test.go
  • pkg/providers/v1/aws_validations.go
  • pkg/providers/v1/aws.go

Comment on lines +4311 to +4316
name string
sgID string
rules IPPermissionSet
ec2SourceRanges []ec2types.IpRange
ec2Ipv6SourceRanges []ec2types.Ipv6Range
expectError bool
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

IPv6 SG ranges are passed but not actually asserted.

Lines 4315 and 4449 add IPv6 inputs, but post-checks only validate rule.IpRanges. A regression that drops rule.Ipv6Ranges would still pass these tests.

Suggested patch
 			// Verify the ec2SourceRanges were properly set
 			for _, rule := range tc.rules {
 				if aws.ToString(rule.IpProtocol) == "icmp" {
 					assert.Equal(t, tc.ec2SourceRanges, rule.IpRanges)
+					assert.Equal(t, tc.ec2Ipv6SourceRanges, rule.Ipv6Ranges)
 				}
 			}

Also applies to: 4449-4449

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4311 - 4316, The test currently
accepts IPv6 inputs via ec2Ipv6SourceRanges but only asserts IPv4 via
rule.IpRanges, so add assertions that validate rule.Ipv6Ranges matches
ec2Ipv6SourceRanges: check lengths are equal and that each expected
ec2Ipv6SourceRanges entry (the CidrIpv6 values) appears in rule.Ipv6Ranges (and
vice‑versa) in the same table of checks that currently validates
rule.IpRanges/IpProtocol/FromPort/ToPort for the IPPermissionSet/rules;
reference the test locals ec2Ipv6SourceRanges, rules, and rule.Ipv6Ranges when
adding these assertions.

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/providers/v1/aws_test.go (1)

2517-2521: ⚠️ Potential issue | 🟠 Major

Don't let the fake's shadow IpAddressType drift from the stored load balancer state.

These changes now maintain IpAddressType in two places: the entry in m.LoadBalancers and the embedded FakeELBV2.IpAddressType field (pkg/providers/v1/aws_fakes.go, around Line 683). They can diverge immediately on create, and SetIpAddressType still updates the shadow field even when no LoadBalancerArn matched. That weakens the new reconciliation tests because the fake can expose conflicting state.

Suggested fix
 func (m *MockedFakeELBV2) CreateLoadBalancer(ctx context.Context, input *elbv2.CreateLoadBalancerInput, optFns ...func(*elbv2.Options)) (*elbv2.CreateLoadBalancerOutput, error) {
@@
 	m.LoadBalancers = append(m.LoadBalancers, &newLB)
+	m.FakeELBV2.IpAddressType = newLB.IpAddressType

 	return &elbv2.CreateLoadBalancerOutput{
 		LoadBalancers: []elbv2types.LoadBalancer{newLB},
 	}, nil
 }

 func (m *MockedFakeELBV2) SetIpAddressType(ctx context.Context, input *elbv2.SetIpAddressTypeInput, optFns ...func(*elbv2.Options)) (*elbv2.SetIpAddressTypeOutput, error) {
+	found := false
 	for _, lb := range m.LoadBalancers {
 		if aws.ToString(lb.LoadBalancerArn) == aws.ToString(input.LoadBalancerArn) {
 			lb.IpAddressType = input.IpAddressType
+			found = true
 			break
 		}
 	}
+	if !found {
+		return nil, fmt.Errorf("load balancer %s not found", aws.ToString(input.LoadBalancerArn))
+	}
 	m.FakeELBV2.IpAddressType = input.IpAddressType
 	return &elbv2.SetIpAddressTypeOutput{IpAddressType: input.IpAddressType}, nil
 }

Also applies to: 2592-2600

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 2517 - 2521, The fake keeps
IpAddressType in two places (the m.LoadBalancers entry and
FakeELBV2.IpAddressType), which can diverge; fix by synchronizing them: when
creating a new LoadBalancer (where newLB is built and stored in m.LoadBalancers)
also set the fake's IpAddressType field (FakeELBV2.IpAddressType) to
input.IpAddressType so they start in sync, and modify SetIpAddressType so it
only mutates FakeELBV2.IpAddressType when it successfully finds and updates a
matching LoadBalancerArn in m.LoadBalancers (do not update the shadow field on
failed/no-match updates).
♻️ Duplicate comments (1)
pkg/providers/v1/aws_test.go (1)

4482-4486: ⚠️ Potential issue | 🟠 Major

Assert the IPv6 ranges too.

The new dual-stack cases pass ec2Ipv6SourceRanges, but the post-checks still only compare rule.IpRanges. A regression that drops rule.Ipv6Ranges would still pass this test.

Suggested fix
 			for _, rule := range tc.rules {
 				if aws.ToString(rule.IpProtocol) == "icmp" {
 					assert.Equal(t, tc.ec2SourceRanges, rule.IpRanges)
+					assert.Equal(t, tc.ec2Ipv6SourceRanges, rule.Ipv6Ranges)
 				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4482 - 4486, The test only asserts
IPv4 ranges (rule.IpRanges) when verifying rules in the loop; update the
post-check to also assert IPv6 ranges by comparing rule.Ipv6Ranges to the test
case's tc.ec2Ipv6SourceRanges for dual-stack cases. Locate the loop that
iterates tc.rules and references aws.ToString(rule.IpProtocol) and, in addition
to the existing assert.Equal(t, tc.ec2SourceRanges, rule.IpRanges), add an
assertion for rule.Ipv6Ranges against tc.ec2Ipv6SourceRanges (only when
tc.ec2Ipv6SourceRanges is set) so a regression that drops IPv6 ranges will fail
the test.
🧹 Nitpick comments (1)
pkg/providers/v1/aws_test.go (1)

4395-4416: empty rule set and empty dual stack rule set are the same case now.

After adding ec2Ipv6SourceRanges to the first row, both rows cover the same input. That removes the IPv4-only empty-rule path and adds redundant test maintenance for no extra signal.

As per coding guidelines, focus on major issues impacting performance, readability, maintainability and security.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_test.go` around lines 4395 - 4416, Two test cases
("empty rule set" and "empty dual stack rule set") are now identical because
ec2Ipv6SourceRanges was added to the first case; revert that so one case
exercises the IPv4-only empty-rule path and the other exercises the dual-stack
empty-rule path. Update the table entries in the test cases for IPPermissionSet
by removing ec2Ipv6SourceRanges from the "empty rule set" case (or alternatively
remove ec2SourceRanges from the "empty dual stack rule set" case) so the rows
are distinct again; reference the test case names "empty rule set" and "empty
dual stack rule set" and the fields ec2SourceRanges and ec2Ipv6SourceRanges to
locate and fix the duplicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/providers/v1/aws_test.go`:
- Around line 2517-2521: The fake keeps IpAddressType in two places (the
m.LoadBalancers entry and FakeELBV2.IpAddressType), which can diverge; fix by
synchronizing them: when creating a new LoadBalancer (where newLB is built and
stored in m.LoadBalancers) also set the fake's IpAddressType field
(FakeELBV2.IpAddressType) to input.IpAddressType so they start in sync, and
modify SetIpAddressType so it only mutates FakeELBV2.IpAddressType when it
successfully finds and updates a matching LoadBalancerArn in m.LoadBalancers (do
not update the shadow field on failed/no-match updates).

---

Duplicate comments:
In `@pkg/providers/v1/aws_test.go`:
- Around line 4482-4486: The test only asserts IPv4 ranges (rule.IpRanges) when
verifying rules in the loop; update the post-check to also assert IPv6 ranges by
comparing rule.Ipv6Ranges to the test case's tc.ec2Ipv6SourceRanges for
dual-stack cases. Locate the loop that iterates tc.rules and references
aws.ToString(rule.IpProtocol) and, in addition to the existing assert.Equal(t,
tc.ec2SourceRanges, rule.IpRanges), add an assertion for rule.Ipv6Ranges against
tc.ec2Ipv6SourceRanges (only when tc.ec2Ipv6SourceRanges is set) so a regression
that drops IPv6 ranges will fail the test.

---

Nitpick comments:
In `@pkg/providers/v1/aws_test.go`:
- Around line 4395-4416: Two test cases ("empty rule set" and "empty dual stack
rule set") are now identical because ec2Ipv6SourceRanges was added to the first
case; revert that so one case exercises the IPv4-only empty-rule path and the
other exercises the dual-stack empty-rule path. Update the table entries in the
test cases for IPPermissionSet by removing ec2Ipv6SourceRanges from the "empty
rule set" case (or alternatively remove ec2SourceRanges from the "empty dual
stack rule set" case) so the rows are distinct again; reference the test case
names "empty rule set" and "empty dual stack rule set" and the fields
ec2SourceRanges and ec2Ipv6SourceRanges to locate and fix the duplicate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7d122468-a5ba-4e3d-9390-e61d252a74d6

📥 Commits

Reviewing files that changed from the base of the PR and between 1abf771 and e0139d8.

📒 Files selected for processing (2)
  • pkg/providers/v1/aws_fakes.go
  • pkg/providers/v1/aws_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/providers/v1/aws_fakes.go

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/providers/v1/aws.go (1)

2014-2020: ⚠️ Potential issue | 🔴 Critical

Create separate IPv6 ICMP permission; IpProtocol: "icmp" with Ipv6Ranges violates AWS EC2 API contract

The code at lines 2014-2020 combines IpProtocol: "icmp" (IPv4) with Ipv6Ranges in a single IpPermission, which AWS EC2 API rejects at runtime. AWS requires IpProtocol: "icmpv6" for IPv6 ICMP ranges. This will fail security group rule authorization when IPv6 CIDRs are present, breaking load balancer ingress reconciliation in dual-stack environments.

Separate IPv4 and IPv6 ICMP rules with conditional insertion:

  • IPv4 ICMP: IpProtocol: "icmp", FromPort: 3, ToPort: 4, IpRanges only
  • IPv6 ICMP: IpProtocol: "icmpv6", FromPort: -1, ToPort: -1, Ipv6Ranges only
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws.go` around lines 2014 - 2020, The current
ec2types.IpPermission construction (permission) mixes IPv4 ICMP ("icmp") with
Ipv6Ranges which violates the EC2 API; split into two conditional IpPermission
objects: one for IPv4 using IpProtocol "icmp", FromPort 3, ToPort 4 and only
IpRanges (ec2SourceRanges), and another for IPv6 using IpProtocol "icmpv6",
FromPort -1, ToPort -1 and only Ipv6Ranges (ec2Ipv6SourceRanges); create and
append the IPv4 permission if ec2SourceRanges is non-empty and create/append the
IPv6 permission if ec2Ipv6SourceRanges is non-empty so authorize calls use the
correct permission objects.
♻️ Duplicate comments (1)
pkg/providers/v1/aws_validations.go (1)

203-206: ⚠️ Potential issue | 🟠 Major

Avoid mutating service.Spec inside validation

validateIPFamilyInfo mutates the caller-owned object (service.Spec.IPFamilies) at Line 203-Line 206. This can leak state beyond validation and skew later reconciliation logic/tests.

Proposed fix
-	// Make sure we have a usable zero value for IPFamilies
-	if service.Spec.IPFamilies == nil {
-		service.Spec.IPFamilies = make([]v1.IPFamily, 0)
-	}
+	// Use local normalized values to avoid mutating caller-owned Service.
+	ipFamilies := service.Spec.IPFamilies
+	if ipFamilies == nil {
+		ipFamilies = []v1.IPFamily{}
+	}
@@
-	ipFamilies := service.Spec.IPFamilies
 	if len(ipFamilies) >= 3 {
 		return fmt.Errorf("ipFamilies requires 1 or 2 entries. got %d", len(ipFamilies))
 	}
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_validations.go` around lines 203 - 206,
validateIPFamilyInfo currently mutates the caller-owned field
service.Spec.IPFamilies by assigning an empty slice when nil; instead, avoid
modifying service.Spec by using a local variable (e.g., ipFamilies :=
service.Spec.IPFamilies) and, if nil, set the local ipFamilies to an empty slice
for validation logic, then run all checks against that local variable; ensure no
writes are made back to service.Spec.IPFamilies and remove the assignment lines
that mutate the struct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/providers/v1/aws_validations_test.go`:
- Around line 752-760: Two test table rows labeled "RequireDualStack with two
entries works" are incorrectly set to v1.IPFamilyPolicyPreferDualStack; update
those rows to use v1.IPFamilyPolicyRequireDualStack so the RequireDualStack
success path is actually tested. Locate the test cases in
pkg/providers/v1/aws_validations_test.go (the table entries with name
"RequireDualStack with two entries works") and change their ipFamilyPolicy
fields from v1.IPFamilyPolicyPreferDualStack to
v1.IPFamilyPolicyRequireDualStack.

---

Outside diff comments:
In `@pkg/providers/v1/aws.go`:
- Around line 2014-2020: The current ec2types.IpPermission construction
(permission) mixes IPv4 ICMP ("icmp") with Ipv6Ranges which violates the EC2
API; split into two conditional IpPermission objects: one for IPv4 using
IpProtocol "icmp", FromPort 3, ToPort 4 and only IpRanges (ec2SourceRanges), and
another for IPv6 using IpProtocol "icmpv6", FromPort -1, ToPort -1 and only
Ipv6Ranges (ec2Ipv6SourceRanges); create and append the IPv4 permission if
ec2SourceRanges is non-empty and create/append the IPv6 permission if
ec2Ipv6SourceRanges is non-empty so authorize calls use the correct permission
objects.

---

Duplicate comments:
In `@pkg/providers/v1/aws_validations.go`:
- Around line 203-206: validateIPFamilyInfo currently mutates the caller-owned
field service.Spec.IPFamilies by assigning an empty slice when nil; instead,
avoid modifying service.Spec by using a local variable (e.g., ipFamilies :=
service.Spec.IPFamilies) and, if nil, set the local ipFamilies to an empty slice
for validation logic, then run all checks against that local variable; ensure no
writes are made back to service.Spec.IPFamilies and remove the assignment lines
that mutate the struct.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0c4f3fc-eb19-4651-90eb-c0874f6b6502

📥 Commits

Reviewing files that changed from the base of the PR and between e0139d8 and 412a5fa.

📒 Files selected for processing (3)
  • pkg/providers/v1/aws.go
  • pkg/providers/v1/aws_validations.go
  • pkg/providers/v1/aws_validations_test.go

Comment on lines +752 to +760
name: "RequireDualStack with two entries works",
ipFamilyPolicy: v1.IPFamilyPolicyPreferDualStack,
ipFamilies: []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
expectedError: "",
},
{
name: "RequireDualStack with two entries works",
ipFamilyPolicy: v1.IPFamilyPolicyPreferDualStack,
ipFamilies: []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix misconfigured RequireDualStack test cases

At Line 753 and Line 759, the test rows named “RequireDualStack with two entries works” are actually using v1.IPFamilyPolicyPreferDualStack. This leaves the RequireDualStack success path untested.

Proposed fix
 		{
 			name:           "RequireDualStack with two entries works",
-			ipFamilyPolicy: v1.IPFamilyPolicyPreferDualStack,
+			ipFamilyPolicy: v1.IPFamilyPolicyRequireDualStack,
 			ipFamilies:     []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
 			expectedError:  "",
 		},
 		{
 			name:           "RequireDualStack with two entries works",
-			ipFamilyPolicy: v1.IPFamilyPolicyPreferDualStack,
+			ipFamilyPolicy: v1.IPFamilyPolicyRequireDualStack,
 			ipFamilies:     []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
 			expectedError:  "",
 		},
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: "RequireDualStack with two entries works",
ipFamilyPolicy: v1.IPFamilyPolicyPreferDualStack,
ipFamilies: []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
expectedError: "",
},
{
name: "RequireDualStack with two entries works",
ipFamilyPolicy: v1.IPFamilyPolicyPreferDualStack,
ipFamilies: []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
name: "RequireDualStack with two entries works",
ipFamilyPolicy: v1.IPFamilyPolicyRequireDualStack,
ipFamilies: []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
expectedError: "",
},
{
name: "RequireDualStack with two entries works",
ipFamilyPolicy: v1.IPFamilyPolicyRequireDualStack,
ipFamilies: []v1.IPFamily{v1.IPv6Protocol, v1.IPv4Protocol},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/v1/aws_validations_test.go` around lines 752 - 760, Two test
table rows labeled "RequireDualStack with two entries works" are incorrectly set
to v1.IPFamilyPolicyPreferDualStack; update those rows to use
v1.IPFamilyPolicyRequireDualStack so the RequireDualStack success path is
actually tested. Locate the test cases in
pkg/providers/v1/aws_validations_test.go (the table entries with name
"RequireDualStack with two entries works") and change their ipFamilyPolicy
fields from v1.IPFamilyPolicyPreferDualStack to
v1.IPFamilyPolicyRequireDualStack.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 25, 2026

@nrb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-hypershift 412a5fa link true /test e2e-hypershift

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@damdo
Copy link
Copy Markdown
Member

damdo commented Mar 25, 2026

/retest

@nrb
Copy link
Copy Markdown
Author

nrb commented Mar 26, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/cluster-ingress-operator#1376

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/cluster-ingress-operator#1376

Running the dual stack scenario jobs with the ingress operator, as those were failing prior to the latest commit.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 26, 2026

@nrb, testwith: could not generate prow job. ERROR:

no ref for requested test included in command. The org, repo, and branch containing the requested test need to be targeted by at least one of the included PRs

1 similar comment
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 26, 2026

@nrb, testwith: could not generate prow job. ERROR:

no ref for requested test included in command. The org, repo, and branch containing the requested test need to be targeted by at least one of the included PRs

@huali9
Copy link
Copy Markdown

huali9 commented Mar 26, 2026

cc @huali9 for final review and verification.

Thank you @yunjiang29 I'm working on this and try to complete the testing today.

@sadasu
Copy link
Copy Markdown

sadasu commented Mar 26, 2026

// Add IPv6 default range if service supports IPv6.
if serviceRequestsIPv6(apiService) && !contains(sourceCIDRs, "::/0") {
sourceCIDRs = append(sourceCIDRs, "::/0")
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this block will always force ::/0 even if the user defines a restricted set of allowed CIDRs via:

  • service.spec.loadBalancerSourceRanges
  • annotation service.beta.kubernetes.io/load-balancer-source-ranges

Reference: https://github.com/kubernetes/cloud-provider/blob/2e08bbe9b7f1623207c046a892270baf27599b84/service/helpers/helper.go#L61-L85

I think !contains(sourceCIDRs, "::/0") should be replaced with there is no defined allowed IPv6 CIDR, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, we can:

ec2SourceRanges, ec2Ipv6SourceRanges := separateIPv4AndIPv6CIDRs(sourceCIDRs)

And check if len(ec2Ipv6SourceRanges) == 0 here, then add ::/0.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these changes should go in. Will incorporate into later work.

Comment on lines +2554 to +2555
if serviceRequestsIPv6(apiService) && !contains(sourceRangeCidrs, "::/0") {
sourceRangeCidrs = append(sourceRangeCidrs, "::/0")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above ☝️

Comment on lines +1135 to +1139
func isIPv6CIDR(cidr string) bool {
prefix, err := netip.ParsePrefix(cidr)
if err != nil {
// If parsing fails, fall back to simple string check for backward compatibility
// This shouldn't happen with valid AWS CIDR blocks, but we handle it gracefully
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can simplify this by using "k8s.io/utils/net", which exposes IsIPv6CIDRString 😁

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case v1.IPFamilyPolicySingleStack:
return !serviceRequestsIPv6(service)
case v1.IPFamilyPolicyPreferDualStack:
return true
Copy link
Copy Markdown
Member

@tthvo tthvo Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SingleStack: Single-stack service. The control plane allocates a cluster IP for the Service, using the first configured service cluster IP range.

PreferDualStack: Allocates both IPv4 and IPv6 cluster IPs for the Service when dual-stack is enabled. If dual-stack is not enabled or supported, it falls back to single-stack behavior.

Reading from docs, my understanding is that if dual-stack is not configured, PreferDualstack will fall back to the primary IP family (i.e. the first configured service cluster IP range). That is:

ipFamilyPolicy: PreDualStack
ipFamilies: ["IPv6", "IPv4"] # Will fall back to IPv6

So, we should also check if the primary IP (i.e. IPFamilies[0]) is IPv4, right? I guess it does not matter much because we never support IPV6-only AWS, but maybe in the "future".

Copy link
Copy Markdown
Member

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my comments came a bit late. I just have a few (mostly for edge cases) questions. This looks good overall to me 👍

@sadasu
Copy link
Copy Markdown

sadasu commented Mar 27, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 27, 2026
@huali9
Copy link
Copy Markdown

huali9 commented Mar 27, 2026

job <https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-aws-modern/2037305972635996160 | build 4.22.0-0.ci,#135,openshift/origin#30938> succeeded

DualStackIPv6Primary cluster install successfully with this build and I'm still testing this, will update the result later.

@huali9
Copy link
Copy Markdown

huali9 commented Mar 27, 2026

I tested creating services with the following configurations across both DualStackIPv4Primary and DualStackIPv6Primary clusters:

  • ipFamilyPolicy: RequireDualStack, PreferDualStack, and SingleStack (both IPv4 and IPv6)
  • ipFamilies ordering: ["IPv4","IPv6"] vs ["IPv6","IPv4"]

All test details are documented in the test case:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-88428

Test Results

On DualStackIPv4Primary Cluster:

  • ipFamilies: ["IPv4","IPv6"] - Works successfully
  • ipFamilies: ["IPv6","IPv4"] - Fails with the following CCM error:
    Error syncing load balancer: failed to ensure load balancer: error trying to register targets in target group:
    "operation error Elastic Load Balancing v2: RegisterTargets, https response error StatusCode: 400,
    RequestID: e5f8406b-9edb-40f0-a015-25cf74579f32, InvalidTarget: The following targets cannot be used
    with this IPv6 target group because of missing primary IPv6 addresses: i-08043fdfb5bf3cfb7,
    i-0fdf34fe17c5bcea2, i-01aeef3411b4976b9, i-01b14c6d42cb95566, i-046f553fbd2f3b4e6, i-0696afe3e41709a8f"

On DualStackIPv6Primary Cluster:

  • ipFamilies: ["IPv4","IPv6"] - Works successfully, LoadBalancer hostname is directly accessible
  • ipFamilies: ["IPv6","IPv4"] - Service is created successfully, but direct access to the LoadBalancer hostname fails. Access only works through a Route.

Questions

Are these two behavioral differences expected?

  1. On DualStackIPv4Primary cluster: Should ipFamilies: ["IPv6","IPv4"] fail with the "missing primary IPv6 addresses" error, or should it succeed? I'm uncertain whether this failure is expected because on the DualStackIPv6Primary cluster, both ["IPv4","IPv6"] and ["IPv6","IPv4"] orderings successfully create the service without errors.

  2. On DualStackIPv6Primary cluster: Should ipFamilies: ["IPv6","IPv4"] require a Route for external access instead of allowing direct LoadBalancer hostname access? This behavior differs from all other tested scenarios where the LoadBalancer hostname is directly accessible.

I'm uncertain whether I should add the verified label based on these results. Please advise on the expected behavior. Thanks!

@huali9
Copy link
Copy Markdown

huali9 commented Mar 27, 2026

Additional Finding: I thought of another test scenario - modifying an existing service's ipFamilyPolicy from SingleStack to dual-stack, or vice versa. When attempting either modification, the operation fails with an IAM permission error:
elasticloadbalancing:SetIpAddressType is not authorized.

Example - modifying from dual-stack to SingleStack:

$ oc edit svc httpd-require-dualstack-ipv4first                                                                                   
 service/httpd-require-dualstack-ipv4first edited

CCM logs show:

I0327 07:04:35.462920       1 aws_loadbalancer.go:273] Updating load balancer a3bf33c22202c4ac2b80b65711b31fd0 IpAddressType from dualstack to ipv4 for default/httpd-require-dualstack-ipv4first
E0327 07:04:35.479465       1 controller.go:302] "Unhandled Error" err="error processing service default/httpd-require-dualstack-ipv4first (retrying with exponential backoff): failed to ensure load balancer: error updating load balancer IpAddressType: \"operation error Elastic Load Balancing v2: SetIpAddressType, https response error StatusCode: 403, RequestID: df28a454-ab70-4b54-b411-bb520e250f1f, api error AccessDenied: User: arn:aws:sts::301721915996:assumed-role/ci-op-txlgv59x-62174-pkh7p-master-role/i-08043fdfb5bf3cfb7 is not authorized to perform: elasticloadbalancing:SetIpAddressType on resource: arn:aws:elasticloadbalancing:ap-northeast-1:301721915996:loadbalancer/net/a3bf33c22202c4ac2b80b65711b31fd0/1bc9d7f024f32fda because no identity-based policy allows the elasticloadbalancing:SetIpAddressType action\"" logger="UnhandledError"
I0327 07:04:35.479514       1 event.go:389] "Event occurred" object="default/httpd-require-dualstack-ipv4first" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: error updating load balancer IpAddressType: \"operation error Elastic Load Balancing v2: SetIpAddressType, https response error StatusCode: 403, RequestID: df28a454-ab70-4b54-b411-bb520e250f1f, api error AccessDenied: User: arn:aws:sts::301721915996:assumed-role/ci-op-txlgv59x-62174-pkh7p-master-role/i-08043fdfb5bf3cfb7 is not authorized to perform: elasticloadbalancing:SetIpAddressType on resource: arn:aws:elasticloadbalancing:ap-northeast-1:301721915996:loadbalancer/net/a3bf33c22202c4ac2b80b65711b31fd0/1bc9d7f024f32fda because no identity-based policy allows the elasticloadbalancing:SetIpAddressType action\""

I'm uncertain whether this is expected behavior or a bug. Should the cluster's IAM role include the elasticloadbalancing:SetIpAddressType permission by default for dual-stack clusters, or is modifying the ipFamilyPolicy of existing services not a supported operation?

@huali9
Copy link
Copy Markdown

huali9 commented Mar 27, 2026

@tthvo answered all the questions and it makes sense to me. no questions from my side now.

/verified by @huali9

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@huali9: This PR has been marked as verified by @huali9.

Details

In response to this:

@tthvo answered all the questions and it makes sense to me. no questions from my side now.

/verified by @huali9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@knobunc
Copy link
Copy Markdown

knobunc commented Mar 27, 2026

/approve

@knobunc knobunc added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 27, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: knobunc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@damdo
Copy link
Copy Markdown
Member

damdo commented Mar 27, 2026

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2026
@damdo
Copy link
Copy Markdown
Member

damdo commented Mar 27, 2026

@knobunc If possible I'd like to wait for confirmation from upstream this is the right direction (at least getting verbal approval) before merging this carry, for sustainability reasons. @nrb is reaching out to them to confirm

@knobunc
Copy link
Copy Markdown

knobunc commented Mar 27, 2026

@knobunc If possible I'd like to wait for confirmation from upstream this is the right direction (at least getting verbal approval) before merging this carry, for sustainability reasons. @nrb is reaching out to them to confirm

No problem. From our Slack conversation it sounds like you are talking to them at Kubecon, so hopefully this gets resolved today. Thank you for looking at it.

@sadasu
Copy link
Copy Markdown

sadasu commented Mar 27, 2026

@damdo, we are not making any changes to the k8s service API. Based on upstream feedback, we are open to change the implementation. We are not changing the k8s service API here or in the upstream PR.

Once we get the upstream PR merged, we will replace this implementation with what we bring from upstream.

@nrb
Copy link
Copy Markdown
Author

nrb commented Mar 27, 2026

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2026
@damdo
Copy link
Copy Markdown
Member

damdo commented Mar 27, 2026

Thanks @sadasu
@nrb checked directly with upstream and confirmed that no code changes should be needed to the existing upstream PR, so we can proceed with this and adjust if needed.

@openshift-merge-bot openshift-merge-bot bot merged commit e73d6a3 into openshift:main Mar 27, 2026
9 of 11 checks passed
@sadasu
Copy link
Copy Markdown

sadasu commented Mar 27, 2026

Thanks @sadasu @nrb checked directly with upstream and confirmed that no code changes should be needed to the existing upstream PR, so we can proceed with this and adjust if needed.

@damdo and @nrb, thanks for doing the due diligence. Now we know that when we do bring in the upstream changes, we won't have too many surprises given a lot of testing has already happened with this implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants