
[DNM] Hosted cluster w/ PUDN POC #7773

Draft
RamLavi wants to merge 8 commits into openshift:main from RamLavi:hosted_cluster_poc

Conversation


@RamLavi RamLavi commented Feb 23, 2026

What this PR does / why we need it:

This PR holds the code changes needed to run a POC of a hosted cluster with Primary UDN.

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

This PR is not yet suitable for review and will need further grooming.

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • Primary User Defined Network (UDN) support for KubeVirt hosted clusters, with internal apps DNS and an OAuth authentication bridge for in-cluster console access.
    • Cluster bootstrap/sync automation to configure UDN-aware networking and DNS.
  • Bug Fixes

    • ETCD and control plane connectivity improved for KubeVirt; ignition retrieval and service access work via internal DNS/ClusterIP in UDN scenarios.
  • Chores

    • Ignition CA/certificate handling switched to use cluster root CA and internal names; RBAC expanded for EndpointSlice/webhook/admission resources.
  • Tests

    • Conformance and e2e tests updated to cover Primary UDN workflows.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all jobs from the second stage, use the /pipeline required command.

This repository is configured in: LGTM mode

@RamLavi RamLavi marked this pull request as draft February 23, 2026 06:56
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2026
@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Feb 23, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 7787b724-beef-4010-903e-68c2de36296e

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Add Primary UDN support for KubeVirt in HyperShift: new annotations/CLI flags, UDN creation/ensure, etcd EndpointSlice mirroring to UDN IPs, guest internal CoreDNS and OAuth bridge provisioning, VM/network template overrides, ignition/TLS adjustments, RBAC expansions, and bootstrap/sync automation scripts.
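Several of these changes key off the k8s.ovn.org/pod-networks annotation to discover a pod's Primary UDN IP. As a rough illustration only — the struct below is an assumption about the annotation's JSON shape and the network-key format, not the PR's actual parser — extracting an address could look like:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// podNetworkStatus is an assumed, minimal view of one entry in the
// k8s.ovn.org/pod-networks annotation; the real schema carries more fields.
type podNetworkStatus struct {
	IPAddresses []string `json:"ip_addresses"`
}

// udnIPFromAnnotation returns the first IP (prefix length stripped) reported
// for the given network key in the pod-networks annotation value.
func udnIPFromAnnotation(annotation, networkKey string) (string, error) {
	nets := map[string]podNetworkStatus{}
	if err := json.Unmarshal([]byte(annotation), &nets); err != nil {
		return "", err
	}
	status, ok := nets[networkKey]
	if !ok || len(status.IPAddresses) == 0 {
		return "", fmt.Errorf("network %q not found or has no addresses", networkKey)
	}
	ip, _, _ := strings.Cut(status.IPAddresses[0], "/")
	return ip, nil
}

func main() {
	ann := `{"clusters/my-udn":{"ip_addresses":["10.130.0.7/24"]}}`
	ip, err := udnIPFromAnnotation(ann, "clusters/my-udn")
	fmt.Println(ip, err) // prints: 10.130.0.7 <nil>
}
```

The etcd EndpointSlice reconciler and the OAuth-bridge detection described above would both consume an address derived this way, under whatever schema the PR actually uses.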

Changes

Cohort / File(s) Summary
API & CLI
api/hypershift/v1beta1/hostedcluster_types.go, cmd/cluster/kubevirt/create.go, test/e2e/e2e_test.go, test/e2e/util/options.go
Add Primary UDN annotations/constants, CLI flags and fields, atomic validation (name+subnet), and e2e option propagation.
Bootstrap & Sync scripts
cluster-up.sh, cluster-sync.sh
New scripts to build/publish operator/CPO images, create/teardown clusters with Primary UDN, configure internal DNS, create OAuth bridge, mirror etcd EndpointSlices, and run connectivity checks.
Management UDN ensure & HostedCluster changes
hypershift-operator/controllers/hostedcluster/primaryudn/management_udn.go, hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Ensure UserDefinedNetwork CR, enforce annotation atomicity/immutability, and call EnsureUserDefinedNetwork during HostedCluster/HCP reconciliation.
Primary UDN guest-fixups controller (new)
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/*, control-plane-operator/hostedclusterconfigoperator/cmd.go, control-plane-operator/hostedclusterconfigoperator/operator/config.go
New controller wiring: provision internal CoreDNS in guest, render Corefile, patch DNS operator forwarding, detect management OAuth UDN IP, create guest oauth-bridge Service/Endpoints, and register watches.
etcd UDN EndpointSlice support
control-plane-operator/hostedclusterconfigoperator/controllers/etcd/*, control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
New etcd EndpointSlice reconciler deriving addresses from pod-networks annotations to create UDN-aware EndpointSlices; integrated into HCP reconcile.
Machine EndpointSlice labeling & VM template overrides
control-plane-operator/hostedclusterconfigoperator/controllers/machine/*, hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go, hypershift-operator/controllers/hostedcluster/network_policies.go
Detect Primary UDN via virt-launcher annotations to switch EndpointSlice labels/annotations, apply l2bridge VM interface overrides, remove live-migration annotation, and skip blocking service CIDR in virt-launcher network policies when Primary UDN enabled.
Ignition, TLS, and konnectivity
control-plane-operator/controllers/hostedcontrolplane/v2/assets/ignition-server-proxy/deployment.yaml, hypershift-operator/controllers/nodepool/token.go, control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
Switch ignition serving secret and CA lookup to ignition-server/root-ca; use internal ClusterIP for konnectivity status when Primary UDN enabled.
ETCD StatefulSet adjustments
control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go
Adjust ETCD listen URLs for KubeVirt (IPv4 listen on 0.0.0.0:2379; IPv6 handling updated) to allow UDN routing.
Internal guest DNS utilities
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go, guest_info.go
Provision internal CoreDNS Deployment/Service, render Corefile from host mappings, and helpers to read guest router IP and DNS image.
OAuth bridge utilities
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go, oauth_bridge.go
Compute OAuth route host and management OAuth pod Primary UDN IP; create guest oauth-bridge Service/Endpoints pointing to that IP.
Controller Setup & tests
control-plane-operator/hostedclusterconfigoperator/controllers/etcd/setup.go, .../endpointslices_test.go, .../setup.go (primaryudn)
Setup wiring for etcd and primaryudn controllers, unit test for etcd UDN EndpointSlice creation, and predicates/watches wiring.
RBAC updates
cmd/install/assets/hypershift_operator.go, control-plane-operator/controllers/hostedcontrolplane/v2/assets/hosted-cluster-config-operator/role.yaml, control-plane-operator/.../testdata/*.yaml
Expanded RBAC: add discovery.k8s.io Endpointslices verbs, new userdefinednetworks permissions, reorganized webhook/admission/CSR rules and fixtures updated.
Misc. controller wiring & imports
control-plane-operator/hostedclusterconfigoperator/cmd.go, control-plane-operator/hostedclusterconfigoperator/controllers/etcd/*, control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/*
Register etcd and primaryudn controllers in operator startup and add Mgr watches for relevant resource kinds.
KubeVirt provider image & simplifications
hypershift-operator/controllers/hostedcluster/internal/platform/kubevirt/kubevirt.go
Replace dynamic provider image override logic with a fixed provider image constant.

Sequence Diagram(s)

sequenceDiagram
  participant Mgr as Management Controller
  participant KubeVirt as KubeVirt infra
  participant HCP as HostedControlPlane (etcd)
  participant Guest as Guest Cluster API
  participant GuestDNS as Guest CoreDNS
  participant OAuthPod as OAuth Pod

  Mgr->>KubeVirt: Ensure UserDefinedNetwork CR (Primary UDN)
  KubeVirt-->>Mgr: UDN exists
  Mgr->>HCP: List etcd pods (read pod-networks annotation)
  Mgr->>HCP: Reconcile etcd UDN EndpointSlices (addresses → UDN IPs)
  Mgr->>Guest: Ensure internal-apps-dns Namespace/Deployment/Service
  Guest->>GuestDNS: Deploy CoreDNS with hosts entries (apps, console, downloads)
  Mgr->>Guest: Patch guest DNS operator to forward apps domain to internal DNS
  Mgr->>OAuthPod: Read pod-networks → get OAuth primary UDN IP
  Mgr->>Guest: Create oauth-bridge Service + Endpoints -> OAuth UDN IP
  GuestDNS-->>Guest: DNS resolves apps and OAuth via internal DNS

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 6.12%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Structure And Quality — ⚠️ Warning. The test violates the single-responsibility principle by asserting multiple unrelated behaviors in one test: UDN EndpointSlice creation, label/annotation/condition verification, and orphaning logic. Resolution: refactor into separate tests — one for UDN EndpointSlice creation, one for endpoint content verification, one for orphaning behavior — using a table-driven approach consistent with codebase patterns.
  • Title check — ❓ Inconclusive. The title '[DNM] Hosted cluster w/ PUDN POC' uses the [DNM] prefix indicating 'Do Not Merge', which correctly signals work-in-progress status and aligns with the PR's draft state and do-not-merge label. However, 'PUDN POC' is abbreviation-heavy and vague about the primary change. Resolution: expand or clarify the title, e.g. '[WIP] Add Primary UDN support for KubeVirt hosted clusters'.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Stable And Deterministic Test Names — ✅ Passed. The PR uses standard Go testing with table-driven tests, not Ginkgo BDD syntax. All test case names are static and deterministic with no dynamic values.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the area/api Indicates the PR includes changes for the API label Feb 23, 2026
@openshift-ci openshift-ci bot requested review from csrwng and enxebre February 23, 2026 06:57
@openshift-ci openshift-ci bot added area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Feb 23, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cluster-up.sh (1)

268-282: ⚠️ Potential issue | 🟡 Minor

Useless check of $? after if already captured exit status.

On line 269, ./create-etcd-mirrors.sh runs inside the if [ -f ... ] block. Line 271 tests $? after the script runs on line 269, but $? at that point reflects the exit code of line 269. However, if set -e is active, a non-zero exit from line 269 would have already terminated the script. If you intend to handle the failure gracefully, run the script with || true and capture the exit code, or use it directly in an if statement.

Proposed fix
 if [ -f "./create-etcd-mirrors.sh" ]; then
-  ./create-etcd-mirrors.sh
-  
-  if [ $? -eq 0 ]; then
+  if ./create-etcd-mirrors.sh; then
     echo ""
     echo "✓ EndpointSlice workaround applied successfully"
   else
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-up.sh` around lines 268 - 282, The post-if check of `$?` after
running ./create-etcd-mirrors.sh is redundant and unsafe with set -e; change the
flow to either (A) run the script directly in an if: use `if
./create-etcd-mirrors.sh ...; then` and branch on its success/failure, or (B)
run it with `./create-etcd-mirrors.sh ... || true` and immediately capture its
exit code into a variable (e.g., rc=$?) before checking rc; update the block
around the create-etcd-mirrors.sh invocation to use one of these patterns and
remove the standalone `$?` check.
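For pattern (B) from the prompt above, a minimal standalone sketch — using false as a stand-in for the real ./create-etcd-mirrors.sh invocation — looks like:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Capture the exit code immediately; the "|| rc=$?" keeps set -e from
# aborting the script when the command fails. "false" stands in for the
# real ./create-etcd-mirrors.sh call.
rc=0
false || rc=$?

if [ "$rc" -eq 0 ]; then
  echo "EndpointSlice workaround applied"
else
  echo "EndpointSlice workaround failed with exit code $rc"
fi
```

Pattern (A), running the command directly in the if condition as the proposed fix shows, is shorter when the exit code itself is not needed afterwards.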
🟡 Minor comments (13)
api/hypershift/v1beta1/hostedcluster_types.go-273-278 (1)

273-278: ⚠️ Potential issue | 🟡 Minor

Document mutability semantics and platform scope for the new UDN annotations.

Two concerns:

  1. Immutability: PrimaryUDNSubnetAnnotation is especially sensitive — mutating the subnet CIDR after a UserDefinedNetwork has been provisioned would create a divergence between the running UDN and the annotation value. Consider adding a doc comment warning that these annotations are effectively immutable post-cluster-creation (matching the pattern of e.g. SkipReleaseImageValidation).

  2. Platform scope: The comments mention KubeVirt hosted clusters, but both constants are in the general annotation block with no enforcement. Any platform can set these annotations and silently get no effect (or worse, unexpected behavior). A short note in the comment (e.g., "only honoured for KubeVirt platform") would help prevent misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/hypershift/v1beta1/hostedcluster_types.go` around lines 273 - 278, Update
the comment for PrimaryUDNNameAnnotation and PrimaryUDNSubnetAnnotation to
document mutability and platform scope: note that PrimaryUDNSubnetAnnotation
(and by extension the PrimaryUDNNameAnnotation) should be treated as effectively
immutable after a UserDefinedNetwork is provisioned (mutating the CIDR causes
divergence) and add a short explicit note that these annotations are only
honored for KubeVirt-hosted clusters (other platforms will ignore them);
reference the constants PrimaryUDNSubnetAnnotation and PrimaryUDNNameAnnotation
when adding these doc warnings.
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/guest_info.go-41-49 (1)

41-49: ⚠️ Potential issue | 🟡 Minor

guestDNSImage relies on container position rather than name.

Containers[0] silently couples to the assumption that the DNS container is always first. If a sidecar is ever prepended, or the container order shifts, the function returns the wrong image without raising an error. Lookup by name is more robust.

♻️ Proposed fix
 func guestDNSImage(ctx context.Context, c client.Client) (string, error) {
 	ds := &appsv1.DaemonSet{ObjectMeta: metav1.ObjectMeta{Namespace: "openshift-dns", Name: "dns-default"}}
 	if err := c.Get(ctx, client.ObjectKeyFromObject(ds), ds); err != nil {
 		return "", err
 	}
-	if len(ds.Spec.Template.Spec.Containers) == 0 || ds.Spec.Template.Spec.Containers[0].Image == "" {
-		return "", fmt.Errorf("openshift-dns/dns-default has no container image")
+	for _, c := range ds.Spec.Template.Spec.Containers {
+		if c.Name == "dns" && c.Image != "" {
+			return c.Image, nil
+		}
 	}
-	return ds.Spec.Template.Spec.Containers[0].Image, nil
+	return "", fmt.Errorf("openshift-dns/dns-default: dns container image not found")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/guest_info.go`
around lines 41 - 49, guestDNSImage currently uses
ds.Spec.Template.Spec.Containers[0] which assumes the DNS container is first;
change guestDNSImage to iterate over ds.Spec.Template.Spec.Containers and locate
the container by name (e.g., container.Name == "dns" or the real DNS container
name used in the DaemonSet), validate its Image is non-empty, and return that
image; if no matching container or image is found return a descriptive error.
Ensure you still fetch the DaemonSet via c.Get(ctx,
client.ObjectKeyFromObject(ds), ds) and use the ds variable for the container
search and error messages.
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/util.go-5-13 (1)

5-13: ⚠️ Potential issue | 🟡 Minor

dnsZoneFromHostname has two edge cases that fall through to return host unexpectedly.

  1. Leading dot (i == 0): i > 0 is false, so ".apps.example.com" returns ".apps.example.com" unchanged instead of "apps.example.com".
  2. Trailing dot / FQDN notation (i+1 == len(host)): "apps.example.com." fails the i+1 < len(host) check and returns "apps.example.com." verbatim instead of "example.com.".

Neither is a likely real-world input from an OpenShift ingress domain, but the silent passthrough could cause a DNS zone mismatch downstream without any error.

🛡️ Proposed fix
 func dnsZoneFromHostname(host string) string {
 	if host == "" {
 		return ""
 	}
-	if i := strings.IndexByte(host, '.'); i > 0 && i+1 < len(host) {
-		return host[i+1:]
-	}
-	return host
+	host = strings.TrimSuffix(host, ".")
+	host = strings.TrimPrefix(host, ".")
+	if i := strings.IndexByte(host, '.'); i >= 0 && i+1 < len(host) {
+		return host[i+1:]
+	}
+	return host
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/util.go`
around lines 5 - 13, dnsZoneFromHostname currently returns the original host for
leading-dot or trailing-dot cases; update it to normalize those cases: detect
and strip a single leading '.' before processing, record whether the input had a
trailing '.' and preserve it on the returned zone, then find the first '.' and
return the substring after that label (e.g. for ".apps.example.com" return
"apps.example.com", for "apps.example.com." return "example.com."). Keep the
function name dnsZoneFromHostname and ensure empty-string input still returns
"".
test/e2e/util/options.go-153-154 (1)

153-154: ⚠️ Potential issue | 🟡 Minor

Validate Primary UDN name/subnet pairing early.

The flag text says the subnet is required when the name is set, but Options.Validate doesn’t enforce this pairing, so misconfigurations will fail later and less clearly. Consider validating these fields together.

🔧 Suggested validation (in Options.Validate)
 func (o *Options) Validate() error {
 	var errs []error

+	if o.ConfigurableClusterOptions.KubeVirtPrimaryUDNName != "" &&
+		o.ConfigurableClusterOptions.KubeVirtPrimaryUDNSubnet == "" {
+		errs = append(errs, fmt.Errorf("kubevirt-primary-udn-subnet is required when kubevirt-primary-udn-name is set"))
+	}
+	if o.ConfigurableClusterOptions.KubeVirtPrimaryUDNName == "" &&
+		o.ConfigurableClusterOptions.KubeVirtPrimaryUDNSubnet != "" {
+		errs = append(errs, fmt.Errorf("kubevirt-primary-udn-name is required when kubevirt-primary-udn-subnet is set"))
+	}
+
 	if len(o.LatestReleaseImage) == 0 {
 		errs = append(errs, fmt.Errorf("latest release image is required"))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/options.go` around lines 153 - 154, Options.Validate should
enforce that when KubeVirtPrimaryUDNName is set, KubeVirtPrimaryUDNSubnet must
also be set (and optionally validate subnet format); update the Options.Validate
method to check the pairing of KubeVirtPrimaryUDNName and
KubeVirtPrimaryUDNSubnet, return a clear validation error if name is non-empty
and subnet is empty (and vice versa if you want symmetry), and include the field
names in the error message so misconfiguration is caught early.
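If subnet-format validation is added as the prompt suggests, net.ParseCIDR from the Go standard library covers it. The sketch below follows the field and flag names used in this comment and is not the PR's actual Validate method:

```go
package main

import (
	"fmt"
	"net"
)

// validatePrimaryUDNPair enforces the name/subnet pairing described above
// and, when both are set, checks that the subnet parses as a CIDR.
func validatePrimaryUDNPair(name, subnet string) error {
	switch {
	case name != "" && subnet == "":
		return fmt.Errorf("kubevirt-primary-udn-subnet is required when kubevirt-primary-udn-name is set")
	case name == "" && subnet != "":
		return fmt.Errorf("kubevirt-primary-udn-name is required when kubevirt-primary-udn-subnet is set")
	}
	if subnet != "" {
		if _, _, err := net.ParseCIDR(subnet); err != nil {
			return fmt.Errorf("kubevirt-primary-udn-subnet %q is not a valid CIDR: %w", subnet, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePrimaryUDNPair("my-udn", "10.130.0.0/16")) // <nil>
	fmt.Println(validatePrimaryUDNPair("my-udn", ""))              // pairing error
}
```

Running the CIDR check inside Options.Validate surfaces a malformed subnet at flag-parsing time instead of during cluster creation.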
cmd/install/assets/hypershift_operator.go-1343-1347 (1)

1343-1347: ⚠️ Potential issue | 🟡 Minor

Redundant RBAC rule — duplicate of the pre-existing rule at lines 1381–1384

The new rule at lines 1343–1347 grants ["get","list","watch","create","update","patch","delete"] on validatingadmissionpolicies and validatingadmissionpolicybindings. The pre-existing rule at lines 1381–1384 already grants rbacv1.VerbAll ("*") on the exact same resources, which is a strict superset. The new rule is entirely redundant and should be removed.

🔧 Proposed fix — remove the redundant rule
-		{
-			APIGroups: []string{"admissionregistration.k8s.io"},
-			Resources: []string{"validatingadmissionpolicies", "validatingadmissionpolicybindings"},
-			Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
-		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/install/assets/hypershift_operator.go` around lines 1343 - 1347, Remove
the redundant RBAC rule that lists APIGroups: ["admissionregistration.k8s.io"]
and Resources: ["validatingadmissionpolicies",
"validatingadmissionpolicybindings"] with Verbs
["get","list","watch","create","update","patch","delete"]; it duplicates an
existing rule that already grants rbacv1.VerbAll ("*") on the same resources
(the later rule). Locate the Role/ClusterRole definition containing that rule
and delete this rule entry so only the existing VerbAll rule remains.
cmd/install/assets/hypershift_operator.go-1337-1352 (1)

1337-1352: ⚠️ Potential issue | 🟡 Minor

Indentation inconsistency on new rule blocks will fail gofmt

The three new PolicyRule blocks at lines 1337–1352 are indented at two levels, while all surrounding entries in the Rules slice literal use three levels. Running gofmt will reject or reformat this.

🔧 Proposed fix — align indentation
-	{
-		APIGroups:     []string{"admissionregistration.k8s.io"},
-		Resources:     []string{"validatingwebhookconfigurations"},
-		Verbs:         []string{"delete"},
-		ResourceNames: []string{hyperv1.GroupVersion.Group},
-	},
-	{
-		APIGroups: []string{"admissionregistration.k8s.io"},
-		Resources: []string{"validatingadmissionpolicies", "validatingadmissionpolicybindings"},
-		Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
-	},
-	{
-		APIGroups: []string{"certificates.k8s.io"},
-		Resources: []string{"certificatesigningrequests"},
-		Verbs:     []string{"get", "list", "watch"},
-	},
+			{
+				APIGroups:     []string{"admissionregistration.k8s.io"},
+				Resources:     []string{"validatingwebhookconfigurations"},
+				Verbs:         []string{"delete"},
+				ResourceNames: []string{hyperv1.GroupVersion.Group},
+			},
+			{
+				APIGroups: []string{"certificates.k8s.io"},
+				Resources: []string{"certificatesigningrequests"},
+				Verbs:     []string{"get", "list", "watch"},
+			},

(The validatingadmissionpolicies block is omitted above as it should be deleted per the duplicate-rule comment.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/install/assets/hypershift_operator.go` around lines 1337 - 1352, The
three newly added PolicyRule blocks inside the Rules slice in
hypershift_operator.go are indented two levels instead of the three-level
indentation used by surrounding entries; update the indentation of each new
block (the one with Resources: []string{"validatingwebhookconfigurations"}, the
one with Resources: []string{"validatingadmissionpolicies",
"validatingadmissionpolicybindings"}, and the one with Resources:
[]string{"certificatesigningrequests"}) to match the existing three-level
indentation in the Rules slice so that gofmt will not reformat or fail; also
remove the duplicate validatingadmissionpolicies block if applicable per the
duplicate-rule note.
cmd/install/assets/hypershift_operator.go-1333-1336 (1)

1333-1336: ⚠️ Potential issue | 🟡 Minor

Remove the duplicate RBAC rule for validatingadmissionpolicies and validatingadmissionpolicybindings

Lines 1343–1346 introduce an explicit-verb rule for admissionregistration.k8s.io/validatingadmissionpolicies and validatingadmissionpolicybindings. A pre-existing rule at lines 1381–1383 already grants rbacv1.VerbAll on the same resources, making the new rule redundant. Remove lines 1343–1346 to avoid confusion and technical debt.

The k8s.ovn.org API group for userdefinednetworks is correct and verified against the codebase CRD definitions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/install/assets/hypershift_operator.go` around lines 1333 - 1336, Remove
the redundant RBAC PolicyRule that explicitly lists verbs for the
admissionregistration.k8s.io resources "validatingadmissionpolicies" and
"validatingadmissionpolicybindings" (the rule that enumerates
"get","list","watch","create","update","patch","delete"); this duplicate is
unnecessary because an existing rule later grants rbacv1.VerbAll for those same
resources—delete the explicit rule entry in the RBAC rules slice (the block
referencing admissionregistration.k8s.io and the two resources) so only the
existing VerbAll rule remains.
PRIMARY-UDN-CHANGES.md-259-259 (1)

259-259: ⚠️ Potential issue | 🟡 Minor

Add language tags to fenced code blocks and standardize heading style.

markdownlint flagged multiple fenced blocks without a language and a Setext-style heading. Adding language identifiers (e.g., bash, yaml, go) and switching the heading to ATX style will resolve the lint warnings.

Also applies to: 293-293, 298-298, 305-305, 313-313, 357-357, 392-392, 488-488, 646-646, 657-657, 703-703, 802-802

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PRIMARY-UDN-CHANGES.md` at line 259, Several fenced code blocks lack language
identifiers and there are Setext-style headings; update each affected fenced
block to include the appropriate language tag (e.g., bash, yaml, go) and convert
Setext-style headings to ATX-style (prefix with #) throughout the document so
markdownlint warnings are resolved; search for all triple-backtick blocks and
Setext headings and edit them to use ```bash / ```yaml / ```go as appropriate
and change underlined headings (=== or ---) to leading-# headings.
PRIMARY-UDN-CHANGES.md-16-41 (1)

16-41: ⚠️ Potential issue | 🟡 Minor

Clarify Primary UDN namespace requirements vs controller behavior.

The controller now derives Primary UDN name/subnet from HostedCluster annotations and enforces the namespace label+annotation at creation time. This section only shows a namespace annotation; consider noting that the label is required at namespace creation (or that it will be set by the controller when the HostedCluster annotations are provided) to avoid setup ambiguity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PRIMARY-UDN-CHANGES.md` around lines 16 - 41, Clarify that the namespace must
include both the annotation k8s.ovn.org/primary-user-defined-network and the
required label (the label the controller expects) at creation time, or
explicitly state that the controller will automatically add the label when it
derives the Primary UDN name/subnet from HostedCluster annotations; update the
Namespace section to mention both the annotation and the namespace label
requirement (or that the controller will set the label upon processing
HostedCluster annotations) and reference the HostedCluster and
UserDefinedNetwork resources so readers know where the controller derives the
Primary UDN.
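For illustration, a namespace carrying both markers might look like the sketch below. The label and annotation keys mirror the k8s.ovn.org/primary-user-defined-network key named in this comment and are an assumption about the exact OVN-Kubernetes contract; the namespace name is hypothetical. Confirm both keys against the OVN-Kubernetes version in use.

```yaml
# Sketch only: both the label (required at namespace creation per this
# comment) and the annotation are shown; verify the exact keys.
apiVersion: v1
kind: Namespace
metadata:
  name: clusters-my-hosted-cluster
  labels:
    k8s.ovn.org/primary-user-defined-network: ""
  annotations:
    k8s.ovn.org/primary-user-defined-network: ""
```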
PRIMARY-UDN-CHANGES.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor

Tighten compound modifiers and article usage.

LanguageTool flagged a few compound modifiers that read better hyphenated (e.g., “Primary‑UDN support/hosted cluster”), and the phrase “a init-container” should be “an init-container.”

✏️ Suggested edit for the article usage
-Using `0.0.0.0` is temporary, and should be replaced by a init-container that fetches the primaryUDN IP and sets it as a env var or something like that.
+Using `0.0.0.0` is temporary, and should be replaced by an init-container that fetches the primaryUDN IP and sets it as an env var or something like that.

Also applies to: 16-16, 160-160, 306-306

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PRIMARY-UDN-CHANGES.md` at line 3, The text uses unhyphenated compound
modifiers and an incorrect article: replace occurrences like "Primary UDN
support" and "hosted cluster" where they function as compound modifiers with
hyphenated forms (e.g., "Primary-UDN support" and "hosted-cluster" where
appropriate) and change the article "a init-container" to "an init-container";
search the document for the exact phrases "Primary UDN support", "hosted
cluster", and "a init-container" and update them accordingly to fix hyphenation
and the article usage.
cluster-sync.sh-88-96 (1)

88-96: ⚠️ Potential issue | 🟡 Minor

grep "$CLUSTER_NAME" for filtering NodePools could match unintended resources.

If CLUSTER_NAME is e.g. test, this grep will match any NodePool containing "test" in its name. Use a more specific pattern or --field-selector.

Proposed fix
-    for np in $(oc get nodepools.hypershift.openshift.io -n clusters -o name 2>/dev/null | grep "$CLUSTER_NAME"); do
+    for np in $(oc get nodepools.hypershift.openshift.io -n clusters -o name 2>/dev/null | grep -w "$CLUSTER_NAME"); do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-sync.sh` around lines 88 - 96, The loop that lists NodePools uses
grep "$CLUSTER_NAME" which can match substrings; update the filtering so only
NodePools belonging exactly to the target cluster are selected. Replace the grep
step in the for loop (the command using oc get nodepools.hypershift.openshift.io
-n clusters -o name and grep "$CLUSTER_NAME") with a stricter selector:
preferably use oc's --field-selector=metadata.name=<exact-name> when you know
the NodePool names, or at minimum anchor the grep to the resource name (e.g.
grep -E "/${CLUSTER_NAME}(-|$)") so it matches only NodePools with the cluster
name as a discrete token; keep the rest of the loop (echo and oc patch)
unchanged and reference CLUSTER_NAME, the oc get
nodepools.hypershift.openshift.io command, and the grep usage for locating the
change.
control-plane-operator/hostedclusterconfigoperator/controllers/etcd/endpointslices.go-249-260 (1)

249-260: ⚠️ Potential issue | 🟡 Minor

serviceEndpointPorts uses the Service port rather than the target port.

EndpointSlice Port fields should reflect the port the pod is listening on (i.e., targetPort), not the Service-level port. For etcd services these are typically identical, so this works in practice, but it's technically incorrect and will break if the service ever maps port X → targetPort Y.

Proposed fix
 func serviceEndpointPorts(svc *corev1.Service) []discoveryv1.EndpointPort {
 	out := make([]discoveryv1.EndpointPort, 0, len(svc.Spec.Ports))
 	for i := range svc.Spec.Ports {
 		p := svc.Spec.Ports[i]
+		port := p.TargetPort.IntVal
+		if port == 0 {
+			// If targetPort is not set or is a named port, fall back to the service port.
+			port = p.Port
+		}
 		out = append(out, discoveryv1.EndpointPort{
 			Name:     ptr.To(p.Name),
 			Protocol: ptr.To(p.Protocol),
-			Port:     ptr.To(p.Port),
+			Port:     ptr.To(port),
 		})
 	}
 	return out
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/etcd/endpointslices.go`
around lines 249 - 260, serviceEndpointPorts currently uses
svc.Spec.Ports[i].Port; change it to resolve the Service port's TargetPort and
populate EndpointPort with the pod-facing port: for each p in svc.Spec.Ports use
p.TargetPort when set (if TargetPort.IsInt() set Port to
int32(TargetPort.IntVal) and set Name to p.Name/protocol as appropriate; if
TargetPort is a string, set EndpointPort.Name to that string and leave Port
nil), otherwise fall back to p.Port; update the creation of
discoveryv1.EndpointPort in serviceEndpointPorts to use p.TargetPort (and ptr.To
conversions) instead of p.Port so EndpointSlice ports reflect the pod target
port.
control-plane-operator/hostedclusterconfigoperator/controllers/etcd/endpointslices.go-269-295 (1)

269-295: ⚠️ Potential issue | 🟡 Minor

ReconcileUDNEndpointSlices discards the shouldRequeue signal.

Line 278 captures desiredEtcdEndpointsFromPods(...) but ignores the fourth return value (shouldRequeue). When pods exist but primary UDN annotations aren't available yet, the function returns false, true for ok and shouldRequeue respectively, and ReconcileUDNEndpointSlices silently returns nil without retrying.

Since the caller is the main HostedControlPlaneReconciler.Reconcile() method (a K8s reconciler that will be re-triggered by watches and periodic reconciliation), the function will be called again shortly and the annotations will likely be available then. However, explicitly returning and checking the shouldRequeue signal would be more robust defensive programming. Consider returning a boolean or error so the caller can decide to requeue immediately if desired.
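A stdlib-only sketch of the suggested pattern (illustrative names and a pared-down `result` type, not the actual hypershift/controller-runtime signatures): propagate the `shouldRequeue` signal to the caller instead of silently returning success.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result mirrors the shape of controller-runtime's ctrl.Result.
type result struct {
	RequeueAfter time.Duration
}

// desiredEndpoints stands in for desiredEtcdEndpointsFromPods: it reports
// whether endpoints could be computed (ok) and whether the caller should
// retry soon because UDN annotations are not present yet (shouldRequeue).
func desiredEndpoints(annotationsReady bool) (endpoints []string, ok, shouldRequeue bool) {
	if !annotationsReady {
		return nil, false, true
	}
	return []string{"10.0.0.1"}, true, false
}

// reconcileUDNEndpointSlices surfaces the requeue signal rather than
// returning nil and waiting for an unrelated watch event.
func reconcileUDNEndpointSlices(annotationsReady bool) (result, error) {
	eps, ok, shouldRequeue := desiredEndpoints(annotationsReady)
	if !ok {
		if shouldRequeue {
			// Ask the controller to retry shortly.
			return result{RequeueAfter: 10 * time.Second}, nil
		}
		return result{}, errors.New("no endpoints and no retry signal")
	}
	fmt.Println("reconciled", len(eps), "endpoints")
	return result{}, nil
}

func main() {
	r, err := reconcileUDNEndpointSlices(false)
	fmt.Println(r.RequeueAfter > 0, err == nil) // requeue requested, no error
	r, err = reconcileUDNEndpointSlices(true)
	fmt.Println(r.RequeueAfter == 0, err == nil)
}
```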

🧹 Nitpick comments (15)
control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go (1)

507-510: Optional: loop-variable copy is no longer needed in Go 1.22+.

The pod := p copy (and the identical pattern in the surrounding loops) was required before Go 1.22 to avoid the loop-variable capture bug. Since Go 1.22 each iteration gets its own variable, making these copies redundant and the // golang bug referencing for loop vars comments misleading. This is consistent with the rest of the file, so the fix can be deferred, but it is worth a cleanup pass if the module requires Go ≥ 1.22.

♻️ Proposed cleanup (apply consistently across all four loops in this block)
-		for _, p := range tc.infraPods {
-			pod := p // golang bug referencing for loop vars
-			kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &pod)
-		}
-		for _, vm := range tc.virtualMachines {
-			virtualMachine := vm // golang bug referencing for loop vars
-			kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &virtualMachine)
-		}
+		for i := range tc.infraPods {
+			kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &tc.infraPods[i])
+		}
+		for i := range tc.virtualMachines {
+			kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &tc.virtualMachines[i])
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go`
around lines 507 - 510, The loop-variable copy "pod := p" (and identical copies
in the three neighboring loops) is redundant with Go ≥1.22; remove the temporary
assignment and the misleading comment and append the loop variable directly to
kubevirtInfraClusterObjects (and the equivalent slices in the other three loops)
so each loop simply uses p when calling append.
control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go (2)

202-203: API call inside CreateOrUpdate mutate — hoist it outside

primaryUDNNetworkName performs a kubevirtInfraClient.List inside the mutate closure. The mutate is invoked once per IP family (twice per service per reconcile) and would be called again on any conflict retry. The value is also identical for all IP families of the same service.

Consider computing this once per service in reconcileKubevirtPassthroughService and passing it as a parameter to reconcileKubevirtPassthroughServiceEndpointsByIPFamily.
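A minimal stdlib sketch of why hoisting matters (the counter and helper names are invented for illustration; `createOrUpdate` only mimics controllerutil's retry behavior): the mutate closure can run once per IP family and again per conflict retry, so an API lookup inside it multiplies.

```go
package main

import "fmt"

// listCalls counts simulated API List calls (a stand-in for
// kubevirtInfraClient.List inside primaryUDNNetworkName).
var listCalls int

func lookupUDNNetworkName(namespace string) string {
	listCalls++
	return "udn-" + namespace
}

// createOrUpdate mimics controllerutil.CreateOrUpdate: the mutate
// closure may run more than once (e.g. on conflict retries).
func createOrUpdate(retries int, mutate func()) {
	for i := 0; i <= retries; i++ {
		mutate()
	}
}

func main() {
	// Anti-pattern: API lookup inside the mutate closure.
	listCalls = 0
	for _, family := range []string{"IPv4", "IPv6"} {
		createOrUpdate(1, func() {
			_ = lookupUDNNetworkName("clusters-guest")
			_ = family
		})
	}
	fmt.Println("inside mutate:", listCalls) // 4 lookups: 2 families x 2 attempts

	// Hoisted: compute once per service, pass the value in.
	listCalls = 0
	udnName := lookupUDNNetworkName("clusters-guest")
	for _, family := range []string{"IPv4", "IPv6"} {
		createOrUpdate(1, func() {
			_ = udnName
			_ = family
		})
	}
	fmt.Println("hoisted:", listCalls) // 1 lookup total
}
```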

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go`
around lines 202 - 203, primaryUDNNetworkName is calling
kubevirtInfraClient.List inside the CreateOrUpdate mutate closure (used by
reconcileKubevirtPassthroughServiceEndpointsByIPFamily), causing repeated API
calls per IP family and on conflict retries; move the call out of the mutate by
computing udnNetworkName/isPrimaryUDN once in
reconcileKubevirtPassthroughService (using cpService.Namespace) and add
udnNetworkName and isPrimaryUDN as parameters to
reconcileKubevirtPassthroughServiceEndpointsByIPFamily so the mutate closure no
longer invokes primaryUDNNetworkName/kubevirtInfraClient.List.

247-250: primaryUDNNetworkName is a no-op wrapper — consider removing it

The function unconditionally delegates to detectPrimaryUDNNetworkNameFromVirtLauncher with no added logic. If future dispatch (e.g., cache lookup) is not planned in the short term, inline the call directly to reduce indirection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go`
around lines 247 - 250, primaryUDNNetworkName is a dead wrapper around
detectPrimaryUDNNetworkNameFromVirtLauncher on reconciler; remove
primaryUDNNetworkName and update all call sites to call
r.detectPrimaryUDNNetworkNameFromVirtLauncher(ctx, namespace) directly (or, if
you prefer to keep an indirection, add a TODO comment explaining planned future
logic), ensuring the reconciler type still compiles and no references to
primaryUDNNetworkName remain.
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)

1639-1648: Use the KonnectivityServerPort constant instead of hardcoding 8091.

Replace the hardcoded port with kas.KonnectivityServerPort to align with usage elsewhere in the codebase and prevent drift if the port changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
around lines 1639 - 1648, The current branch hardcodes the konnectivity port to
8091 when detecting a KubeVirt Primary UDN (within the block checking
hcp.Spec.Platform.Type == hyperv1.KubevirtPlatform and the annotation
"hypershift.openshift.io/primary-udn" == "true"), which risks drift; update the
assignment of port (the variable used alongside host =
fmt.Sprintf("konnectivity-server.%s.svc.cluster.local", hcp.Namespace)) to use
the shared constant kas.KonnectivityServerPort instead of the literal 8091 so
the controller uses the centralized constant everywhere.
hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go (1)

174-180: Comment contradicts the actual implementation

Line 174 unconditionally adds kubevirt.io/allow-pod-bridge-network-live-migration, yet the comment on lines 175–180 says "For now, we omit this annotation entirely." The real implementation is add-then-remove: the annotation is added here and deleted by applyPrimaryUDNVMTemplateOverrides for Primary UDN clusters. The comment also says "When Primary UDN support is added, this should be conditionally added…" — but Primary UDN support is being added in this very PR.

📝 Suggested comment update
 			Annotations: map[string]string{
-				"kubevirt.io/allow-pod-bridge-network-live-migration": "",
-				// NOTE: DO NOT add kubevirt.io/allow-pod-bridge-network-live-migration for Primary UDN!
-				// This annotation causes OVN-K CNI to skip IP configuration on eth0,
-				// which breaks route discovery for l2bridge binding on Primary UDN.
-				// Live migration on Primary UDN is not supported anyway.
-				// For now, we omit this annotation entirely. When Primary UDN support is added,
-				// this should be conditionally added only for non-UDN clusters.
+				// Allow live migration over the pod network for non-Primary-UDN clusters.
+				// NOTE: This annotation is removed for Primary UDN clusters by
+				// applyPrimaryUDNVMTemplateOverrides, because OVN-K CNI skips eth0 IP
+				// configuration when it is present, breaking l2bridge route discovery.
+				"kubevirt.io/allow-pod-bridge-network-live-migration": "",
 			},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go` around lines
174 - 180, The comment is inaccurate: the code currently unconditionally adds
the "kubevirt.io/allow-pod-bridge-network-live-migration" annotation and it is
later removed by applyPrimaryUDNVMTemplateOverrides for Primary UDN clusters;
update the comment to reflect that flow (annotation is added here and
conditionally removed by applyPrimaryUDNVMTemplateOverrides) and note that
Primary UDN support is being introduced in this PR so the conditional removal is
intentional rather than "omitted entirely", and mention the annotation key and
the applyPrimaryUDNVMTemplateOverrides function name so future readers can trace
the behavior.
cluster-up.sh (3)

61-62: Declare and assign separately to avoid masking return values (SC2155).

If date fails, the non-zero exit code is masked by export.

Proposed fix
-export OP_IMAGE_TAG="udn-fix-$(date +%Y%m%d-%H%M%S)"
-export OP_IMAGE="${QUAY_REPO_OP}:${OP_IMAGE_TAG}"
+OP_IMAGE_TAG="udn-fix-$(date +%Y%m%d-%H%M%S)"
+export OP_IMAGE_TAG
+OP_IMAGE="${QUAY_REPO_OP}:${OP_IMAGE_TAG}"
+export OP_IMAGE

Apply the same pattern at lines 108-109 for CPO_IMAGE_TAG / CPO_IMAGE.

Also applies to: 108-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-up.sh` around lines 61 - 62, The exports for OP_IMAGE_TAG/OP_IMAGE
(and similarly CPO_IMAGE_TAG/CPO_IMAGE) mask non-zero exit codes from command
substitution; instead assign the tag and image to variables first (e.g., set
OP_IMAGE_TAG="$(date +%Y%m%d-%H%M%S)" and
OP_IMAGE="${QUAY_REPO_OP}:${OP_IMAGE_TAG}") and then export them in a separate
export statement, doing the same for CPO_IMAGE_TAG/CPO_IMAGE so a failing date
command won't be hidden by export.

122-124: Inconsistent indentation in the CPO push else branch.

Line 123 is missing 2 spaces of indentation compared to the equivalent block for the operator image push (line 77).

Proposed fix
 else
-podman push ${CPO_IMAGE}
+  podman push ${CPO_IMAGE}
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-up.sh` around lines 122 - 124, The else branch that pushes the CPO
image (podman push ${CPO_IMAGE}) is misindented; change its indentation to match
the operator image push block (add the two missing spaces before the podman push
line) so the CPO_IMAGE push line aligns with the equivalent operator-image push
lines and retains consistent shell script formatting.

5-5: Hardcoded user-specific paths and registry references.

Lines 5, 63, 67, 113 embed developer-specific paths (/root/ralavi/..., /root/.kcli/...) and lines 60, 107 use a personal Quay repo. For a POC this is understandable, but if this script is ever promoted to CI or shared usage, these should be parameterized via environment variables with defaults (similar to how cluster-sync.sh does it with : "${VAR:=default}").

Also applies to: 60-63, 107-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-up.sh` at line 5, The script hardcodes user-specific paths and
registry references (e.g., the KUBECONFIG export and the personal Quay repo
usages around the lines exporting KUBECONFIG and resource/registry variables);
change these to environment-configurable variables with sensible defaults using
the shell pattern : "${VAR:=default}" (for example introduce KUBECONFIG_PATH,
REGISTRY_REPO, USER_HOME or similar), replace direct occurrences like export
KUBECONFIG=/root/.kcli/... and any /root/ralavi/... and quay.io/ralavi/...
references with the corresponding variables (KUBECONFIG, REGISTRY_REPO, etc.),
and update any places that reference those paths to use the variables so the
script can be overridden in CI or by other users.
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go (2)

48-73: Spec.Selector is immutable on Deployments after creation—setting it inside CreateOrUpdate is a known footgun.

Because the labels are constant ("app": internalAppsDNSDeploy), this works in practice. However, if the label value ever changes, the update will be rejected by the API server. A defensive approach is to only set Selector during creation (check deploy.CreationTimestamp.IsZero()).

Defensive pattern
 	if _, err := controllerutil.CreateOrUpdate(ctx, c, deploy, func() error {
 		labels := map[string]string{"app": internalAppsDNSDeploy}
 		deploy.Spec.Replicas = ptr.To[int32](1)
-		deploy.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
+		if deploy.CreationTimestamp.IsZero() {
+			deploy.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
+		}
 		deploy.Spec.Template.ObjectMeta.Labels = labels
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go`
around lines 48 - 73, The Deployment's Spec.Selector should only be set on
creation to avoid immutable field errors; inside the
controllerutil.CreateOrUpdate callback (where deploy is the *appsv1.Deployment),
keep setting deploy.Spec.Template.ObjectMeta.Labels and
deploy.Spec.Template.Spec.* always, but guard setting deploy.Spec.Selector =
&metav1.LabelSelector{MatchLabels: labels} so it runs only when
deploy.CreationTimestamp.IsZero(); use the existing labels map
(map[string]string{"app": internalAppsDNSDeploy}) and keep replicas, containers,
volumes updates unchanged.

54-63: Deployment spec overwrites the entire Containers and Volumes slices on every reconcile.

This means any out-of-band modifications (e.g., resource limits added by an admission webhook or LimitRange) will be reverted every reconciliation cycle. For a POC this is acceptable, but for production you'd want to merge rather than replace. Noting for awareness.
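A sketch of the merge-instead-of-replace idea with a pared-down container type (the `container` struct and field names here are illustrative, not corev1 types): update the named container in place so fields the reconciler does not own survive.

```go
package main

import "fmt"

// container is a pared-down stand-in for corev1.Container.
type container struct {
	Name  string
	Image string
	// Resources represents fields set out-of-band (e.g. by a
	// LimitRange or webhook) that must survive reconciliation.
	Resources string
}

// upsertContainer updates the named container in place, or appends it
// when missing, instead of replacing the whole slice.
func upsertContainer(containers []container, name, image string) []container {
	for i := range containers {
		if containers[i].Name == name {
			containers[i].Image = image
			return containers
		}
	}
	return append(containers, container{Name: name, Image: image})
}

func main() {
	existing := []container{{Name: "coredns", Image: "coredns:old", Resources: "limits-from-webhook"}}
	merged := upsertContainer(existing, "coredns", "coredns:new")
	// The out-of-band Resources value is preserved; a wholesale slice
	// replacement would have dropped it.
	fmt.Println(merged[0].Image, merged[0].Resources)
}
```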

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go`
around lines 54 - 63, The code currently replaces
deploy.Spec.Template.Spec.Containers and Volumes wholesale which clobbers
out-of-band changes; modify the reconcile to search
deploy.Spec.Template.Spec.Containers for a container named "coredns" and update
its Image, Args, Ports and VolumeMounts (using dnsImage and
internalAppsDNSPort), or append a new "coredns" container if not found, and
similarly merge deploy.Spec.Template.Spec.Volumes by adding the "config" volume
only if it doesn't already exist—this preserves any existing containers,
resource limits, or webhook-injected fields instead of overwriting the slices.
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go (2)

63-83: Map iteration is non-deterministic; first match wins.

primaryUDNIPFromPodNetworks iterates over a map[string]podNetworksEntry. If a pod somehow has multiple non-default primary-role entries (unlikely but possible with misconfiguration), Go's map iteration order is random, so the returned IP could vary across calls. For a POC this is fine, but worth a note if this is promoted.
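A stdlib sketch of the deterministic-selection fix (using the literal role string "primary" in place of the `ovnPrimaryRole` constant): collect candidates, sort by network key, then take the first.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

type podNetworksEntry struct {
	Role      string
	IPAddress string
}

// primaryUDNIP picks the primary-role, non-default entry with the
// lexicographically smallest network key, so the result is stable
// across Go's randomized map iteration order.
func primaryUDNIP(networks map[string]podNetworksEntry) (string, bool) {
	keys := make([]string, 0, len(networks))
	for k, v := range networks {
		if k == "default" || v.Role != "primary" {
			continue
		}
		// Skip entries whose IP does not parse.
		if _, err := netip.ParseAddr(v.IPAddress); err != nil {
			continue
		}
		keys = append(keys, k)
	}
	if len(keys) == 0 {
		return "", false
	}
	sort.Strings(keys)
	return networks[keys[0]].IPAddress, true
}

func main() {
	m := map[string]podNetworksEntry{
		"default":      {Role: "primary", IPAddress: "10.128.0.5"},
		"ns/udn-b":     {Role: "primary", IPAddress: "192.168.1.9"},
		"ns/udn-a":     {Role: "primary", IPAddress: "192.168.1.7"},
		"ns/secondary": {Role: "secondary", IPAddress: "172.16.0.3"},
	}
	ip, ok := primaryUDNIP(m)
	fmt.Println(ip, ok) // always "192.168.1.7 true", regardless of map order
}
```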

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go`
around lines 63 - 83, primaryUDNIPFromPodNetworks currently iterates a map m
(map[string]podNetworksEntry) which yields a nondeterministic first match;
instead, collect all valid candidate entries (respecting checks for v.Role ==
ovnPrimaryRole, networkKey != "default", non-empty/parsable IPAddress), store
them in a slice with their networkKey or IP, sort that slice deterministically
(e.g., lexicographically by networkKey or IP), then return the first element
from the sorted list; update function primaryUDNIPFromPodNetworks to perform
collection, sorting, and deterministic selection so repeated calls return a
stable IP even if multiple primary-role entries exist.

24-27: Duplicate struct definition; function signatures differ significantly.

podNetworksEntry and podNetworksAnnotationEntry are identical and should be unified. However, primaryUDNIPFromPodNetworks and primaryUDNInfoFromPodNetworks differ in return type, return values, and behavior—the mgmt_oauth version validates IPs with netip.ParseAddr while the etcd version does not, and the etcd version returns additional metadata. Consider extracting the shared struct to a common package, but keep the functions separate due to their distinct purposes and return signatures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go`
around lines 24 - 27, The two identical types podNetworksEntry and
podNetworksAnnotationEntry should be consolidated into a single shared type in a
common package (e.g., move the struct to a new/shared package and export it) and
replace both local definitions with that shared type; keep the functions
primaryUDNIPFromPodNetworks and primaryUDNInfoFromPodNetworks separate because
their return shapes and behaviors differ, but update them to reference the new
shared type; ensure primaryUDNIPFromPodNetworks continues to validate with
netip.ParseAddr while primaryUDNInfoFromPodNetworks preserves its additional
metadata return values and non-validating behavior, and update imports/usages
accordingly.
control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/reconciler.go (2)

53-96: Corefile is written twice per reconciliation—once without OAuth and once with.

ensureInternalAppsDNSCorefile is called at line 62 with only the router hosts, then again at line 89 after adding the OAuth entry to the same hosts map. If the OAuth path fails between lines 62 and 89, the corefile will persist without the OAuth entry until the next successful reconcile.

Consider deferring the corefile write until all hosts are assembled (moving the first call to after line 88), so the corefile is only written once and is always complete. The same applies to ensureDNSOperatorForwarding (lines 65 and 94).
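A sketch of the write-once ordering (hostnames, helper names, and the write counter are invented for illustration): assemble every host entry before the single corefile write, so a failure in the OAuth path leaves the previous complete corefile untouched.

```go
package main

import "fmt"

var corefileWrites int

func writeCorefile(hosts map[string]string) {
	corefileWrites++
}

// reconcileDNS gathers all hostname -> IP mappings first, then writes
// the corefile exactly once; an early return before the write keeps
// the last complete state instead of persisting a partial one.
func reconcileDNS(routerIP, oauthIP string, oauthReady bool) error {
	hosts := map[string]string{
		"console-openshift-console.apps.example":       routerIP,
		"canary-openshift-ingress-canary.apps.example": routerIP,
	}
	if !oauthReady {
		// Nothing has been written yet, so no partial corefile exists.
		return fmt.Errorf("oauth bridge not ready")
	}
	hosts["oauth.mgmt.example"] = oauthIP
	writeCorefile(hosts) // single, complete write
	return nil
}

func main() {
	corefileWrites = 0
	err := reconcileDNS("10.0.0.1", "", false)
	fmt.Println(err != nil, corefileWrites) // failure before any write
	err = reconcileDNS("10.0.0.1", "10.0.0.2", true)
	fmt.Println(err == nil, corefileWrites) // exactly one write on success
}
```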

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/reconciler.go`
around lines 53 - 96, The reconcile currently calls
ensureInternalAppsDNSCorefile and ensureDNSOperatorForwarding early (using only
router hosts) then again after adding OAuth, which can leave a partial corefile
if OAuth steps fail; remove the first calls at the top and instead assemble the
full hosts map (populate consoleHost, canaryHost, downloadsHost, then call
mgmtOAuthRouteHost, mgmtOAuthPrimaryUDNIP and ensureGuestOAuthBridge to add
oauthHost=>oauthBridgeSvcIP) and invoke ensureInternalAppsDNSCorefile(ctx,
r.guestClient, hosts) and ensureDNSOperatorForwarding(ctx, r.guestClient, zones,
internalUpstream) once after oauth entries are added so the corefile and
forwarding rules are written atomically; update any requeue behavior around
mgmtOAuthRouteHost, mgmtOAuthPrimaryUDNIP, and ensureGuestOAuthBridge to return
early without having written partial state.

24-38: Errors are logged at Info level and silently swallowed during requeue.

This is a reasonable pattern for progressive bootstrapping. However, the err value is discarded—logging the error message would aid debugging when the cause isn't simply "not yet created."

Proposed improvement
 	ingressDomain, err := guestIngressDomain(ctx, r.guestClient)
 	if err != nil {
-		log.Info("waiting for guest ingress domain")
+		log.Info("waiting for guest ingress domain", "error", err)
 		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
 	}

Apply the same pattern to the other log.Info("waiting for ...") blocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/reconciler.go`
around lines 24 - 38, The current waits drop the error value; update the three
blocks (calls to guestIngressDomain, guestRouterInternalClusterIP,
guestDNSImage) so that instead of calling log.Info("waiting for ...") you log
the actual error (e.g. log.Error(err, "waiting for guest ingress domain") /
log.Error(err, "waiting for guest router-internal ClusterIP") / log.Error(err,
"waiting for guest dns-default DaemonSet")), then keep returning the same
ctrl.Result{RequeueAfter: 30 * time.Second}, nil; this ensures the error from
guestIngressDomain, guestRouterInternalClusterIP and guestDNSImage is recorded
for debugging while preserving the requeue behavior.
cluster-sync.sh (1)

446-446: Hardcoded CoreDNS image tag v4.17.

The Go controller (internal_dns.go) dynamically fetches the DNS image from the guest cluster's dns-default DaemonSet, but this shell script hardcodes registry.redhat.io/openshift4/ose-coredns-rhel9:v4.17. This will become stale as clusters upgrade.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-sync.sh` at line 446, Hardcoded CoreDNS image
"registry.redhat.io/openshift4/ose-coredns-rhel9:v4.17" will become stale;
replace that literal with a variable populated by querying the guest cluster's
dns-default DaemonSet like the Go controller does. Before using the image value
in cluster-sync.sh, run a kubectl (against the guest cluster context/namespace)
to get the image from the dns-default DaemonSet (jsonpath to
.spec.template.spec.containers[?(@.name=="coredns")].image or similar) assign it
to DNS_IMAGE, fall back to a sensible default only if the query fails, and use
$DNS_IMAGE in place of the hardcoded string.

Comment on lines +268 to +282
if [ -f "./create-etcd-mirrors.sh" ]; then
  ./create-etcd-mirrors.sh

  if [ $? -eq 0 ]; then
    echo ""
    echo "✓ EndpointSlice workaround applied successfully"
  else
    echo ""
    echo "WARNING: EndpointSlice workaround script failed"
    echo "You may need to run it manually: ./create-etcd-mirrors.sh"
  fi
else
  echo "ERROR: create-etcd-mirrors.sh not found in current directory"
  echo "Please ensure the script is present and run it manually"
fi

⚠️ Potential issue | 🟠 Major

set -e causes the script to exit before $? can be checked.

Same issue as in cluster-up.sh: if ./create-etcd-mirrors.sh fails, set -e terminates the script before reaching line 271.

Proposed fix
 if [ -f "./create-etcd-mirrors.sh" ]; then
-  ./create-etcd-mirrors.sh
-  
-  if [ $? -eq 0 ]; then
+  if ./create-etcd-mirrors.sh; then
     echo ""
     echo "✓ EndpointSlice workaround applied successfully"
   else
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-sync.sh` around lines 268 - 282, The problem is that set -e causes
the script to exit when ./create-etcd-mirrors.sh fails so the subsequent check
of $? never runs; fix this by invoking the script inside a conditional (or
capturing its exit immediately) so the non-zero exit is handled instead of
aborting: inside cluster-sync.sh keep the file-exists test for
create-etcd-mirrors.sh, then replace the separate "./create-etcd-mirrors.sh" +
"if [ $? -eq 0 ]; then" pattern with "if ./create-etcd-mirrors.sh; then" (or run
"./create-etcd-mirrors.sh; rc=$?; if [ $rc -eq 0 ]; then") and move the
success/warning echoes into the then/else branches so set -e won’t prematurely
terminate.

Comment on lines +819 to +830
if [ -n "${GUEST_KUBECONFIG}" ] && [ -f "${GUEST_KUBECONFIG}" ]; then
  echo "Guest cluster kubeconfig: ${GUEST_KUBECONFIG}"
  echo ""
  echo "To access the guest cluster:"
  echo " export KUBECONFIG=${GUEST_KUBECONFIG}"
  echo " oc get nodes"
else
  echo "Guest cluster kubeconfig not extracted yet."
  echo "To extract it manually:"
  echo " oc extract -n clusters secret/${CLUSTER_NAME}-admin-kubeconfig --to=. --confirm"
  echo " mv kubeconfig ${CLUSTER_NAME}-kubeconfig"
fi

⚠️ Potential issue | 🔴 Critical

GUEST_KUBECONFIG may be unset here, causing an abort under set -u.

GUEST_KUBECONFIG is only assigned inside the conditional block starting at line 327. If the kubeconfig secret wasn't found or the block was skipped, this variable is unset and line 819's ${GUEST_KUBECONFIG} will trigger an "unbound variable" error.

Proposed fix
-if [ -n "${GUEST_KUBECONFIG}" ] && [ -f "${GUEST_KUBECONFIG}" ]; then
+if [ -n "${GUEST_KUBECONFIG:-}" ] && [ -f "${GUEST_KUBECONFIG:-}" ]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-sync.sh` around lines 819 - 830, The block prints and uses
GUEST_KUBECONFIG which may be unset under set -u; ensure it cannot cause an
"unbound variable" error by either initializing GUEST_KUBECONFIG='' earlier or
using safe parameter expansion everywhere it's referenced (e.g.,
${GUEST_KUBECONFIG:-}) and in tests (use [ -n "${GUEST_KUBECONFIG:-}" ] and [ -f
"${GUEST_KUBECONFIG:-}" ]), and similarly use ${GUEST_KUBECONFIG:-} in the
echo/export lines so the script won’t abort if the variable was never assigned.

echo "Decrease hosted cluster Prometheus retention period to 8 hours to prevent over logging..."
oc --kubeconfig "${CLUSTER_NAME}-kubeconfig" -n openshift-monitoring patch prometheus k8s --type=merge -p '{"spec":{"retention":"8h"}}'

# oc login --server=https://api.hypershift.qinqon.corp:6443 -u kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU

⚠️ Potential issue | 🟠 Major

Commented-out line contains credentials.

Even in a comment, hardcoded credentials (-u kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU) should not be committed to version control.

Proposed fix
-# oc login --server=https://api.hypershift.qinqon.corp:6443 -u kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU
+# oc login --server=https://api.hypershift.qinqon.corp:6443 -u kubeadmin -p <REDACTED>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster-sync.sh` at line 866, The commented-out oc login line in
cluster-sync.sh ("oc login --server=https://api.hypershift.qinqon.corp:6443 -u
kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU") contains hardcoded credentials and must
be removed; replace it with a non-sensitive placeholder or reference
environment/secret values (e.g., use KUBECONFIG, OCP_USER/OCP_PASS env vars, or
a secret manager call) and add a brief comment explaining how to supply
credentials securely instead of embedding them in the file.

Comment on lines +74 to 82
 } else if isKubeVirt {
 	// IPv4 KubeVirt: listen on all interfaces
 	// Note: 0.0.0.0 includes localhost (127.0.0.1), so no need to specify it separately
 	// Adding localhost would cause "bind: address already in use" error
 	util.UpsertEnvVar(c, corev1.EnvVar{
 		Name:  "ETCD_LISTEN_CLIENT_URLS",
 		Value: "https://[$(POD_IP)]:2379,https://localhost:2379",
 	})
 	util.UpsertEnvVar(c, corev1.EnvVar{
 		Name:  "ETCD_LISTEN_METRICS_URLS",
-		Value: "https://[::]:2382",
+		Value: "https://0.0.0.0:2379",
 	})
 }

⚠️ Potential issue | 🟠 Major

ETCD_LISTEN_PEER_URLS is missing for KubeVirt (both IPv4 and IPv6) — incoming peer connections will fail due to the UDN multi-interface setup.

The base manifest sets ETCD_LISTEN_PEER_URLS to https://$(POD_IP):2380, which binds to a single pod IP. When pods have multiple interfaces with the default route through UDN (the entire rationale for this change), incoming peer-to-peer connections from other etcd members via the UDN interface will be silently dropped. Both IPv4 and IPv6 KubeVirt branches need to override this.

🔧 Proposed fix
 } else if isKubeVirt {
     // IPv4 KubeVirt: listen on all interfaces
     // Note: 0.0.0.0 includes localhost (127.0.0.1), so no need to specify it separately
     // Adding localhost would cause "bind: address already in use" error
+    util.UpsertEnvVar(c, corev1.EnvVar{
+        Name:  "ETCD_LISTEN_PEER_URLS",
+        Value: "https://0.0.0.0:2380",
+    })
     util.UpsertEnvVar(c, corev1.EnvVar{
         Name:  "ETCD_LISTEN_CLIENT_URLS",
         Value: "https://0.0.0.0:2379",
     })
 }

And for IPv6 KubeVirt (lines 48–58):

 if isKubeVirt {
     // IPv6 KubeVirt: listen on all interfaces
     // Note: [::] includes localhost, so no need to specify it separately
+    util.UpsertEnvVar(c, corev1.EnvVar{
+        Name:  "ETCD_LISTEN_PEER_URLS",
+        Value: "https://[::]:2380",
+    })
     util.UpsertEnvVar(c, corev1.EnvVar{
         Name:  "ETCD_LISTEN_CLIENT_URLS",
         Value: "https://[::]:2379",
     })

Note: The base manifest already sets ETCD_LISTEN_METRICS_URLS to https://0.0.0.0:2382 for IPv4 (and IPv6 KubeVirt explicitly sets it to https://[::]:2382), so the metrics asymmetry does not require action.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 } else if isKubeVirt {
 	// IPv4 KubeVirt: listen on all interfaces
 	// Note: 0.0.0.0 includes localhost (127.0.0.1), so no need to specify it separately
 	// Adding localhost would cause "bind: address already in use" error
+	util.UpsertEnvVar(c, corev1.EnvVar{
+		Name:  "ETCD_LISTEN_PEER_URLS",
+		Value: "https://0.0.0.0:2380",
+	})
 	util.UpsertEnvVar(c, corev1.EnvVar{
 		Name:  "ETCD_LISTEN_CLIENT_URLS",
 		Value: "https://0.0.0.0:2379",
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go`
around lines 74 - 82, The KubeVirt branches are missing ETCD_LISTEN_PEER_URLS so
peers will bind to a single POD_IP and fail on multi-interface UDN setups; add
an util.UpsertEnvVar call in the same IPv4 and IPv6 KubeVirt branches to set
ETCD_LISTEN_PEER_URLS to a wildcard bind (for IPv4 use "https://0.0.0.0:2380",
for IPv6 use "https://[::]:2380"), mirroring how ETCD_LISTEN_CLIENT_URLS is set
(use the same util.UpsertEnvVar(c, corev1.EnvVar{ Name: "ETCD_LISTEN_PEER_URLS",
Value: "<appropriate value>" }) pattern).

Comment on lines +46 to +54
if err := c.Watch(source.Kind[client.Object](opts.CPCluster.GetCache(), &corev1.Pod{}, enqueueHCP, isEtcdPodInNamespace(opts.Namespace))); err != nil {
return fmt.Errorf("failed to watch etcd Pods: %w", err)
}
if err := c.Watch(source.Kind[client.Object](opts.CPCluster.GetCache(), &corev1.Service{}, enqueueHCP, isEtcdServiceInNamespace(opts.Namespace))); err != nil {
return fmt.Errorf("failed to watch etcd Services: %w", err)
}
if err := c.Watch(source.Kind[client.Object](opts.CPCluster.GetCache(), &discoveryv1.EndpointSlice{}, enqueueHCP, isEtcdEndpointSliceInNamespace(opts.Namespace))); err != nil {
return fmt.Errorf("failed to watch etcd EndpointSlices: %w", err)
}

⚠️ Potential issue | 🟠 Major

Fix the generic type parameter: source.Kind[client.Object] should either be dropped in favor of inference or kept consistent with the handler and predicate types.

The source.Kind signature is func Kind[object client.Object](...), and the object argument, handler, and predicates must all agree on the same type parameter. Explicitly instantiating it at client.Object (an interface) while passing concrete objects (&corev1.Pod{}, &corev1.Service{}, &discoveryv1.EndpointSlice{}) only compiles if enqueueHCP and the isEtcd*InNamespace predicates are themselves typed for client.Object; if they are typed for the concrete objects, the explicit interface instantiation causes a type mismatch.

Remove the explicit type parameter to let it be inferred from the object argument:

Suggested fix
if err := c.Watch(source.Kind(opts.CPCluster.GetCache(), &corev1.Pod{}, enqueueHCP, isEtcdPodInNamespace(opts.Namespace))); err != nil {
	return fmt.Errorf("failed to watch etcd Pods: %w", err)
}
if err := c.Watch(source.Kind(opts.CPCluster.GetCache(), &corev1.Service{}, enqueueHCP, isEtcdServiceInNamespace(opts.Namespace))); err != nil {
	return fmt.Errorf("failed to watch etcd Services: %w", err)
}
if err := c.Watch(source.Kind(opts.CPCluster.GetCache(), &discoveryv1.EndpointSlice{}, enqueueHCP, isEtcdEndpointSliceInNamespace(opts.Namespace))); err != nil {
	return fmt.Errorf("failed to watch etcd EndpointSlices: %w", err)
}
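For reference, the inference rule at play can be seen in a minimal standalone sketch. The Kind, Object, and Pod names below are illustrative stand-ins for the controller-runtime types, not the real API:

```go
package main

import "fmt"

// Object stands in for client.Object: an interface used as a constraint.
type Object interface{ GetName() string }

// Pod stands in for a concrete API type such as *corev1.Pod.
type Pod struct{ name string }

func (p *Pod) GetName() string { return p.name }

// Kind mirrors the shape of source.Kind: the object argument and the
// handler must agree on the same type parameter O.
func Kind[O Object](obj O, handle func(O)) { handle(obj) }

func main() {
	// Inferred: O = *Pod, so the handler must be typed for *Pod.
	Kind(&Pod{name: "etcd-0"}, func(p *Pod) { fmt.Println("pod:", p.GetName()) })

	// Explicit: O = Object also compiles, but then the handler must accept
	// the interface type rather than the concrete type.
	Kind[Object](&Pod{name: "etcd-1"}, func(o Object) { fmt.Println("obj:", o.GetName()) })
}
```

Whichever form is used, the handler and predicates have to match the instantiation; mixing an explicit client.Object parameter with concretely typed handlers is what fails to compile.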
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@control-plane-operator/hostedclusterconfigoperator/controllers/etcd/setup.go`
around lines 46 - 54, The three c.Watch calls use an incorrect explicit generic
parameter (source.Kind[client.Object]) causing a type mismatch; remove the
explicit type parameter so Go can infer the concrete types from the provided
objects (change source.Kind[client.Object](opts.CPCluster.GetCache(),
&corev1.Pod{}, ...) etc. to source.Kind(opts.CPCluster.GetCache(),
&corev1.Pod{}, ...)). Update the three calls that reference source.Kind with
&corev1.Pod{}, &corev1.Service{}, and &discoveryv1.EndpointSlice{} respectively
(the lines using enqueueHCP and the isEtcd*InNamespace predicates remain the
same).

Comment on lines +255 to +259
func (r *reconciler) detectPrimaryUDNNetworkNameFromVirtLauncher(ctx context.Context, namespace string) (string, bool) {
podList := &corev1.PodList{}
if err := r.kubevirtInfraClient.List(ctx, podList, client.InNamespace(namespace), client.MatchingLabels{"kubevirt.io": "virt-launcher"}); err != nil {
return "", false
}

⚠️ Potential issue | 🟠 Major

Silent List error causes destructive mis-labeling in UDN namespaces

When r.kubevirtInfraClient.List fails (transient API error, RBAC issue, context cancellation, etc.), detectPrimaryUDNNetworkNameFromVirtLauncher returns ("", false) without logging anything. The caller then takes the non-UDN branch (line 212–216), which:

  1. Writes kubernetes.io/service-name on the EndpointSlice.
  2. Deletes k8s.ovn.org/service-name from Labels.
  3. Deletes k8s.ovn.org/endpointslice-network from Annotations.

In a Primary-UDN-configured namespace this actively strips the correct OVN-K labels from any existing EndpointSlice, silently breaking traffic until the next successful reconcile that happens to list pods without error.

The function signature should surface the error so the caller can abort instead of mis-configuring the object. At minimum, add a ctrl.LoggerFrom(ctx) log so the failure is observable.

🔧 Proposed fix – propagate error and hoist call
-func (r *reconciler) primaryUDNNetworkName(ctx context.Context, namespace string) (string, bool) {
-	derived, ok := r.detectPrimaryUDNNetworkNameFromVirtLauncher(ctx, namespace)
-	return derived, ok
-}
-
-func (r *reconciler) detectPrimaryUDNNetworkNameFromVirtLauncher(ctx context.Context, namespace string) (string, bool) {
+// detectPrimaryUDNNetworkName derives the OVN-K primary network name from any
+// virt-launcher pod's k8s.ovn.org/pod-networks annotation.
+func (r *reconciler) detectPrimaryUDNNetworkName(ctx context.Context, namespace string) (string, bool, error) {
+	log := ctrl.LoggerFrom(ctx)
 	podList := &corev1.PodList{}
 	if err := r.kubevirtInfraClient.List(ctx, podList, client.InNamespace(namespace), client.MatchingLabels{"kubevirt.io": "virt-launcher"}); err != nil {
-		return "", false
+		return "", false, fmt.Errorf("listing virt-launcher pods in namespace %s: %w", namespace, err)
 	}
 	for i := range podList.Items {
 		raw := podList.Items[i].Annotations[ovnPodNetworksAnnotationKey]
 		if raw == "" {
 			continue
 		}
 		var m map[string]podNetworksAnnotationEntry
 		if err := json.Unmarshal([]byte(raw), &m); err != nil {
+			log.V(4).Info("skipping pod with unparseable pod-networks annotation", "pod", podList.Items[i].Name, "err", err)
 			continue
 		}
 		for networkKey, v := range m {
 			if v.Role == ovnPrimaryRole && networkKey != "default" {
-				return strings.ReplaceAll(networkKey, "/", "_"), true
+				return strings.ReplaceAll(networkKey, "/", "_"), true, nil
 			}
 		}
 	}
-	return "", false
+	return "", false, nil
 }

Then in reconcileKubevirtPassthroughServiceEndpointsByIPFamily, hoist the call outside the CreateOrUpdate mutate:

 func (r *reconciler) reconcileKubevirtPassthroughServiceEndpointsByIPFamily(...) error {
 	log := ctrl.LoggerFrom(ctx)
+	udnNetworkName, isPrimaryUDN, err := r.detectPrimaryUDNNetworkName(ctx, cpService.Namespace)
+	if err != nil {
+		return fmt.Errorf("detecting primary UDN network in namespace %s: %w", cpService.Namespace, err)
+	}
+
 	endpointSlice := &discoveryv1.EndpointSlice{ ... }
 	result, err := r.CreateOrUpdate(ctx, r.kubevirtInfraClient, endpointSlice, func() error {
 		...
-		udnNetworkName, isPrimaryUDN := r.primaryUDNNetworkName(ctx, cpService.Namespace)
 		if isPrimaryUDN {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go`
around lines 255 - 259, detectPrimaryUDNNetworkNameFromVirtLauncher currently
swallows List errors and returns ("", false) which leads the caller
reconcileKubevirtPassthroughServiceEndpointsByIPFamily to take the non-UDN
branch and mutate/delete OVN labels; change
detectPrimaryUDNNetworkNameFromVirtLauncher to return an error (e.g., func (...)
(string, bool, error)) and propagate that error to the caller so
reconcileKubevirtPassthroughServiceEndpointsByIPFamily can abort on List
failures; additionally log the List failure with ctrl.LoggerFrom(ctx) inside
detectPrimaryUDNNetworkNameFromVirtLauncher for observability, and move the call
out of the CreateOrUpdate mutate closure in
reconcileKubevirtPassthroughServiceEndpointsByIPFamily so errors are handled
before any EndpointSlice mutations.

go.mod Outdated
Comment on lines +261 to +266
go.opentelemetry.io/otel v1.38.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.37.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.37.0 // indirect
go.opentelemetry.io/otel/metric v1.37.0 // indirect
go.opentelemetry.io/otel/sdk v1.37.0 // indirect
go.opentelemetry.io/otel/trace v1.37.0 // indirect
go.opentelemetry.io/otel/metric v1.38.0 // indirect
go.opentelemetry.io/otel/sdk v1.38.0 // indirect
go.opentelemetry.io/otel/trace v1.38.0 // indirect

⚠️ Potential issue | 🟠 Major

Bump OTel exporter versions to v1.38.0 to match core package versions.

The core OTel packages (go.opentelemetry.io/otel, /metric, /sdk, /trace) are bumped to v1.38.0, but the exporter packages (otlptrace and otlptracegrpc) remain at v1.37.0. This version split creates a build risk: Go's MVS will resolve the core packages at v1.38.0 while the exporters were compiled against v1.37.0 internals. OpenTelemetry has a documented history of removing internal APIs between minor versions, causing build failures in exactly this configuration (e.g., v1.11.2 vs v1.11.1).

Both go.opentelemetry.io/otel/exporters/otlp/otlptrace and go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc are available at v1.38.0 (released 2025-08-29 from the same Git hash), making the alignment straightforward.

🔧 Proposed fix
-	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.37.0 // indirect
-	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.37.0 // indirect
+	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.38.0 // indirect
+	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.38.0 // indirect
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` around lines 261 - 266, Update the exporter module versions so they
match the core OTel packages: change
go.opentelemetry.io/otel/exporters/otlp/otlptrace and
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc to v1.38.0 in
go.mod (so all otel, metric, sdk, trace, otlptrace and otlptracegrpc use
v1.38.0), then run go get or go mod tidy to refresh the module graph and verify
the build.

Comment on lines +157 to +164
// Use root-ca secret which is the CA that signs the ignition-server certificate.
// This is needed for Primary UDN clusters where VMs access ignition-server via internal service DNS.
caSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Namespace: t.controlplaneNamespace,
Name: "root-ca",
},
}

⚠️ Potential issue | 🔴 Critical

Tests are misaligned with implementation and will fail.

Lines 157–164 implement getIgnitionCACert to read the hardcoded "root-ca" secret, but the test file (token_test.go) mocks ignitionserver.IgnitionCACertSecret(controlplaneNamespace), which creates a secret named "ignition-server-ca-cert". The implementation will fail to find the mocked secret at test time. For example, TestNewToken (line 239) expects an error about "ignition-server-ca-cert" not found, but the code will error about "root-ca" instead.

Additionally, the scope concern in the original comment is valid: IgnitionCACertSecret (the ignition-server CA cert) is still actively used in conditions.go:277, agent/agent.go:198, and pki.go:43, indicating this change is an incomplete migration. Switching to root-ca unconditionally without a PUDN guard means non-PUDN clusters will now use a different CA than before, even though root-ca is provisioned universally. If root-ca does not sign the ignition-server TLS certificate for non-PUDN setups, VMs will fail TLS validation during ignition.

The comment on lines 157–158 explicitly scopes this to Primary UDN clusters, but there is no conditional guard. Either add a PUDN check or update the test mocks to use "root-ca" and verify all code paths.
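The guard being asked for amounts to selecting the CA secret by cluster mode. A minimal sketch, under the assumption that a Primary-UDN flag is available at the call site (the function name is hypothetical):

```go
package main

import "fmt"

// ignitionCASecretName picks which CA secret the token controller reads:
// only Primary UDN clusters switch to the cluster root CA; everything else
// keeps the existing ignition-server CA so non-PUDN TLS validation is unchanged.
func ignitionCASecretName(isPrimaryUDN bool) string {
	if isPrimaryUDN {
		return "root-ca"
	}
	return "ignition-server-ca-cert"
}

func main() {
	fmt.Println(ignitionCASecretName(true))  // root-ca
	fmt.Println(ignitionCASecretName(false)) // ignition-server-ca-cert
}
```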

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/token.go` around lines 157 - 164,
The implementation of getIgnitionCACert currently reads a hardcoded "root-ca"
secret but tests and other code expect the ignition-server CA secret
("ignition-server-ca-cert") — fix by either restoring conditional logic to
select ignitionserver.IgnitionCACertSecret(controlplaneNamespace) when not in
Primary UDN mode or update the tests/mocks to return "root-ca"; specifically
adjust getIgnitionCACert (and any call sites using root-ca) to consult the PUDN
guard before choosing the secret name, or change token_test.go to mock
ignitionserver.IgnitionCACertSecret to return a secret named "root-ca" so
TestNewToken and related tests align with the implementation; keep references
consistent with conditions.go, agent/agent.go, and pki.go usage of the
ignition-server CA.

@RamLavi
Author

RamLavi commented Feb 23, 2026

/test e2e-kubevirt-metal-conformance

@RamLavi
Author

RamLavi commented Feb 23, 2026

/test pull-ci-openshift-hypershift-main-e2e-kubevirt-metal-conformance

@RamLavi
Author

RamLavi commented Feb 23, 2026

/test e2e-kubevirt-metal-conformance

@qinqon
Contributor

qinqon commented Feb 23, 2026

/payload 4.22 nightly informing hypershift-kubevirt-ovn-conformance-4.22

@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

@qinqon: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@qinqon
Contributor

qinqon commented Feb 23, 2026

/payload hypershift-kubevirt-ovn-conformance

@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

@qinqon: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@qinqon
Contributor

qinqon commented Feb 23, 2026

/payload-job periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-azure-kubevirt-ovn

@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

@qinqon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-azure-kubevirt-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/793e8260-1094-11f1-9f14-923b1f4e9bf2-0

@qinqon
Contributor

qinqon commented Feb 23, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-azure-kubevirt-ovn

@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

@qinqon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-azure-kubevirt-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f98fef80-1094-11f1-9a9e-7236f40753ee-0

@qinqon
Contributor

qinqon commented Feb 23, 2026

/payload-job periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-azure-kubevirt-ovn

@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

@qinqon: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@RamLavi RamLavi changed the base branch from main to release-4.22 February 23, 2026 09:56
@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from 8ce4a71 to f6871f0 Compare March 11, 2026 19:41
@RamLavi
Author

RamLavi commented Mar 11, 2026

/test e2e-azure-kubevirt-ovn

@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from f6871f0 to b7f11ef Compare March 12, 2026 07:02
@RamLavi
Author

RamLavi commented Mar 12, 2026

/test e2e-azure-kubevirt-ovn

@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from b7f11ef to 43fdc0d Compare March 12, 2026 11:46
@RamLavi
Author

RamLavi commented Mar 12, 2026

/test e2e-azure-kubevirt-ovn

@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from 43fdc0d to f1f8852 Compare March 15, 2026 08:34
@RamLavi
Author

RamLavi commented Mar 15, 2026

/test e2e-azure-kubevirt-ovn

@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from f1f8852 to 2251cbf Compare March 15, 2026 15:14
@RamLavi
Author

RamLavi commented Mar 15, 2026

/test e2e-azure-kubevirt-ovn

@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from 2251cbf to 9627787 Compare March 16, 2026 08:02
@RamLavi
Author

RamLavi commented Mar 16, 2026

/test e2e-azure-kubevirt-ovn

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2026
RamLavi added 7 commits March 16, 2026 22:17
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add an HCCO controller for KubeVirt Primary UDN that creates UDN-style EndpointSlices for etcd services and orphans default EndpointSlices to avoid DNS returning default-network Pod IPs.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
…h bridge

For KubeVirt clusters whose control-plane namespace is Primary UDN, reconcile an internal DNS instance and patch dns.operator to forward the apps domain (and OAuth zone) to it. Maintain an oauth-bridge Service/Endpoints in the guest cluster targeting the management OAuth pod UDN IP, and gate DNS operator changes until internal DNS is ready.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add kubevirt create flags to configure Primary UDN name/subnet, reconcile the HCP namespace with the required OVN primary-udn label/annotation at creation time, ensure the UserDefinedNetwork CR, and adjust KubeVirt VM templates for Primary UDN (l2bridge binding, omit live-migration annotation).

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
In order to run a hosted cluster with PrimaryUDN in OpenShift CI for the e2e-azure-kubevirt-ovn job, set default Primary UDN flags when running that specific job. This is done to avoid adding these flags in a separate repo (openshift/release).
Signed-off-by: Ram Lavi <ralavi@redhat.com>
temporary change for debug.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from 9627787 to 578183a Compare March 16, 2026 20:18
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2026
@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from 578183a to 573f329 Compare March 18, 2026 07:50
@RamLavi RamLavi force-pushed the hosted_cluster_poc branch from 573f329 to 6ee95a2 Compare March 30, 2026 11:24
@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RamLavi
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RamLavi
Author

RamLavi commented Mar 30, 2026

rechecking without the ugly hack commit, now that openshift/cluster-ingress-operator#1382 is merged

@RamLavi
Author

RamLavi commented Mar 30, 2026

/test e2e-azure-kubevirt-ovn

…ssing

Keep the namespace creation-time Primary UDN enforcement by deleting and recreating pre-created namespaces that are missing the OVN UDN label/annotation, while dropping the ingress-operator restart workaround now that the upstream ingress fix is merged.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi
Author

RamLavi commented Mar 31, 2026

/test e2e-azure-kubevirt-ovn

@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2026

@RamLavi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azure-self-managed 2dfed9b link true /test e2e-azure-self-managed
ci/prow/security 2dfed9b link true /test security
ci/prow/unit 2dfed9b link true /test unit
ci/prow/e2e-azure-kubevirt-ovn 40aa56a link false /test e2e-azure-kubevirt-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants