📝 Walkthrough

Add Primary UDN support for KubeVirt in HyperShift: new annotations/CLI flags, UDN creation/ensure, etcd EndpointSlice mirroring to UDN IPs, guest internal CoreDNS and OAuth bridge provisioning, VM/network template overrides, ignition/TLS adjustments, RBAC expansions, and bootstrap/sync automation scripts.
Sequence Diagram(s)

sequenceDiagram
participant Mgr as Management Controller
participant KubeVirt as KubeVirt infra
participant HCP as HostedControlPlane (etcd)
participant Guest as Guest Cluster API
participant GuestDNS as Guest CoreDNS
participant OAuthPod as OAuth Pod
Mgr->>KubeVirt: Ensure UserDefinedNetwork CR (Primary UDN)
KubeVirt-->>Mgr: UDN exists
Mgr->>HCP: List etcd pods (read pod-networks annotation)
Mgr->>HCP: Reconcile etcd UDN EndpointSlices (addresses → UDN IPs)
Mgr->>Guest: Ensure internal-apps-dns Namespace/Deployment/Service
Guest->>GuestDNS: Deploy CoreDNS with hosts entries (apps, console, downloads)
Mgr->>Guest: Patch guest DNS operator to forward apps domain to internal DNS
Mgr->>OAuthPod: Read pod-networks → get OAuth primary UDN IP
Mgr->>Guest: Create oauth-bridge Service + Endpoints -> OAuth UDN IP
GuestDNS-->>Guest: DNS resolves apps and OAuth via internal DNS
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 16
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cluster-up.sh (1)
268-282: ⚠️ Potential issue | 🟡 Minor: Useless check of `$?` after `if` already captured exit status.

On line 269, `./create-etcd-mirrors.sh` runs inside the `if [ -f ... ]` block. Line 271 tests `$?` after the script runs, so at that point `$?` reflects the exit code of line 269. However, if `set -e` is active, a non-zero exit from line 269 would have already terminated the script before the check. If you intend to handle the failure gracefully, run the script with `|| true` and capture the exit code, or use it directly in an `if` statement.

Proposed fix
 if [ -f "./create-etcd-mirrors.sh" ]; then
-  ./create-etcd-mirrors.sh
-
-  if [ $? -eq 0 ]; then
+  if ./create-etcd-mirrors.sh; then
     echo ""
     echo "✓ EndpointSlice workaround applied successfully"
   else

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster-up.sh` around lines 268 - 282, The post-if check of `$?` after running ./create-etcd-mirrors.sh is redundant and unsafe with set -e; change the flow to either (A) run the script directly in an if: use `if ./create-etcd-mirrors.sh ...; then` and branch on its success/failure, or (B) run it with `./create-etcd-mirrors.sh ... || true` and immediately capture its exit code into a variable (e.g., rc=$?) before checking rc; update the block around the create-etcd-mirrors.sh invocation to use one of these patterns and remove the standalone `$?` check.
🟡 Minor comments (13)
api/hypershift/v1beta1/hostedcluster_types.go-273-278 (1)
273-278: ⚠️ Potential issue | 🟡 Minor: Document mutability semantics and platform scope for the new UDN annotations.

Two concerns:

- Immutability: `PrimaryUDNSubnetAnnotation` is especially sensitive: mutating the subnet CIDR after a `UserDefinedNetwork` has been provisioned would create a divergence between the running UDN and the annotation value. Consider adding a doc comment warning that these annotations are effectively immutable post-cluster-creation (matching the pattern of e.g. `SkipReleaseImageValidation`).
- Platform scope: The comments mention KubeVirt hosted clusters, but both constants are in the general annotation block with no enforcement. Any platform can set these annotations and silently get no effect (or worse, unexpected behavior). A short note in the comment (e.g., "only honoured for KubeVirt platform") would help prevent misuse.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/hostedcluster_types.go` around lines 273 - 278, Update the comment for PrimaryUDNNameAnnotation and PrimaryUDNSubnetAnnotation to document mutability and platform scope: note that PrimaryUDNSubnetAnnotation (and by extension the PrimaryUDNNameAnnotation) should be treated as effectively immutable after a UserDefinedNetwork is provisioned (mutating the CIDR causes divergence) and add a short explicit note that these annotations are only honored for KubeVirt-hosted clusters (other platforms will ignore them); reference the constants PrimaryUDNSubnetAnnotation and PrimaryUDNNameAnnotation when adding these doc warnings.

control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/guest_info.go-41-49 (1)
41-49: ⚠️ Potential issue | 🟡 Minor: `guestDNSImage` relies on container position rather than name.

`Containers[0]` silently couples to the assumption that the DNS container is always first. If a sidecar is ever prepended, or the container order shifts, the function returns the wrong image without raising an error. Lookup by name is more robust.

♻️ Proposed fix
 func guestDNSImage(ctx context.Context, c client.Client) (string, error) {
 	ds := &appsv1.DaemonSet{ObjectMeta: metav1.ObjectMeta{Namespace: "openshift-dns", Name: "dns-default"}}
 	if err := c.Get(ctx, client.ObjectKeyFromObject(ds), ds); err != nil {
 		return "", err
 	}
-	if len(ds.Spec.Template.Spec.Containers) == 0 || ds.Spec.Template.Spec.Containers[0].Image == "" {
-		return "", fmt.Errorf("openshift-dns/dns-default has no container image")
+	for _, c := range ds.Spec.Template.Spec.Containers {
+		if c.Name == "dns" && c.Image != "" {
+			return c.Image, nil
+		}
 	}
-	return ds.Spec.Template.Spec.Containers[0].Image, nil
+	return "", fmt.Errorf("openshift-dns/dns-default: dns container image not found")
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/guest_info.go` around lines 41 - 49, guestDNSImage currently uses ds.Spec.Template.Spec.Containers[0] which assumes the DNS container is first; change guestDNSImage to iterate over ds.Spec.Template.Spec.Containers and locate the container by name (e.g., container.Name == "dns" or the real DNS container name used in the DaemonSet), validate its Image is non-empty, and return that image; if no matching container or image is found return a descriptive error. Ensure you still fetch the DaemonSet via c.Get(ctx, client.ObjectKeyFromObject(ds), ds) and use the ds variable for the container search and error messages.

control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/util.go-5-13 (1)
5-13: ⚠️ Potential issue | 🟡 Minor: `dnsZoneFromHostname` has two edge cases that fall through to `return host` unexpectedly.

- Leading dot (`i == 0`): `i > 0` is `false`, so `".apps.example.com"` returns `".apps.example.com"` unchanged instead of `"apps.example.com"`.
- Trailing dot / FQDN notation (`i+1 == len(host)`): `"apps.example.com."` fails the `i+1 < len(host)` check and returns `"apps.example.com."` verbatim instead of `"example.com."`.

Neither is a likely real-world input from an OpenShift ingress domain, but the silent passthrough could cause a DNS zone mismatch downstream without any error.
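As a standalone illustration, here is a local copy of a normalized helper (a sketch only, not the package function): it strips the leading/trailing dots first, so both edge cases resolve to the parent zone instead of passing through silently.

```go
package main

import (
	"fmt"
	"strings"
)

// dnsZoneFromHostname is an illustrative local copy: it normalizes a
// trailing FQDN dot and a leading dot before cutting the first label.
func dnsZoneFromHostname(host string) string {
	if host == "" {
		return ""
	}
	host = strings.TrimSuffix(host, ".")
	host = strings.TrimPrefix(host, ".")
	if i := strings.IndexByte(host, '.'); i >= 0 && i+1 < len(host) {
		return host[i+1:]
	}
	return host
}

func main() {
	fmt.Println(dnsZoneFromHostname("apps.example.com"))  // example.com
	fmt.Println(dnsZoneFromHostname(".apps.example.com")) // example.com
	fmt.Println(dnsZoneFromHostname("apps.example.com.")) // example.com
	fmt.Println(dnsZoneFromHostname("localhost"))         // localhost
}
```

Note that this variant maps both malformed inputs to the parent zone; whether a trailing dot should instead be preserved on the output is a design choice for the real helper.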
🛡️ Proposed fix
 func dnsZoneFromHostname(host string) string {
 	if host == "" {
 		return ""
 	}
-	if i := strings.IndexByte(host, '.'); i > 0 && i+1 < len(host) {
-		return host[i+1:]
-	}
-	return host
+	host = strings.TrimSuffix(host, ".")
+	host = strings.TrimPrefix(host, ".")
+	if i := strings.IndexByte(host, '.'); i >= 0 && i+1 < len(host) {
+		return host[i+1:]
+	}
+	return host
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/util.go` around lines 5 - 13, dnsZoneFromHostname currently returns the original host for leading-dot or trailing-dot cases; update it to normalize those cases: detect and strip a single leading '.' before processing, record whether the input had a trailing '.' and preserve it on the returned zone, then find the first '.' and return the substring after that label (e.g. for ".apps.example.com" return "apps.example.com", for "apps.example.com." return "example.com."). Keep the function name dnsZoneFromHostname and ensure empty-string input still returns "".

test/e2e/util/options.go-153-154 (1)
153-154: ⚠️ Potential issue | 🟡 Minor: Validate Primary UDN name/subnet pairing early.

The flag text says the subnet is required when the name is set, but `Options.Validate` doesn't enforce this pairing, so misconfigurations will fail later and less clearly. Consider validating these fields together.

🔧 Suggested validation (in Options.Validate)
 func (o *Options) Validate() error {
 	var errs []error

+	if o.ConfigurableClusterOptions.KubeVirtPrimaryUDNName != "" &&
+		o.ConfigurableClusterOptions.KubeVirtPrimaryUDNSubnet == "" {
+		errs = append(errs, fmt.Errorf("kubevirt-primary-udn-subnet is required when kubevirt-primary-udn-name is set"))
+	}
+	if o.ConfigurableClusterOptions.KubeVirtPrimaryUDNName == "" &&
+		o.ConfigurableClusterOptions.KubeVirtPrimaryUDNSubnet != "" {
+		errs = append(errs, fmt.Errorf("kubevirt-primary-udn-name is required when kubevirt-primary-udn-subnet is set"))
+	}
+
 	if len(o.LatestReleaseImage) == 0 {
 		errs = append(errs, fmt.Errorf("latest release image is required"))
 	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util/options.go` around lines 153 - 154, Options.Validate should enforce that when KubeVirtPrimaryUDNName is set, KubeVirtPrimaryUDNSubnet must also be set (and optionally validate subnet format); update the Options.Validate method to check the pairing of KubeVirtPrimaryUDNName and KubeVirtPrimaryUDNSubnet, return a clear validation error if name is non-empty and subnet is empty (and vice versa if you want symmetry), and include the field names in the error message so misconfiguration is caught early.

cmd/install/assets/hypershift_operator.go-1343-1347 (1)
1343-1347: ⚠️ Potential issue | 🟡 Minor: Redundant RBAC rule, a duplicate of the pre-existing rule at lines 1381–1384.

The new rule at lines 1343–1347 grants `["get","list","watch","create","update","patch","delete"]` on `validatingadmissionpolicies` and `validatingadmissionpolicybindings`. The pre-existing rule at lines 1381–1384 already grants `rbacv1.VerbAll` ("*") on the exact same resources, which is a strict superset. The new rule is entirely redundant and should be removed.

🔧 Proposed fix: remove the redundant rule
-	{
-		APIGroups: []string{"admissionregistration.k8s.io"},
-		Resources: []string{"validatingadmissionpolicies", "validatingadmissionpolicybindings"},
-		Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
-	},

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/assets/hypershift_operator.go` around lines 1343 - 1347, Remove the redundant RBAC rule that lists APIGroups: ["admissionregistration.k8s.io"] and Resources: ["validatingadmissionpolicies", "validatingadmissionpolicybindings"] with Verbs ["get","list","watch","create","update","patch","delete"]; it duplicates an existing rule that already grants rbacv1.VerbAll ("*") on the same resources (the later rule). Locate the Role/ClusterRole definition containing that rule and delete this rule entry so only the existing VerbAll rule remains.

cmd/install/assets/hypershift_operator.go-1337-1352 (1)
1337-1352: ⚠️ Potential issue | 🟡 Minor: Indentation inconsistency on new rule blocks will fail `gofmt`.

The three new `PolicyRule` blocks at lines 1337–1352 are indented at two levels, while all surrounding entries in the `Rules` slice literal use three levels. Running `gofmt` will reformat this (and a `gofmt`-based CI check will flag it).

🔧 Proposed fix: align indentation
-  {
-    APIGroups:     []string{"admissionregistration.k8s.io"},
-    Resources:     []string{"validatingwebhookconfigurations"},
-    Verbs:         []string{"delete"},
-    ResourceNames: []string{hyperv1.GroupVersion.Group},
-  },
-  {
-    APIGroups: []string{"admissionregistration.k8s.io"},
-    Resources: []string{"validatingadmissionpolicies", "validatingadmissionpolicybindings"},
-    Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
-  },
-  {
-    APIGroups: []string{"certificates.k8s.io"},
-    Resources: []string{"certificatesigningrequests"},
-    Verbs:     []string{"get", "list", "watch"},
-  },
+   {
+     APIGroups:     []string{"admissionregistration.k8s.io"},
+     Resources:     []string{"validatingwebhookconfigurations"},
+     Verbs:         []string{"delete"},
+     ResourceNames: []string{hyperv1.GroupVersion.Group},
+   },
+   {
+     APIGroups: []string{"certificates.k8s.io"},
+     Resources: []string{"certificatesigningrequests"},
+     Verbs:     []string{"get", "list", "watch"},
+   },

(The `validatingadmissionpolicies` block is omitted above as it should be deleted per the duplicate-rule comment.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/assets/hypershift_operator.go` around lines 1337 - 1352, The three newly added PolicyRule blocks inside the Rules slice in hypershift_operator.go are indented two levels instead of the three-level indentation used by surrounding entries; update the indentation of each new block (the one with Resources: []string{"validatingwebhookconfigurations"}, the one with Resources: []string{"validatingadmissionpolicies", "validatingadmissionpolicybindings"}, and the one with Resources: []string{"certificatesigningrequests"}) to match the existing three-level indentation in the Rules slice so that gofmt will not reformat or fail; also remove the duplicate validatingadmissionpolicies block if applicable per the duplicate-rule note.

cmd/install/assets/hypershift_operator.go-1333-1336 (1)
1333-1336: ⚠️ Potential issue | 🟡 Minor: Remove the duplicate RBAC rule for `validatingadmissionpolicies` and `validatingadmissionpolicybindings`.

Lines 1343–1346 introduce an explicit-verb rule for `admissionregistration.k8s.io` resources `validatingadmissionpolicies` and `validatingadmissionpolicybindings`. A pre-existing rule at lines 1381–1383 already grants `rbacv1.VerbAll` on the same resources, making the new rule redundant. Remove lines 1343–1346 to avoid confusion and technical debt.

The `k8s.ovn.org` API group for `userdefinednetworks` is correct and verified against the codebase CRD definitions.
Verify each finding against the current code and only fix it if needed. In `@cmd/install/assets/hypershift_operator.go` around lines 1333 - 1336, Remove the redundant RBAC PolicyRule that explicitly lists verbs for the admissionregistration.k8s.io resources "validatingadmissionpolicies" and "validatingadmissionpolicybindings" (the rule that enumerates "get","list","watch","create","update","patch","delete"); this duplicate is unnecessary because an existing rule later grants rbacv1.VerbAll for those same resources—delete the explicit rule entry in the RBAC rules slice (the block referencing admissionregistration.k8s.io and the two resources) so only the existing VerbAll rule remains.

PRIMARY-UDN-CHANGES.md-259-259 (1)
259-259: ⚠️ Potential issue | 🟡 Minor: Add language tags to fenced code blocks and standardize heading style.

markdownlint flagged multiple fenced blocks without a language and a Setext-style heading. Adding language identifiers (e.g., `bash`, `yaml`, `go`) and switching the heading to ATX style will resolve the lint warnings.

Also applies to: 293-293, 298-298, 305-305, 313-313, 357-357, 392-392, 488-488, 646-646, 657-657, 703-703, 802-802
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PRIMARY-UDN-CHANGES.md` at line 259, Several fenced code blocks lack language identifiers and there are Setext-style headings; update each affected fenced block to include the appropriate language tag (e.g., bash, yaml, go) and convert Setext-style headings to ATX-style (prefix with #) throughout the document so markdownlint warnings are resolved; search for all triple-backtick blocks and Setext headings and edit them to use ```bash / ```yaml / ```go as appropriate and change underlined headings (=== or ---) to leading-# headings.

PRIMARY-UDN-CHANGES.md-16-41 (1)
16-41: ⚠️ Potential issue | 🟡 Minor: Clarify Primary UDN namespace requirements vs controller behavior.
The controller now derives Primary UDN name/subnet from HostedCluster annotations and enforces the namespace label+annotation at creation time. This section only shows a namespace annotation; consider noting that the label is required at namespace creation (or that it will be set by the controller when the HostedCluster annotations are provided) to avoid setup ambiguity.
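For instance, the doc could show the namespace prerequisites in one place. This is an illustrative sketch only; the exact label and annotation keys must be checked against what the controller actually enforces, since only the annotation is shown in the current section:

```yaml
# Hypothetical example: a namespace prepared for a Primary UDN at creation time.
# Verify the exact keys against the controller's enforcement logic.
apiVersion: v1
kind: Namespace
metadata:
  name: clusters-example
  labels:
    # Primary-UDN namespaces must carry this label when the namespace is created.
    k8s.ovn.org/primary-user-defined-network: ""
```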
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PRIMARY-UDN-CHANGES.md` around lines 16 - 41, Clarify that the namespace must include both the annotation k8s.ovn.org/primary-user-defined-network and the required label (the label the controller expects) at creation time, or explicitly state that the controller will automatically add the label when it derives the Primary UDN name/subnet from HostedCluster annotations; update the Namespace section to mention both the annotation and the namespace label requirement (or that the controller will set the label upon processing HostedCluster annotations) and reference the HostedCluster and UserDefinedNetwork resources so readers know where the controller derives the Primary UDN.

PRIMARY-UDN-CHANGES.md-3-3 (1)
3-3: ⚠️ Potential issue | 🟡 Minor: Tighten compound modifiers and article usage.
LanguageTool flagged a few compound modifiers that read better hyphenated (e.g., “Primary‑UDN support/hosted cluster”), and the phrase “a init-container” should be “an init-container.”
✏️ Suggested edit for the article usage
-Using `0.0.0.0` is temporary, and should be replaced by a init-container that fetches the primaryUDN IP and sets it as a env var or something like that.
+Using `0.0.0.0` is temporary, and should be replaced by an init-container that fetches the primaryUDN IP and sets it as an env var or something like that.

Also applies to: 16-16, 160-160, 306-306
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PRIMARY-UDN-CHANGES.md` at line 3, The text uses unhyphenated compound modifiers and an incorrect article: replace occurrences like "Primary UDN support" and "hosted cluster" where they function as compound modifiers with hyphenated forms (e.g., "Primary-UDN support" and "hosted-cluster" where appropriate) and change the article "a init-container" to "an init-container"; search the document for the exact phrases "Primary UDN support", "hosted cluster", and "a init-container" and update them accordingly to fix hyphenation and the article usage.

cluster-sync.sh-88-96 (1)
88-96: ⚠️ Potential issue | 🟡 Minor: `grep "$CLUSTER_NAME"` for filtering NodePools could match unintended resources.

If `CLUSTER_NAME` is e.g. `test`, this grep will match any NodePool containing "test" in its name. Use a more specific pattern or `--field-selector`.

Proposed fix
-for np in $(oc get nodepools.hypershift.openshift.io -n clusters -o name 2>/dev/null | grep "$CLUSTER_NAME"); do
+for np in $(oc get nodepools.hypershift.openshift.io -n clusters -o name 2>/dev/null | grep -w "$CLUSTER_NAME"); do

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster-sync.sh` around lines 88 - 96, The loop that lists NodePools uses grep "$CLUSTER_NAME" which can match substrings; update the filtering so only NodePools belonging exactly to the target cluster are selected. Replace the grep step in the for loop (the command using oc get nodepools.hypershift.openshift.io -n clusters -o name and grep "$CLUSTER_NAME") with a stricter selector: preferably use oc's --field-selector=metadata.name=<exact-name> when you know the NodePool names, or at minimum anchor the grep to the resource name (e.g. grep -E "/${CLUSTER_NAME}(-|$)") so it matches only NodePools with the cluster name as a discrete token; keep the rest of the loop (echo and oc patch) unchanged and reference CLUSTER_NAME, the oc get nodepools.hypershift.openshift.io command, and the grep usage for locating the change.

control-plane-operator/hostedclusterconfigoperator/controllers/etcd/endpointslices.go-249-260 (1)
249-260: ⚠️ Potential issue | 🟡 Minor: `serviceEndpointPorts` uses the Service port rather than the target port.

EndpointSlice `Port` fields should reflect the port the pod is listening on (i.e., `targetPort`), not the Service-level port. For etcd services these are typically identical, so this works in practice, but it's technically incorrect and will break if the service ever maps port X → targetPort Y.

Proposed fix
 func serviceEndpointPorts(svc *corev1.Service) []discoveryv1.EndpointPort {
 	out := make([]discoveryv1.EndpointPort, 0, len(svc.Spec.Ports))
 	for i := range svc.Spec.Ports {
 		p := svc.Spec.Ports[i]
+		port := p.TargetPort.IntVal
+		if port == 0 {
+			// If targetPort is not set or is a named port, fall back to the service port.
+			port = p.Port
+		}
 		out = append(out, discoveryv1.EndpointPort{
 			Name:     ptr.To(p.Name),
 			Protocol: ptr.To(p.Protocol),
-			Port:     ptr.To(p.Port),
+			Port:     ptr.To(port),
 		})
 	}
 	return out
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/etcd/endpointslices.go` around lines 249 - 260, serviceEndpointPorts currently uses svc.Spec.Ports[i].Port; change it to resolve the Service port's TargetPort and populate EndpointPort with the pod-facing port: for each p in svc.Spec.Ports use p.TargetPort when set (if TargetPort.IsInt() set Port to int32(TargetPort.IntVal) and set Name to p.Name/protocol as appropriate; if TargetPort is a string, set EndpointPort.Name to that string and leave Port nil), otherwise fall back to p.Port; update the creation of discoveryv1.EndpointPort in serviceEndpointPorts to use p.TargetPort (and ptr.To conversions) instead of p.Port so EndpointSlice ports reflect the pod target port.

control-plane-operator/hostedclusterconfigoperator/controllers/etcd/endpointslices.go-269-295 (1)
269-295: ⚠️ Potential issue | 🟡 Minor: `ReconcileUDNEndpointSlices` discards the `shouldRequeue` signal.

Line 278 captures `desiredEtcdEndpointsFromPods(...)` but ignores the fourth return value (`shouldRequeue`). When pods exist but primary UDN annotations aren't available yet, the function returns `false, true` for `ok` and `shouldRequeue` respectively, and `ReconcileUDNEndpointSlices` silently returns `nil` without retrying.

Since the caller is the main `HostedControlPlaneReconciler.Reconcile()` method (a K8s reconciler that will be re-triggered by watches and periodic reconciliation), the function will be called again shortly and the annotations will likely be available then. However, explicitly returning and checking the `shouldRequeue` signal would be more robust defensive programming. Consider returning a boolean or error so the caller can decide to requeue immediately if desired.
🧹 Nitpick comments (15)
control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go (1)
507-510: Optional: loop-variable copy is no longer needed in Go 1.22+.

The `pod := p` copy (and the identical pattern in the surrounding loops) was required before Go 1.22 to avoid the loop-variable capture bug. Since Go 1.22 each iteration gets its own variable, making these copies redundant and the `// golang bug referencing for loop vars` comments misleading. This is consistent with the rest of the file, so the fix can be deferred, but it is worth a cleanup pass if the module requires Go ≥ 1.22.

♻️ Proposed cleanup (apply consistently across all four loops in this block)
-	for _, p := range tc.infraPods {
-		pod := p // golang bug referencing for loop vars
-		kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &pod)
-	}
-	for _, vm := range tc.virtualMachines {
-		virtualMachine := vm // golang bug referencing for loop vars
-		kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &virtualMachine)
-	}
+	for i := range tc.infraPods {
+		kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &tc.infraPods[i])
+	}
+	for i := range tc.virtualMachines {
+		kubevirtInfraClusterObjects = append(kubevirtInfraClusterObjects, &tc.virtualMachines[i])
+	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go` around lines 507 - 510, The loop-variable copy "pod := p" (and identical copies in the three neighboring loops) is redundant with Go ≥1.22; remove the temporary assignment and the misleading comment and append the loop variable directly to kubevirtInfraClusterObjects (and the equivalent slices in the other three loops) so each loop simply uses p when calling append.

control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go (2)
202-203: API call inside `CreateOrUpdate` mutate: hoist it outside.

`primaryUDNNetworkName` performs a `kubevirtInfraClient.List` inside the mutate closure. The mutate is invoked once per IP family (twice per service per reconcile) and would be called again on any conflict retry. The value is also identical for all IP families of the same service.

Consider computing this once per service in `reconcileKubevirtPassthroughService` and passing it as a parameter to `reconcileKubevirtPassthroughServiceEndpointsByIPFamily`.
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go` around lines 202 - 203, primaryUDNNetworkName is calling kubevirtInfraClient.List inside the CreateOrUpdate mutate closure (used by reconcileKubevirtPassthroughServiceEndpointsByIPFamily), causing repeated API calls per IP family and on conflict retries; move the call out of the mutate by computing udnNetworkName/isPrimaryUDN once in reconcileKubevirtPassthroughService (using cpService.Namespace) and add udnNetworkName and isPrimaryUDN as parameters to reconcileKubevirtPassthroughServiceEndpointsByIPFamily so the mutate closure no longer invokes primaryUDNNetworkName/kubevirtInfraClient.List.
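The cost of the lookup-inside-mutate pattern can be shown with a minimal standalone sketch. All names here are stand-ins: `createOrUpdate` mimics `controllerutil.CreateOrUpdate` invoking the mutate function more than once (as on a conflict retry), and the call counter is purely illustrative.

```go
package main

import "fmt"

// listCalls counts how often the stand-in API lookup runs; in the real
// code this would be a kubevirtInfraClient.List call.
var listCalls int

func primaryUDNNetworkName(namespace string) string {
	listCalls++
	return "primary-udn-" + namespace
}

// createOrUpdate mimics controllerutil.CreateOrUpdate: the mutate function
// may be invoked more than once (e.g. on a conflict retry).
func createOrUpdate(mutate func()) {
	mutate()
	mutate() // simulated conflict retry
}

func main() {
	// Anti-pattern: API lookup inside the mutate closure,
	// repeated for each IP family of the same service.
	listCalls = 0
	for _, family := range []string{"IPv4", "IPv6"} {
		f := family
		createOrUpdate(func() {
			_ = primaryUDNNetworkName("clusters-example")
			_ = f
		})
	}
	fmt.Println("calls with lookup inside mutate:", listCalls) // 4

	// Hoisted: compute once per service, pass the value into the closure.
	listCalls = 0
	udnName := primaryUDNNetworkName("clusters-example")
	for _, family := range []string{"IPv4", "IPv6"} {
		f := family
		createOrUpdate(func() {
			_ = udnName
			_ = f
		})
	}
	fmt.Println("calls with hoisted lookup:", listCalls) // 1
}
```

With two IP families and one retry each, the closure-embedded lookup runs four times where the hoisted version runs once, which is exactly the reduction the review comment suggests.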
247-250: `primaryUDNNetworkName` is a no-op wrapper; consider removing it.

The function unconditionally delegates to `detectPrimaryUDNNetworkNameFromVirtLauncher` with no added logic. If future dispatch (e.g., cache lookup) is not planned in the short term, inline the call directly to reduce indirection.
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go` around lines 247 - 250, primaryUDNNetworkName is a dead wrapper around detectPrimaryUDNNetworkNameFromVirtLauncher on reconciler; remove primaryUDNNetworkName and update all call sites to call r.detectPrimaryUDNNetworkNameFromVirtLauncher(ctx, namespace) directly (or, if you prefer to keep an indirection, add a TODO comment explaining planned future logic), ensuring the reconciler type still compiles and no references to primaryUDNNetworkName remain.

control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)
1639-1648: Use the `KonnectivityServerPort` constant instead of hardcoding `8091`.

Replace the hardcoded port with `kas.KonnectivityServerPort` to align with usage elsewhere in the codebase and prevent drift if the port changes.
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` around lines 1639 - 1648, The current branch hardcodes the konnectivity port to 8091 when detecting a KubeVirt Primary UDN (within the block checking hcp.Spec.Platform.Type == hyperv1.KubevirtPlatform and the annotation "hypershift.openshift.io/primary-udn" == "true"), which risks drift; update the assignment of port (the variable used alongside host = fmt.Sprintf("konnectivity-server.%s.svc.cluster.local", hcp.Namespace)) to use the shared constant kas.KonnectivityServerPort instead of the literal 8091 so the controller uses the centralized constant everywhere.

hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go (1)
174-180: Comment contradicts the actual implementation.

Line 174 unconditionally adds `kubevirt.io/allow-pod-bridge-network-live-migration`, yet the comment on lines 175–180 says "For now, we omit this annotation entirely." The real implementation is add-then-remove: the annotation is added here and deleted by `applyPrimaryUDNVMTemplateOverrides` for Primary UDN clusters. The comment also says "When Primary UDN support is added, this should be conditionally added…", but Primary UDN support is being added in this very PR.

📝 Suggested comment update
 Annotations: map[string]string{
-	"kubevirt.io/allow-pod-bridge-network-live-migration": "",
-	// NOTE: DO NOT add kubevirt.io/allow-pod-bridge-network-live-migration for Primary UDN!
-	// This annotation causes OVN-K CNI to skip IP configuration on eth0,
-	// which breaks route discovery for l2bridge binding on Primary UDN.
-	// Live migration on Primary UDN is not supported anyway.
-	// For now, we omit this annotation entirely. When Primary UDN support is added,
-	// this should be conditionally added only for non-UDN clusters.
+	// Allow live migration over the pod network for non-Primary-UDN clusters.
+	// NOTE: This annotation is removed for Primary UDN clusters by
+	// applyPrimaryUDNVMTemplateOverrides, because OVN-K CNI skips eth0 IP
+	// configuration when it is present, breaking l2bridge route discovery.
+	"kubevirt.io/allow-pod-bridge-network-live-migration": "",
 },

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go` around lines 174 - 180, The comment is inaccurate: the code currently unconditionally adds the "kubevirt.io/allow-pod-bridge-network-live-migration" annotation and it is later removed by applyPrimaryUDNVMTemplateOverrides for Primary UDN clusters; update the comment to reflect that flow (annotation is added here and conditionally removed by applyPrimaryUDNVMTemplateOverrides) and note that Primary UDN support is being introduced in this PR so the conditional removal is intentional rather than "omitted entirely", and mention the annotation key and the applyPrimaryUDNVMTemplateOverrides function name so future readers can trace the behavior.

cluster-up.sh (3)
61-62: Declare and assign separately to avoid masking return values (SC2155).

If `date` fails, the non-zero exit code is masked by `export`.
-export OP_IMAGE_TAG="udn-fix-$(date +%Y%m%d-%H%M%S)" -export OP_IMAGE="${QUAY_REPO_OP}:${OP_IMAGE_TAG}" +OP_IMAGE_TAG="udn-fix-$(date +%Y%m%d-%H%M%S)" +export OP_IMAGE_TAG +OP_IMAGE="${QUAY_REPO_OP}:${OP_IMAGE_TAG}" +export OP_IMAGEApply the same pattern at lines 108-109 for
CPO_IMAGE_TAG/CPO_IMAGE.Also applies to: 108-109
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster-up.sh` around lines 61 - 62, The exports for OP_IMAGE_TAG/OP_IMAGE (and similarly CPO_IMAGE_TAG/CPO_IMAGE) mask non-zero exit codes from command substitution; instead assign the tag and image to variables first (e.g., set OP_IMAGE_TAG="$(date +%Y%m%d-%H%M%S)" and OP_IMAGE="${QUAY_REPO_OP}:${OP_IMAGE_TAG}") and then export them in a separate export statement, doing the same for CPO_IMAGE_TAG/CPO_IMAGE so a failing date command won't be hidden by export.
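The SC2155 masking can be reproduced in isolation with generic shell, independent of cluster-up.sh:

```shell
# Demonstrates SC2155: combining 'export' with a command substitution on one
# line masks the substitution's exit status, because $? reflects 'export'.
demo_masked() {
  export V="$(false)"
  echo "masked rc=$?"      # rc=0: the failure of 'false' is lost
}

demo_unmasked() {
  V="$(false)"             # plain assignment: $? is the substitution's status
  rc=$?
  export V
  echo "unmasked rc=$rc"   # rc=1: the failure is preserved
}

demo_masked
demo_unmasked
```

Under `set -e`, the second form would also abort the script at the failing assignment, which is exactly the behavior the fix restores.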
122-124: Inconsistent indentation in the CPO push else branch.
Line 123 is missing 2 spaces of indentation compared to the equivalent block for the operator image push (line 77).

Proposed fix

 else
-podman push ${CPO_IMAGE}
+  podman push ${CPO_IMAGE}
 fi

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster-up.sh` around lines 122 - 124, The else branch that pushes the CPO image (podman push ${CPO_IMAGE}) is misindented; change its indentation to match the operator image push block (add the two missing spaces before the podman push line) so the CPO_IMAGE push line aligns with the equivalent operator-image push lines and retains consistent shell script formatting.
5-5: Hardcoded user-specific paths and registry references.
Lines 5, 63, 67, 113 embed developer-specific paths (/root/ralavi/..., /root/.kcli/...) and lines 60, 107 use a personal Quay repo. For a POC this is understandable, but if this script is ever promoted to CI or shared usage, these should be parameterized via environment variables with defaults (similar to how cluster-sync.sh does it with : "${VAR:=default}").

Also applies to: 60-63, 107-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster-up.sh` at line 5, The script hardcodes user-specific paths and registry references (e.g., the KUBECONFIG export and the personal Quay repo usages around the lines exporting KUBECONFIG and resource/registry variables); change these to environment-configurable variables with sensible defaults using the shell pattern : "${VAR:=default}" (for example introduce KUBECONFIG_PATH, REGISTRY_REPO, USER_HOME or similar), replace direct occurrences like export KUBECONFIG=/root/.kcli/... and any /root/ralavi/... and quay.io/ralavi/... references with the corresponding variables (KUBECONFIG, REGISTRY_REPO, etc.), and update any places that reference those paths to use the variables so the script can be overridden in CI or by other users.

control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go (2)
48-73: Spec.Selector is immutable on Deployments after creation — setting it inside CreateOrUpdate is a known footgun.
Because the labels are constant ("app": internalAppsDNSDeploy), this works in practice. However, if the label value ever changes, the update will be rejected by the API server. A defensive approach is to only set Selector during creation (check deploy.CreationTimestamp.IsZero()).

Defensive pattern

 if _, err := controllerutil.CreateOrUpdate(ctx, c, deploy, func() error {
 	labels := map[string]string{"app": internalAppsDNSDeploy}
 	deploy.Spec.Replicas = ptr.To[int32](1)
-	deploy.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
+	if deploy.CreationTimestamp.IsZero() {
+		deploy.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
+	}
 	deploy.Spec.Template.ObjectMeta.Labels = labels

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go` around lines 48 - 73, The Deployment's Spec.Selector should only be set on creation to avoid immutable field errors; inside the controllerutil.CreateOrUpdate callback (where deploy is the *appsv1.Deployment), keep setting deploy.Spec.Template.ObjectMeta.Labels and deploy.Spec.Template.Spec.* always, but guard setting deploy.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels} so it runs only when deploy.CreationTimestamp.IsZero(); use the existing labels map (map[string]string{"app": internalAppsDNSDeploy}) and keep replicas, containers, volumes updates unchanged.
54-63: Deployment spec overwrites the entire Containers and Volumes slices on every reconcile.
This means any out-of-band modifications (e.g., resource limits added by an admission webhook or LimitRange) will be reverted every reconciliation cycle. For a POC this is acceptable, but for production you'd want to merge rather than replace. Noting for awareness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/internal_dns.go` around lines 54 - 63, The code currently replaces deploy.Spec.Template.Spec.Containers and Volumes wholesale which clobbers out-of-band changes; modify the reconcile to search deploy.Spec.Template.Spec.Containers for a container named "coredns" and update its Image, Args, Ports and VolumeMounts (using dnsImage and internalAppsDNSPort), or append a new "coredns" container if not found, and similarly merge deploy.Spec.Template.Spec.Volumes by adding the "config" volume only if it doesn't already exist—this preserves any existing containers, resource limits, or webhook-injected fields instead of overwriting the slices.

control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go (2)
63-83: Map iteration is non-deterministic; first match wins.
primaryUDNIPFromPodNetworks iterates over a map[string]podNetworksEntry. If a pod somehow has multiple non-default primary-role entries (unlikely but possible with misconfiguration), Go's map iteration order is random, so the returned IP could vary across calls. For a POC this is fine, but worth a note if this is promoted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go` around lines 63 - 83, primaryUDNIPFromPodNetworks currently iterates a map m (map[string]podNetworksEntry) which yields a nondeterministic first match; instead, collect all valid candidate entries (respecting checks for v.Role == ovnPrimaryRole, networkKey != "default", non-empty/parsable IPAddress), store them in a slice with their networkKey or IP, sort that slice deterministically (e.g., lexicographically by networkKey or IP), then return the first element from the sorted list; update function primaryUDNIPFromPodNetworks to perform collection, sorting, and deterministic selection so repeated calls return a stable IP even if multiple primary-role entries exist.
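A stdlib-only sketch of the deterministic selection the prompt describes; the podNetworksEntry shape and "primary" role string mirror the review's description, while the network keys and IPs are made up:

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// podNetworksEntry mirrors the shape described in the review, not the
// controller's actual type.
type podNetworksEntry struct {
	IPAddress string
	Role      string
}

// primaryUDNIP collects every valid primary-role, non-default entry, sorts
// the network keys, and returns the first one, so repeated calls over the
// same map yield the same IP despite Go's random map iteration order.
func primaryUDNIP(m map[string]podNetworksEntry) (string, bool) {
	keys := make([]string, 0, len(m))
	for k, v := range m {
		if k == "default" || v.Role != "primary" {
			continue
		}
		if _, err := netip.ParseAddr(v.IPAddress); err != nil {
			continue // skip unparsable addresses, as the mgmt_oauth version does
		}
		keys = append(keys, k)
	}
	if len(keys) == 0 {
		return "", false
	}
	sort.Strings(keys)
	return m[keys[0]].IPAddress, true
}

func main() {
	nets := map[string]podNetworksEntry{
		"default":       {IPAddress: "10.128.0.5", Role: "infrastructure-locked"},
		"cluster/udn-b": {IPAddress: "10.200.0.7", Role: "primary"},
		"cluster/udn-a": {IPAddress: "10.100.0.9", Role: "primary"},
	}
	ip, ok := primaryUDNIP(nets)
	fmt.Println(ip, ok) // always the entry under the smallest key
}
```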
24-27: Duplicate struct definition; function signatures differ significantly.
podNetworksEntry and podNetworksAnnotationEntry are identical and should be unified. However, primaryUDNIPFromPodNetworks and primaryUDNInfoFromPodNetworks differ in return type, return values, and behavior—the mgmt_oauth version validates IPs with netip.ParseAddr while the etcd version does not, and the etcd version returns additional metadata. Consider extracting the shared struct to a common package, but keep the functions separate due to their distinct purposes and return signatures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/mgmt_oauth.go` around lines 24 - 27, The two identical types podNetworksEntry and podNetworksAnnotationEntry should be consolidated into a single shared type in a common package (e.g., move the struct to a new/shared package and export it) and replace both local definitions with that shared type; keep the functions primaryUDNIPFromPodNetworks and primaryUDNInfoFromPodNetworks separate because their return shapes and behaviors differ, but update them to reference the new shared type; ensure primaryUDNIPFromPodNetworks continues to validate with netip.ParseAddr while primaryUDNInfoFromPodNetworks preserves its additional metadata return values and non-validating behavior, and update imports/usages accordingly.

control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/reconciler.go (2)
53-96: Corefile is written twice per reconciliation—once without OAuth and once with.
ensureInternalAppsDNSCorefile is called at line 62 with only the router hosts, then again at line 89 after adding the OAuth entry to the same hosts map. If the OAuth path fails between lines 62 and 89, the corefile will persist without the OAuth entry until the next successful reconcile.
Consider deferring the corefile write until all hosts are assembled (moving the first call to after line 88), so the corefile is only written once and is always complete. The same applies to ensureDNSOperatorForwarding (lines 65 and 94).
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/reconciler.go` around lines 53 - 96, The reconcile currently calls ensureInternalAppsDNSCorefile and ensureDNSOperatorForwarding early (using only router hosts) then again after adding OAuth, which can leave a partial corefile if OAuth steps fail; remove the first calls at the top and instead assemble the full hosts map (populate consoleHost, canaryHost, downloadsHost, then call mgmtOAuthRouteHost, mgmtOAuthPrimaryUDNIP and ensureGuestOAuthBridge to add oauthHost=>oauthBridgeSvcIP) and invoke ensureInternalAppsDNSCorefile(ctx, r.guestClient, hosts) and ensureDNSOperatorForwarding(ctx, r.guestClient, zones, internalUpstream) once after oauth entries are added so the corefile and forwarding rules are written atomically; update any requeue behavior around mgmtOAuthRouteHost, mgmtOAuthPrimaryUDNIP, and ensureGuestOAuthBridge to return early without having written partial state.
24-38: Errors are logged at Info level and silently swallowed during requeue.
This is a reasonable pattern for progressive bootstrapping. However, the err value is discarded—logging the error message would aid debugging when the cause isn't simply "not yet created."

Proposed improvement

 ingressDomain, err := guestIngressDomain(ctx, r.guestClient)
 if err != nil {
-	log.Info("waiting for guest ingress domain")
+	log.Info("waiting for guest ingress domain", "error", err)
 	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
 }

Apply the same pattern to the other log.Info("waiting for ...") blocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/primaryudn/reconciler.go` around lines 24 - 38, The current waits drop the error value; update the three blocks (calls to guestIngressDomain, guestRouterInternalClusterIP, guestDNSImage) so that instead of calling log.Info("waiting for ...") you log the actual error (e.g. log.Error(err, "waiting for guest ingress domain") / log.Error(err, "waiting for guest router-internal ClusterIP") / log.Error(err, "waiting for guest dns-default DaemonSet")), then keep returning the same ctrl.Result{RequeueAfter: 30 * time.Second}, nil; this ensures the error from guestIngressDomain, guestRouterInternalClusterIP and guestDNSImage is recorded for debugging while preserving the requeue behavior.

cluster-sync.sh (1)
446-446: Hardcoded CoreDNS image tag v4.17.
The Go controller (internal_dns.go) dynamically fetches the DNS image from the guest cluster's dns-default DaemonSet, but this shell script hardcodes registry.redhat.io/openshift4/ose-coredns-rhel9:v4.17. This will become stale as clusters upgrade.
Verify each finding against the current code and only fix it if needed. In `@cluster-sync.sh` at line 446, Hardcoded CoreDNS image "registry.redhat.io/openshift4/ose-coredns-rhel9:v4.17" will become stale; replace that literal with a variable populated by querying the guest cluster's dns-default DaemonSet like the Go controller does. Before using the image value in cluster-sync.sh, run a kubectl (against the guest cluster context/namespace) to get the image from the dns-default DaemonSet (jsonpath to .spec.template.spec.containers[?(@.name=="coredns")].image or similar) assign it to DNS_IMAGE, fall back to a sensible default only if the query fails, and use $DNS_IMAGE in place of the hardcoded string.
if [ -f "./create-etcd-mirrors.sh" ]; then
  ./create-etcd-mirrors.sh

  if [ $? -eq 0 ]; then
    echo ""
    echo "✓ EndpointSlice workaround applied successfully"
  else
    echo ""
    echo "WARNING: EndpointSlice workaround script failed"
    echo "You may need to run it manually: ./create-etcd-mirrors.sh"
  fi
else
  echo "ERROR: create-etcd-mirrors.sh not found in current directory"
  echo "Please ensure the script is present and run it manually"
fi
set -e causes the script to exit before $? can be checked.
Same issue as in cluster-up.sh: if ./create-etcd-mirrors.sh fails, set -e terminates the script before reaching line 271.
Proposed fix
if [ -f "./create-etcd-mirrors.sh" ]; then
- ./create-etcd-mirrors.sh
-
- if [ $? -eq 0 ]; then
+ if ./create-etcd-mirrors.sh; then
echo ""
echo "✓ EndpointSlice workaround applied successfully"
else🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cluster-sync.sh` around lines 268 - 282, The problem is that set -e causes
the script to exit when ./create-etcd-mirrors.sh fails so the subsequent check
of $? never runs; fix this by invoking the script inside a conditional (or
capturing its exit immediately) so the non-zero exit is handled instead of
aborting: inside cluster-sync.sh keep the file-exists test for
create-etcd-mirrors.sh, then replace the separate "./create-etcd-mirrors.sh" +
"if [ $? -eq 0 ]; then" pattern with "if ./create-etcd-mirrors.sh; then" (or run
"./create-etcd-mirrors.sh; rc=$?; if [ $rc -eq 0 ]; then") and move the
success/warning echoes into the then/else branches so set -e won’t prematurely
terminate.
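The behavior that motivates this fix can be reproduced standalone (generic shell, not cluster-sync.sh):

```shell
#!/bin/bash
# Under 'set -e', a bare failing command aborts the script before a following
# '$?' check can run; a command used directly as an 'if' condition is exempt
# from 'set -e', so 'if cmd; then' is the safe form.
set -e

run_checked() {
  if "$@"; then
    echo "ok"
  else
    echo "failed rc=$?"    # $? still holds the condition command's status
  fi
}

run_checked true
run_checked false          # script keeps running despite 'set -e'
echo "done"
```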
if [ -n "${GUEST_KUBECONFIG}" ] && [ -f "${GUEST_KUBECONFIG}" ]; then
  echo "Guest cluster kubeconfig: ${GUEST_KUBECONFIG}"
  echo ""
  echo "To access the guest cluster:"
  echo " export KUBECONFIG=${GUEST_KUBECONFIG}"
  echo " oc get nodes"
else
  echo "Guest cluster kubeconfig not extracted yet."
  echo "To extract it manually:"
  echo " oc extract -n clusters secret/${CLUSTER_NAME}-admin-kubeconfig --to=. --confirm"
  echo " mv kubeconfig ${CLUSTER_NAME}-kubeconfig"
fi
GUEST_KUBECONFIG may be unset here, causing an abort under set -u.
GUEST_KUBECONFIG is only assigned inside the conditional block starting at line 327. If the kubeconfig secret wasn't found or the block was skipped, this variable is unset and line 819's ${GUEST_KUBECONFIG} will trigger an "unbound variable" error.
Proposed fix
-if [ -n "${GUEST_KUBECONFIG}" ] && [ -f "${GUEST_KUBECONFIG}" ]; then
+if [ -n "${GUEST_KUBECONFIG:-}" ] && [ -f "${GUEST_KUBECONFIG:-}" ]; then📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-if [ -n "${GUEST_KUBECONFIG}" ] && [ -f "${GUEST_KUBECONFIG}" ]; then
+if [ -n "${GUEST_KUBECONFIG:-}" ] && [ -f "${GUEST_KUBECONFIG:-}" ]; then
   echo "Guest cluster kubeconfig: ${GUEST_KUBECONFIG}"
   echo ""
   echo "To access the guest cluster:"
   echo " export KUBECONFIG=${GUEST_KUBECONFIG}"
   echo " oc get nodes"
 else
   echo "Guest cluster kubeconfig not extracted yet."
   echo "To extract it manually:"
   echo " oc extract -n clusters secret/${CLUSTER_NAME}-admin-kubeconfig --to=. --confirm"
   echo " mv kubeconfig ${CLUSTER_NAME}-kubeconfig"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cluster-sync.sh` around lines 819 - 830, The block prints and uses
GUEST_KUBECONFIG which may be unset under set -u; ensure it cannot cause an
"unbound variable" error by either initializing GUEST_KUBECONFIG='' earlier or
using safe parameter expansion everywhere it's referenced (e.g.,
${GUEST_KUBECONFIG:-}) and in tests (use [ -n "${GUEST_KUBECONFIG:-}" ] and [ -f
"${GUEST_KUBECONFIG:-}" ]), and similarly use ${GUEST_KUBECONFIG:-} in the
echo/export lines so the script won’t abort if the variable was never assigned.
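A self-contained demonstration of the :- guard under set -u; the GUEST_KUBECONFIG value below is a made-up local variable, not the script's real path:

```shell
#!/bin/bash
# Under 'set -u', expanding an unset variable is a fatal "unbound variable"
# error; the ':-' fallback expansion makes the reference safe.
set -u

kubeconfig_status() {
  if [ -n "${GUEST_KUBECONFIG:-}" ]; then
    echo "kubeconfig: ${GUEST_KUBECONFIG}"
  else
    echo "kubeconfig not extracted yet"
  fi
}

kubeconfig_status                        # safe even though the variable is unset
GUEST_KUBECONFIG="/tmp/guest.kubeconfig" # hypothetical path
kubeconfig_status
```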
echo "Decrease hosted cluster Prometheus retention period to 8 hours to prevent over logging..."
oc --kubeconfig "${CLUSTER_NAME}-kubeconfig" -n openshift-monitoring patch prometheus k8s --type=merge -p '{"spec":{"retention":"8h"}}'

# oc login --server=https://api.hypershift.qinqon.corp:6443 -u kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU
Commented-out line contains credentials.
Even in a comment, hardcoded credentials (-u kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU) should not be committed to version control.
Proposed fix
-# oc login --server=https://api.hypershift.qinqon.corp:6443 -u kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU
+# oc login --server=https://api.hypershift.qinqon.corp:6443 -u kubeadmin -p <REDACTED>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cluster-sync.sh` at line 866, The commented-out oc login line in
cluster-sync.sh ("oc login --server=https://api.hypershift.qinqon.corp:6443 -u
kubeadmin -p SFs6e-gkwGC-H8Uk8-w3TTU") contains hardcoded credentials and must
be removed; replace it with a non-sensitive placeholder or reference
environment/secret values (e.g., use KUBECONFIG, OCP_USER/OCP_PASS env vars, or
a secret manager call) and add a brief comment explaining how to supply
credentials securely instead of embedding them in the file.
 } else if isKubeVirt {
 	// IPv4 KubeVirt: listen on all interfaces
 	// Note: 0.0.0.0 includes localhost (127.0.0.1), so no need to specify it separately
 	// Adding localhost would cause "bind: address already in use" error
 	util.UpsertEnvVar(c, corev1.EnvVar{
 		Name: "ETCD_LISTEN_CLIENT_URLS",
-		Value: "https://[$(POD_IP)]:2379,https://localhost:2379",
-	})
-	util.UpsertEnvVar(c, corev1.EnvVar{
-		Name: "ETCD_LISTEN_METRICS_URLS",
-		Value: "https://[::]:2382",
+		Value: "https://0.0.0.0:2379",
 	})
 }
🧩 Analysis chain
🏁 Script executed:

# First, let's understand the file structure and read the complete statefulset.go file
cd control-plane-operator/controllers/hostedcontrolplane/v2/etcd
wc -l statefulset.go

Repository: openshift/hypershift
Length of output: 82

🏁 Script executed:

# Read the full statefulset.go to see the context around lines 74-82 and the IPv6 branch
cat -n statefulset.go | head -100

Length of output: 110

🏁 Script executed:

# Search for IPv6 branch and ETCD_LISTEN_METRICS_URLS
rg -n 'IPv6|ETCD_LISTEN_METRICS_URLS|isKubeVirt' statefulset.go

Length of output: 118

🏁 Script executed:

# Look for base manifest files (YAML) in the etcd directory
cd /tmp && find /repo -path '*/etcd/*' -name '*.go' -o -name '*.yaml' | grep -i etcd | head -20
ls -la /repo/control-plane-operator/controllers/hostedcontrolplane/v2/etcd/

Length of output: 220

🏁 Script executed:

# Check git history for this file
cd /repo && git log --oneline --all -- control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go | head -20

Length of output: 119

🏁 Script executed:

# Look for any manifest content embedded in the code or separate files
rg -n 'ETCD_LISTEN_PEER_URLS|ETCD_LISTEN_METRICS_URLS' control-plane-operator/controllers/hostedcontrolplane/v2/etcd/

Length of output: 420

🏁 Script executed:

# Read the full statefulset.go file to understand all branches
cat -n control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go

Length of output: 6977

🏁 Script executed:

# Search for all branch conditions and understand the structure
rg -B5 -A10 'if isKubeVirt' control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go

Length of output: 1140

🏁 Script executed:

# Check git log for recent changes to this file
git log --oneline -20 -- control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go

Length of output: 127

🏁 Script executed:

# Check a recent commit to see what changed
git show HEAD:control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go | head -100

Length of output: 3313

🏁 Script executed:

# Search for base manifests or embedded YAML with ETCD environment variables
find control-plane-operator -name '*.go' -o -name '*.yaml' | xargs grep -l 'ETCD_LISTEN' 2>/dev/null | head -20

Length of output: 772

🏁 Script executed:

# Look for manifest files in etcd component directory
ls -la control-plane-operator/controllers/hostedcontrolplane/v2/etcd/

Length of output: 436

🏁 Script executed:

# Check recent git diff to see what was actually changed in the commit
git show c731b58de -- control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go | head -150

Length of output: 5516

🏁 Script executed:

# Search for any reference to base etcd manifests or templates
rg -i 'embed|manifest' control-plane-operator/controllers/hostedcontrolplane/v2/etcd/

Length of output: 1095

🏁 Script executed:

# Check if there's an etcd manifest file
find control-plane-operator/controllers/hostedcontrolplane -name '*etcd*' -type f | head -20

Length of output: 2799

🏁 Script executed:

# Read the base etcd statefulset manifest to see default environment variables
cat control-plane-operator/controllers/hostedcontrolplane/v2/assets/etcd/statefulset.yaml | head -150

Length of output: 4495

🏁 Script executed:

# Check component.go to understand how the manifest is loaded
cat control-plane-operator/controllers/hostedcontrolplane/v2/etcd/component.go

Length of output: 2110

🏁 Script executed:

# Look at one of the test fixtures to see the actual rendered output
cat control-plane-operator/controllers/hostedcontrolplane/testdata/etcd/zz_fixture_TestControlPlaneComponents_etcd_statefulset.yaml | grep -A 50 'ETCD_LISTEN'

Length of output: 2095

🏁 Script executed:

# Search for ETCD_LISTEN defaults in all files
rg 'ETCD_LISTEN_PEER_URLS' control-plane-operator/controllers/hostedcontrolplane/

Length of output: 999
ETCD_LISTEN_PEER_URLS is missing for KubeVirt (both IPv4 and IPv6) — incoming peer connections will fail due to the UDN multi-interface setup.
The base manifest sets ETCD_LISTEN_PEER_URLS to https://$(POD_IP):2380, which binds to a single pod IP. When pods have multiple interfaces with the default route through UDN (the entire rationale for this change), incoming peer-to-peer connections from other etcd members via the UDN interface will be silently dropped. Both IPv4 and IPv6 KubeVirt branches need to override this.
🔧 Proposed fix
} else if isKubeVirt {
// IPv4 KubeVirt: listen on all interfaces
// Note: 0.0.0.0 includes localhost (127.0.0.1), so no need to specify it separately
// Adding localhost would cause "bind: address already in use" error
+ util.UpsertEnvVar(c, corev1.EnvVar{
+ Name: "ETCD_LISTEN_PEER_URLS",
+ Value: "https://0.0.0.0:2380",
+ })
util.UpsertEnvVar(c, corev1.EnvVar{
Name: "ETCD_LISTEN_CLIENT_URLS",
Value: "https://0.0.0.0:2379",
})
}And for IPv6 KubeVirt (lines 48–58):
if isKubeVirt {
// IPv6 KubeVirt: listen on all interfaces
// Note: [::] includes localhost, so no need to specify it separately
+ util.UpsertEnvVar(c, corev1.EnvVar{
+ Name: "ETCD_LISTEN_PEER_URLS",
+ Value: "https://[::]:2380",
+ })
util.UpsertEnvVar(c, corev1.EnvVar{
Name: "ETCD_LISTEN_CLIENT_URLS",
Value: "https://[::]:2379",
})Note: The base manifest already sets ETCD_LISTEN_METRICS_URLS to https://0.0.0.0:2382 for IPv4 (and IPv6 KubeVirt explicitly sets it to https://[::]:2382), so the metrics asymmetry does not require action.
📝 Committable suggestion
} else if isKubeVirt {
	// IPv4 KubeVirt: listen on all interfaces
	// Note: 0.0.0.0 includes localhost (127.0.0.1), so no need to specify it separately
	// Adding localhost would cause "bind: address already in use" error
	util.UpsertEnvVar(c, corev1.EnvVar{
		Name:  "ETCD_LISTEN_PEER_URLS",
		Value: "https://0.0.0.0:2380",
	})
	util.UpsertEnvVar(c, corev1.EnvVar{
		Name:  "ETCD_LISTEN_CLIENT_URLS",
		Value: "https://0.0.0.0:2379",
	})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go`
around lines 74 - 82, The KubeVirt branches are missing ETCD_LISTEN_PEER_URLS so
peers will bind to a single POD_IP and fail on multi-interface UDN setups; add
an util.UpsertEnvVar call in the same IPv4 and IPv6 KubeVirt branches to set
ETCD_LISTEN_PEER_URLS to a wildcard bind (for IPv4 use "https://0.0.0.0:2380",
for IPv6 use "https://[::]:2380"), mirroring how ETCD_LISTEN_CLIENT_URLS is set
(use the same util.UpsertEnvVar(c, corev1.EnvVar{ Name: "ETCD_LISTEN_PEER_URLS",
Value: "<appropriate value>" }) pattern).
if err := c.Watch(source.Kind[client.Object](opts.CPCluster.GetCache(), &corev1.Pod{}, enqueueHCP, isEtcdPodInNamespace(opts.Namespace))); err != nil {
	return fmt.Errorf("failed to watch etcd Pods: %w", err)
}
if err := c.Watch(source.Kind[client.Object](opts.CPCluster.GetCache(), &corev1.Service{}, enqueueHCP, isEtcdServiceInNamespace(opts.Namespace))); err != nil {
	return fmt.Errorf("failed to watch etcd Services: %w", err)
}
if err := c.Watch(source.Kind[client.Object](opts.CPCluster.GetCache(), &discoveryv1.EndpointSlice{}, enqueueHCP, isEtcdEndpointSliceInNamespace(opts.Namespace))); err != nil {
	return fmt.Errorf("failed to watch etcd EndpointSlices: %w", err)
}
🧩 Analysis chain
🏁 Script executed:

#!/bin/bash
# Search for Kind function signature in controller-runtime
rg -n "func Kind\[" --type go vendor/sigs.k8s.io/controller-runtime/pkg/source/ 2>/dev/null || \
rg -n "func Kind\[" --type go . | grep -i "controller-runtime" | head -20

Repository: openshift/hypershift
Length of output: 158

🏁 Script executed:

#!/bin/bash
# Find the actual Kind implementation in the codebase or vendor
fd -t f "source.go" vendor/sigs.k8s.io/controller-runtime/pkg/source/ 2>/dev/null | head -5

Length of output: 122

🏁 Script executed:

#!/bin/bash
# Check if vendor directory exists and show structure
if [ -d "vendor/sigs.k8s.io/controller-runtime/pkg/source" ]; then
  ls -la vendor/sigs.k8s.io/controller-runtime/pkg/source/
else
  echo "Vendor directory not found, searching for Kind in the repo"
  rg "type Kind\[" --type go -A 5 | head -30
fi

Length of output: 294

🏁 Script executed:

#!/bin/bash
# Read the Kind function signature and surrounding code
sed -n '73,100p' vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go

Length of output: 928

🏁 Script executed:

#!/bin/bash
# Let's look at the actual file setup.go to see the exact code
sed -n '46,54p' control-plane-operator/hostedclusterconfigoperator/controllers/etcd/setup.go

Length of output: 752
source.Kind[client.Object] with concrete object arguments is valid Go, not a type mismatch.
Once the call is explicitly instantiated as source.Kind[client.Object], the object parameter has type client.Object, and &corev1.Pod{}, &corev1.Service{}, and &discoveryv1.EndpointSlice{} each satisfy that interface, so all three calls compile as written by ordinary interface assignability. The explicit type parameter is also what lets the single enqueueHCP handler and the isEtcd*InNamespace predicates (typed over client.Object) be shared across the Pod, Service, and EndpointSlice watches. Removing it would make inference derive conflicting candidates for the type parameter (*corev1.Pod from the object argument versus client.Object from the handler) and fail to compile unless the handler and predicates were also retyped per concrete type. No change is needed here.

🤖 Prompt for AI Agents
Verify this finding against the current code before changing anything. In control-plane-operator/hostedclusterconfigoperator/controllers/etcd/setup.go around lines 46 - 54, the explicit source.Kind[client.Object] instantiation compiles because the concrete objects satisfy client.Object and because enqueueHCP and the isEtcd*InNamespace predicates are typed over client.Object; only drop the explicit type parameter if you also retype the handler and predicates for each concrete watch type, otherwise leave the three c.Watch calls as they are.
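A stdlib-only toy illustrating both halves of this behavior; the Object, Pod, Handler, and Kind names below are invented stand-ins shaped like, but not taken from, controller-runtime:

```go
package main

import "fmt"

// Toy stand-ins for client.Object and a concrete resource type.
type Object interface{ GetName() string }

type Pod struct{}

func (Pod) GetName() string { return "pod" }

// Handler mirrors how handler.TypedEventHandler depends on the same type
// parameter as the watched object.
type Handler[T Object] func(T)

// Kind mirrors the shape of source.Kind: obj and h must agree on T.
func Kind[T Object](obj T, h Handler[T]) string {
	h(obj)
	return obj.GetName()
}

func main() {
	var shared Handler[Object] = func(o Object) { _ = o }

	// Explicit instantiation with the interface: Pod{} is assignable to
	// Object, so this compiles, matching the pattern in setup.go.
	fmt.Println(Kind[Object](Pod{}, shared))

	// Without the explicit parameter, inference would see T=Pod from the
	// object argument and T=Object from the handler, and fail:
	// Kind(Pod{}, shared) // compile error
}
```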
func (r *reconciler) detectPrimaryUDNNetworkNameFromVirtLauncher(ctx context.Context, namespace string) (string, bool) {
	podList := &corev1.PodList{}
	if err := r.kubevirtInfraClient.List(ctx, podList, client.InNamespace(namespace), client.MatchingLabels{"kubevirt.io": "virt-launcher"}); err != nil {
		return "", false
	}
Silent List error causes destructive mis-labeling in UDN namespaces
When r.kubevirtInfraClient.List fails (transient API error, RBAC issue, context cancellation, etc.), detectPrimaryUDNNetworkNameFromVirtLauncher returns ("", false) without logging anything. The caller then takes the non-UDN branch (line 212–216), which:
- Writes `kubernetes.io/service-name` on the EndpointSlice.
- Deletes `k8s.ovn.org/service-name` from Labels.
- Deletes `k8s.ovn.org/endpointslice-network` from Annotations.
In a Primary-UDN-configured namespace this actively strips the correct OVN-K labels from any existing EndpointSlice, silently breaking traffic until the next successful reconcile that happens to list pods without error.
The function signature should surface the error so the caller can abort instead of mis-configuring the object. At minimum, add a ctrl.LoggerFrom(ctx) log so the failure is observable.
🔧 Proposed fix – propagate error and hoist call
-func (r *reconciler) primaryUDNNetworkName(ctx context.Context, namespace string) (string, bool) {
- derived, ok := r.detectPrimaryUDNNetworkNameFromVirtLauncher(ctx, namespace)
- return derived, ok
-}
-
-func (r *reconciler) detectPrimaryUDNNetworkNameFromVirtLauncher(ctx context.Context, namespace string) (string, bool) {
+// detectPrimaryUDNNetworkName derives the OVN-K primary network name from any
+// virt-launcher pod's k8s.ovn.org/pod-networks annotation.
+func (r *reconciler) detectPrimaryUDNNetworkName(ctx context.Context, namespace string) (string, bool, error) {
+ log := ctrl.LoggerFrom(ctx)
podList := &corev1.PodList{}
if err := r.kubevirtInfraClient.List(ctx, podList, client.InNamespace(namespace), client.MatchingLabels{"kubevirt.io": "virt-launcher"}); err != nil {
- return "", false
+ return "", false, fmt.Errorf("listing virt-launcher pods in namespace %s: %w", namespace, err)
}
for i := range podList.Items {
raw := podList.Items[i].Annotations[ovnPodNetworksAnnotationKey]
if raw == "" {
continue
}
var m map[string]podNetworksAnnotationEntry
if err := json.Unmarshal([]byte(raw), &m); err != nil {
+ log.V(4).Info("skipping pod with unparseable pod-networks annotation", "pod", podList.Items[i].Name, "err", err)
continue
}
for networkKey, v := range m {
if v.Role == ovnPrimaryRole && networkKey != "default" {
- return strings.ReplaceAll(networkKey, "/", "_"), true
+ return strings.ReplaceAll(networkKey, "/", "_"), true, nil
}
}
}
- return "", false
+ return "", false, nil
}Then in reconcileKubevirtPassthroughServiceEndpointsByIPFamily, hoist the call outside the CreateOrUpdate mutate:
func (r *reconciler) reconcileKubevirtPassthroughServiceEndpointsByIPFamily(...) error {
log := ctrl.LoggerFrom(ctx)
+ udnNetworkName, isPrimaryUDN, err := r.detectPrimaryUDNNetworkName(ctx, cpService.Namespace)
+ if err != nil {
+ return fmt.Errorf("detecting primary UDN network in namespace %s: %w", cpService.Namespace, err)
+ }
+
endpointSlice := &discoveryv1.EndpointSlice{ ... }
result, err := r.CreateOrUpdate(ctx, r.kubevirtInfraClient, endpointSlice, func() error {
...
- udnNetworkName, isPrimaryUDN := r.primaryUDNNetworkName(ctx, cpService.Namespace)
+ if isPrimaryUDN {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go`
around lines 255 - 259, detectPrimaryUDNNetworkNameFromVirtLauncher currently
swallows List errors and returns ("", false) which leads the caller
reconcileKubevirtPassthroughServiceEndpointsByIPFamily to take the non-UDN
branch and mutate/delete OVN labels; change
detectPrimaryUDNNetworkNameFromVirtLauncher to return an error (e.g., func (...)
(string, bool, error)) and propagate that error to the caller so
reconcileKubevirtPassthroughServiceEndpointsByIPFamily can abort on List
failures; additionally log the List failure with ctrl.LoggerFrom(ctx) inside
detectPrimaryUDNNetworkNameFromVirtLauncher for observability, and move the call
out of the CreateOrUpdate mutate closure in
reconcileKubevirtPassthroughServiceEndpointsByIPFamily so errors are handled
before any EndpointSlice mutations.
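For reference, the annotation-parsing logic under discussion can be exercised standalone. This sketch mirrors the reviewed code's behavior (skip the `default` network, look for the `primary` role, replace `/` with `_`); the entry struct contains only the single field this sketch needs, not the full OVN-K annotation schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// entry models just the field this sketch reads from the
// k8s.ovn.org/pod-networks annotation value.
type entry struct {
	Role string `json:"role"`
}

// primaryUDNName returns the OVN-K primary network name with slashes
// replaced by underscores, skipping the default network. A parse
// failure simply yields (“”, false) here; the review argues the real
// code should surface List errors separately from this "not found" case.
func primaryUDNName(raw string) (string, bool) {
	var m map[string]entry
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		return "", false
	}
	for key, v := range m {
		if v.Role == "primary" && key != "default" {
			return strings.ReplaceAll(key, "/", "_"), true
		}
	}
	return "", false
}

func main() {
	raw := `{"default":{"role":"infrastructure-locked"},"ns1/udn-a":{"role":"primary"}}`
	name, ok := primaryUDNName(raw)
	fmt.Println(name, ok) // ns1_udn-a true
}
```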
go.mod
  go.opentelemetry.io/otel v1.38.0 // indirect
  go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.37.0 // indirect
  go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.37.0 // indirect
- go.opentelemetry.io/otel/metric v1.37.0 // indirect
- go.opentelemetry.io/otel/sdk v1.37.0 // indirect
- go.opentelemetry.io/otel/trace v1.37.0 // indirect
+ go.opentelemetry.io/otel/metric v1.38.0 // indirect
+ go.opentelemetry.io/otel/sdk v1.38.0 // indirect
+ go.opentelemetry.io/otel/trace v1.38.0 // indirect
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether otlptrace v1.38.0 and otlptracegrpc v1.38.0 exist and are compatible.
curl -sf "https://proxy.golang.org/go.opentelemetry.io/otel/exporters/otlp/otlptrace/@v/v1.38.0.info" | jq .
curl -sf "https://proxy.golang.org/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc/@v/v1.38.0.info" | jq .

Repository: openshift/hypershift
Length of output: 725
Bump OTel exporter versions to v1.38.0 to match core package versions.
The core OTel packages (go.opentelemetry.io/otel, /metric, /sdk, /trace) are bumped to v1.38.0, but the exporter packages (otlptrace and otlptracegrpc) remain at v1.37.0. This version split creates a build risk: Go's MVS will resolve the core packages at v1.38.0 while the exporters were compiled against v1.37.0 internals. OpenTelemetry has a documented history of removing internal APIs between minor versions, causing build failures in exactly this configuration (e.g., v1.11.2 vs v1.11.1).
Both go.opentelemetry.io/otel/exporters/otlp/otlptrace and go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc are available at v1.38.0 (released 2025-08-29 from the same Git hash), making the alignment straightforward.
🔧 Proposed fix
- go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.37.0 // indirect
- go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.37.0 // indirect
+ go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.38.0 // indirect
+ go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.38.0 // indirect

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go.mod` around lines 261 - 266, Update the exporter module versions so they
match the core OTel packages: change
go.opentelemetry.io/otel/exporters/otlp/otlptrace and
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc to v1.38.0 in
go.mod (so all otel, metric, sdk, trace, otlptrace and otlptracegrpc use
v1.38.0), then run go get or go mod tidy to refresh the module graph and verify
the build.
// Use root-ca secret which is the CA that signs the ignition-server certificate.
// This is needed for Primary UDN clusters where VMs access ignition-server via internal service DNS.
caSecret := &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Namespace: t.controlplaneNamespace,
		Name:      "root-ca",
	},
}
🧩 Analysis chain
🏁 Script executed:
# 1. Check where root-ca Secret is created/provisioned
rg -n "root-ca" --type go -C3

Repository: openshift/hypershift
Length of output: 15087
🏁 Script executed:
# 2. Check remaining usages of IgnitionCACertSecret
rg -n "IgnitionCACertSecret" --type go -C3

Repository: openshift/hypershift
Length of output: 7620
🏁 Script executed:
# 3. Check for Primary UDN detection logic
rg -n "PrimaryUDN|IsPrimaryUDN|IsHostedClusterPrimary" --type go -C2

Repository: openshift/hypershift
Length of output: 27729
🏁 Script executed:
# Check what IgnitionCACertSecret actually is
rg -n "IgnitionCACertSecret" --type go -A 5 hypershift-operator/controllers/manifests/ignitionserver/manifests.go

Repository: openshift/hypershift
Length of output: 270
🏁 Script executed:
# Check token.go more thoroughly - see the full function and its caller
rg -n "getIgnitionCACert\|NewToken" --type go -B2 -A8 hypershift-operator/controllers/nodepool/token.go

Repository: openshift/hypershift
Length of output: 46
🏁 Script executed:
# Check conditions.go to see how it uses IgnitionCACertSecret
rg -n "IgnitionCACertSecret" --type go -B3 -A8 hypershift-operator/controllers/nodepool/conditions.go

Repository: openshift/hypershift
Length of output: 777
🏁 Script executed:
# Find ResourceName definition in ignitionserver manifests
rg -n "ResourceName" --type go -B2 -A2 hypershift-operator/controllers/manifests/ignitionserver/

Repository: openshift/hypershift
Length of output: 6655
🏁 Script executed:
# Check token.go full function implementation including the surrounding context
cat -n hypershift-operator/controllers/nodepool/token.go | head -200 | tail -80

Repository: openshift/hypershift
Length of output: 3494
🏁 Script executed:
# Check if there's a git diff or recent change to token.go
git log --oneline -20 hypershift-operator/controllers/nodepool/token.go

Repository: openshift/hypershift
Length of output: 127
🏁 Script executed:
# Verify: are root-ca and ignition-server-ca-cert the same or different?
# Check what creates each secret
rg -n "ReconcileRootCA\|ignitionserver\|IgnitionCACertSecret" --type go control-plane-operator/controllers/hostedcontrolplane/v2/ignitionserver/pki.go

Repository: openshift/hypershift
Length of output: 46
🏁 Script executed:
# Check the test file to see what the old code expected
cat -n hypershift-operator/controllers/nodepool/token_test.go | head -100

Repository: openshift/hypershift
Length of output: 3709
🏁 Script executed:
# Check the git diff for the recent commit
git show c731b58de --stat

Repository: openshift/hypershift
Length of output: 50378
🏁 Script executed:
# Get the full diff for token.go changes
git show c731b58de -- hypershift-operator/controllers/nodepool/token.go

Repository: openshift/hypershift
Length of output: 18950
🏁 Script executed:
# Also check if tests were updated
git show c731b58de -- hypershift-operator/controllers/nodepool/token_test.go

Repository: openshift/hypershift
Length of output: 33714
Tests are misaligned with implementation and will fail.
Lines 157–164 implement getIgnitionCACert to read the hardcoded "root-ca" secret, but the test file (token_test.go) mocks ignitionserver.IgnitionCACertSecret(controlplaneNamespace), which creates a secret named "ignition-server-ca-cert". The implementation will fail to find the mocked secret at test time. For example, TestNewToken (line 239) expects an error about "ignition-server-ca-cert" not found, but the code will error about "root-ca" instead.
Additionally, the scope concern in the original comment is valid: IgnitionCACertSecret (the ignition-server CA cert) is still actively used in conditions.go:277, agent/agent.go:198, and pki.go:43, indicating this change is an incomplete migration. Switching to root-ca unconditionally without a PUDN guard means non-PUDN clusters will now use a different CA than before, even though root-ca is provisioned universally. If root-ca does not sign the ignition-server TLS certificate for non-PUDN setups, VMs will fail TLS validation during ignition.
The comment on lines 157–158 explicitly scopes this to Primary UDN clusters, but there is no conditional guard. Either add a PUDN check or update the test mocks to use "root-ca" and verify all code paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hypershift-operator/controllers/nodepool/token.go` around lines 157 - 164,
The implementation of getIgnitionCACert currently reads a hardcoded "root-ca"
secret but tests and other code expect the ignition-server CA secret
("ignition-server-ca-cert") — fix by either restoring conditional logic to
select ignitionserver.IgnitionCACertSecret(controlplaneNamespace) when not in
Primary UDN mode or update the tests/mocks to return "root-ca"; specifically
adjust getIgnitionCACert (and any call sites using root-ca) to consult the PUDN
guard before choosing the secret name, or change token_test.go to mock
ignitionserver.IgnitionCACertSecret to return a secret named "root-ca" so
TestNewToken and related tests align with the implementation; keep references
consistent with conditions.go, agent/agent.go, and pki.go usage of the
ignition-server CA.
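A minimal sketch of the guarded selection suggested above. The two secret names are taken from the review text; the boolean guard and helper name are hypothetical, not verified against the codebase:

```go
package main

import "fmt"

// ignitionCASecretName picks the CA secret to read, guarded by a
// Primary UDN mode flag as the review suggests. Both secret names come
// from the review; the guard predicate itself is an assumption.
func ignitionCASecretName(isPrimaryUDN bool) string {
	if isPrimaryUDN {
		// root-ca signs the ignition-server cert in Primary UDN mode.
		return "root-ca"
	}
	// Preserve the pre-existing behavior for non-PUDN clusters.
	return "ignition-server-ca-cert"
}

func main() {
	fmt.Println(ignitionCASecretName(true))  // root-ca
	fmt.Println(ignitionCASecretName(false)) // ignition-server-ca-cert
}
```

Either this guard is added, or the tests are updated to mock "root-ca"; the point is that the secret choice and the test expectations must agree on every code path.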
/test e2e-kubevirt-metal-conformance

/test pull-ci-openshift-hypershift-main-e2e-kubevirt-metal-conformance

/test e2e-kubevirt-metal-conformance

/payload 4.22 nightly informing hypershift-kubevirt-ovn-conformance-4.22

/payload hypershift-kubevirt-ovn-conformance

/payload-job periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-azure-kubevirt-ovn

@qinqon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/793e8260-1094-11f1-9f14-923b1f4e9bf2-0

/payload-job periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-azure-kubevirt-ovn

@qinqon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f98fef80-1094-11f1-9a9e-7236f40753ee-0

/payload-job periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-azure-kubevirt-ovn

@qinqon: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

8ce4a71 to f6871f0 (Compare)

/test e2e-azure-kubevirt-ovn

f6871f0 to b7f11ef (Compare)

/test e2e-azure-kubevirt-ovn

b7f11ef to 43fdc0d (Compare)

/test e2e-azure-kubevirt-ovn

43fdc0d to f1f8852 (Compare)

/test e2e-azure-kubevirt-ovn

f1f8852 to 2251cbf (Compare)

/test e2e-azure-kubevirt-ovn

2251cbf to 9627787 (Compare)

/test e2e-azure-kubevirt-ovn
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add an HCCO controller for KubeVirt Primary UDN that creates UDN-style EndpointSlices for etcd services and orphans default EndpointSlices to avoid DNS returning default-network Pod IPs. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ram Lavi <ralavi@redhat.com>
…h bridge For KubeVirt clusters whose control-plane namespace is Primary UDN, reconcile an internal DNS instance and patch dns.operator to forward the apps domain (and OAuth zone) to it. Maintain an oauth-bridge Service/Endpoints in the guest cluster targeting the management OAuth pod UDN IP, and gate DNS operator changes until internal DNS is ready. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add kubevirt create flags to configure Primary UDN name/subnet, reconcile the HCP namespace with the required OVN primary-udn label/annotation at creation time, ensure the UserDefinedNetwork CR, and adjust KubeVirt VM templates for Primary UDN (l2bridge binding, omit live-migration annotation). Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
In order to run hosted cluster with PrimaryUDN in OpenShift CI for the e2e-azure-kubevirt-ovn job - setting default Primary UDN flags when running that specific job. This is done in order to avoid adding these flags on a separate repo (openshift/release). Signed-off-by: Ram Lavi <ralavi@redhat.com>
temporary change for debug. Signed-off-by: Ram Lavi <ralavi@redhat.com>
9627787 to 578183a (Compare)

578183a to 573f329 (Compare)

573f329 to 6ee95a2 (Compare)
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RamLavi

The full list of commands accepted by this bot can be found here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
rechecking without the ugly hack commit, now that openshift/cluster-ingress-operator#1382 is merged
/test e2e-azure-kubevirt-ovn
…ssing Keep the namespace creation-time Primary UDN enforcement by deleting and recreating pre-created namespaces missing OVN UDN label/annotation, while dropping the ingress-operator restart workaround now that upstream ingress fix is merged. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ram Lavi <ralavi@redhat.com>
/test e2e-azure-kubevirt-ovn
@RamLavi: The following tests failed, say

Full PR test history. Your PR dashboard.

Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What this PR does / why we need it:
This PR holds the code changes in order to run a POC of a hosted cluster with PrimaryUDN.
Which issue(s) this PR fixes:
Fixes
Special notes for your reviewer:
PR is not suitable for review, and would need further grooming.
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests