Skip to content

Conversation

@rolandmkunkel
Copy link

This change introduces the capability to run a must-gather against an ACM (Advanced Cluster Management) Hosted Control Plane cluster. This required adding a mechanism to select a different must-gather image and provide specific arguments for it.

Key changes include:

API Extension: The MustGather CRD spec has been updated with three new fields:

mustGatherImage: An enum that allows selecting the type of must-gather to run. It supports "default" (the existing behavior) and "acm_hcp" for hosted clusters.
hostedClusterName: The name of the hosted cluster.
hostedClusterNamespace: The namespace of the hosted cluster.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 24, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 24, 2025

@rolandmkunkel: This pull request references SREP-1884 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

This change introduces the capability to run a must-gather against an ACM (Advanced Cluster Management) Hosted Control Plane cluster. This required adding a mechanism to select a different must-gather image and provide specific arguments for it.

Key changes include:

API Extension: The MustGather CRD spec has been updated with three new fields:

mustGatherImage: An enum that allows selecting the type of must-gather to run. It supports "default" (the existing behavior) and "acm_hcp" for hosted clusters.
hostedClusterName: The name of the hosted cluster.
hostedClusterNamespace: The namespace of the hosted cluster.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rolandmkunkel
Once this PR has been reviewed and has the lgtm label, please assign devppratik for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Oct 24, 2025

Walkthrough

Adds mustGatherImage and hostedClusterOptions to the MustGather API/CRD, introduces an ACM HCP must-gather image env var, implements validation and template/runtime changes to select/pass the ACM HCP image, updates deepcopy generation and tests. A duplicate IsValid implementation exists in the controller file.

Changes

Cohort / File(s) Summary
API types & deepcopy
api/v1alpha1/mustgather_types.go, api/v1alpha1/zz_generated.deepcopy.go
Added MustGatherImage string (enum ["default","acm_hcp"], default "default") and HostedClusterOptions (namespace/name) to MustGatherSpec; added deepcopy methods and MustGatherSpec copy logic for HostedClusterOptions.
CRDs
deploy/crds/operator.openshift.io_mustgathers.yaml, bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml
Added mustGatherImage, hostedClusterName, hostedClusterNamespace fields and x-kubernetes-validations enforcing presence/constraints when mustGatherImage == "acm_hcp".
Manifests & Deployment
bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml, deploy/99_must-gather-operator.Deployment.yaml, README.md
Added ACM_HCP_MUST_GATHER_IMAGE environment variable (example value registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8) alongside existing defaults; README run/dev examples updated to show the new env var.
Controller & validation
controllers/mustgather/mustgather_controller.go
Added func (r *MustGatherReconciler) IsValid(instance *mustgatherv1alpha1.MustGather) (bool, error) and call from Reconcile to enforce HostedClusterOptions presence and audit == false when mustGatherImage == "acm_hcp". Note: IsValid is implemented twice in the file (duplicate).
Template & runtime command
controllers/mustgather/template.go
Extended getGatherContainer signature to accept mustGatherImage and HostedClusterOptions; selects image via DEFAULT_MUST_GATHER_IMAGE / ACM_HCP_MUST_GATHER_IMAGE env vars and appends hosted-cluster args when acm_hcp; updated call sites and command construction.
Tests
controllers/mustgather/mustgather_controller_test.go, controllers/mustgather/template_test.go
Added TestIsValid covering default vs acm_hcp scenarios and hosted-cluster validation; expanded Test_getGatherContainer to verify image selection and hosted-cluster args.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Points to review carefully:

  • Duplicate IsValid implementations in controllers/mustgather/mustgather_controller.go (remove one).
  • CRD x-kubernetes-validations logic and messages for edge cases.
  • Runtime image-selection and environment variable handling in controllers/mustgather/template.go.
  • DeepCopy logic for HostedClusterOptions in zz_generated.deepcopy.go.
  • Tests for environment-dependent behavior and error message exactness.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "SREP-1884: Add support for ACM Hosted Control Plane (HCP) must-gathers" directly and clearly summarizes the main objective of the changeset. The title accurately reflects the primary purpose: adding ACM HCP support with new must-gather image selection capability and hosted cluster context parameters. The title is specific, concise, and void of vague language or unnecessary noise, making it easy for team members reviewing the project history to understand the core change at a glance.
Description Check ✅ Passed The pull request description is clearly related to the changeset and provides meaningful context about the changes introduced. The description accurately explains the addition of ACM HCP must-gather capability and correctly identifies the three new CRD fields (mustGatherImage, hostedClusterName, hostedClusterNamespace) that were added to the MustGather spec, which is confirmed by the raw_summary. The description also mentions the mechanism for selecting different must-gather images and providing specific arguments, aligning with the environment variable additions and controller logic changes present in the changeset.

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.37%. Comparing base (985c9e2) to head (c1779a2).

Files with missing lines Patch % Lines
controllers/mustgather/mustgather_controller.go 64.28% 3 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   70.98%   71.37%   +0.39%     
==========================================
  Files           7        7              
  Lines         479      503      +24     
==========================================
+ Hits          340      359      +19     
- Misses        136      139       +3     
- Partials        3        5       +2     
Files with missing lines Coverage Δ
api/v1alpha1/mustgather_types.go 100.00% <ø> (ø)
controllers/mustgather/template.go 98.93% <100.00%> (+0.05%) ⬆️
controllers/mustgather/mustgather_controller.go 90.76% <64.28%> (-2.19%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rolandmkunkel
Copy link
Author

/retest

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
api/v1alpha1/mustgather_types.go (1)

63-64: Avoid hardcoding an exact image tag in public API comments.

Move concrete image/tag details to operator docs/env var description; keep API comment generic to reduce churn.

bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml (1)

211-212: Mirror quoting/digest guidance in CSV.

Keep CSV aligned with Deployment to prevent OLM drift.

-                - name: ACM_HCP_MUST_GATHER_IMAGE
-                  value: registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8
+                - name: ACM_HCP_MUST_GATHER_IMAGE
+                  value: "registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8"
README.md (1)

108-109: Add a minimal HCP CR example using the new spec fields.

Helps users understand when/how to set mustGatherImage and hosted fields.

Example:

apiVersion: operator.openshift.io/v1alpha1
kind: MustGather
metadata:
  name: example-acm-hcp
spec:
  mustGatherImage: acm_hcp
  hostedClusterNamespace: clusters
  hostedClusterName: my-hc
  caseID: "02527285"
  caseManagementAccountSecretRef:
    name: case-management-creds
  # audit must be omitted/false with acm_hcp
deploy/99_must-gather-operator.Deployment.yaml (1)

45-46: Quote the image value for YAML best practice and consistency.

The env var is properly consumed by the controller code (via os.Getenv() in controllers/mustgather/template.go:176-178) and correctly selects the image when mustGatherImage='acm_hcp'. Hosted cluster arguments are properly validated and passed. Quoting is recommended for consistency, since both DEFAULT_MUST_GATHER_IMAGE and ACM_HCP_MUST_GATHER_IMAGE should follow the same pattern. Regarding digest pinning: both images currently use tags; applying digests only to ACM_HCP_MUST_GATHER_IMAGE would create inconsistency with the default image.

Apply:

-            - name: ACM_HCP_MUST_GATHER_IMAGE
-              value: registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8
+            - name: ACM_HCP_MUST_GATHER_IMAGE
+              value: "registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8"
deploy/crds/operator.openshift.io_mustgathers.yaml (1)

48-68: Consider adding CRD-level validation for conditional field requirements.

The descriptions state that hostedClusterName and hostedClusterNamespace are required when mustGatherImage='acm_hcp', but there's no CEL validation rule to enforce this at the CRD level. While the controller's IsValid method provides application-level validation, adding CRD-level validation would reject invalid resources earlier and provide better user feedback.

Consider adding this validation rule to the spec's x-kubernetes-validations:

spec:
  description: MustGatherSpec defines the desired state of MustGather
  properties:
    # ... existing properties ...
  type: object
  x-kubernetes-validations:
  - message: hostedClusterNamespace and hostedClusterName must be set when mustGatherImage is 'acm_hcp'
    rule: "self.mustGatherImage != 'acm_hcp' || (has(self.hostedClusterNamespace) && self.hostedClusterNamespace != '' && has(self.hostedClusterName) && self.hostedClusterName != '')"
bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml (1)

159-179: Consider consistent field placement across CRD files.

The new fields (mustGatherImage, hostedClusterName, hostedClusterNamespace) are placed after uploadTarget in this file, but in deploy/crds/operator.openshift.io_mustgathers.yaml they're placed before uploadTarget (lines 48-68). While YAML doesn't care about property order, consistent placement across CRD files improves maintainability and makes diffs easier to review.

Additionally, the same recommendation for CRD-level validation applies here (see comment on deploy/crds/operator.openshift.io_mustgathers.yaml).

controllers/mustgather/template.go (1)

167-197: Consider optimizing env var reads and handling missing variables.

Two minor observations:

  1. Optimize environment variable reads: Line 176 reads the default image env var unconditionally, even when mustGatherImage == acmHcpMustGatherImage. Consider only reading the needed variable:
-	commandArgs := ""
-	containerImage := strings.TrimSpace(os.Getenv(defaultMustGatherImageEnv))
-	if mustGatherImage == acmHcpMustGatherImage {
-		containerImage = strings.TrimSpace(os.Getenv(acmHcpMustGatherImageEnv))
+	var containerImage string
+	commandArgs := ""
+	if mustGatherImage == acmHcpMustGatherImage {
+		containerImage = strings.TrimSpace(os.Getenv(acmHcpMustGatherImageEnv))
 		commandArgs = fmt.Sprintf("--hosted-cluster-namespace=%s --hosted-cluster-name=%s", hostedClusterNamespace, hostedClusterName)
+	} else {
+		containerImage = strings.TrimSpace(os.Getenv(defaultMustGatherImageEnv))
 	}
  1. Missing environment variable handling: If the required environment variable is not set, containerImage will be an empty string, which could lead to pod creation failure. Consider logging a warning or using a fallback value, though this may be intentional if the operator deployment guarantees these vars are set.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 80de730 and 39bb966.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • api/v1alpha1/mustgather_types.go (1 hunks)
  • bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml (1 hunks)
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml (1 hunks)
  • controllers/mustgather/mustgather_controller.go (2 hunks)
  • controllers/mustgather/mustgather_controller_test.go (1 hunks)
  • controllers/mustgather/template.go (4 hunks)
  • controllers/mustgather/template_test.go (2 hunks)
  • deploy/99_must-gather-operator.Deployment.yaml (1 hunks)
  • deploy/crds/operator.openshift.io_mustgathers.yaml (1 hunks)
🔇 Additional comments (4)
controllers/mustgather/mustgather_controller.go (1)

365-379: Validation logic and constant sharing verified.

The acmHcpMustGatherImage constant is properly defined once in controllers/mustgather/template.go and consistently referenced across the codebase (controller, template, and tests), ensuring single source of truth with no string drift.

controllers/mustgather/mustgather_controller_test.go (1)

1192-1271: LGTM! Comprehensive validation test coverage.

The test function covers all validation scenarios appropriately:

  • Default image without hosted cluster info (valid)
  • ACM HCP image with complete hosted cluster info (valid)
  • ACM HCP image with missing hosted cluster name (invalid)
  • ACM HCP image with missing hosted cluster namespace (invalid)
  • ACM HCP image with no hosted cluster info (invalid)

The table-driven test pattern is idiomatic and the assertions verify both the validity boolean and error messages correctly.

controllers/mustgather/template_test.go (1)

45-123: LGTM! Test coverage expanded appropriately.

The test updates correctly exercise the new parameters:

  • New test constants for both must-gather images
  • Expanded test cases including "HCP default" scenario
  • Environment variables set for both image types
  • Assertions verify both image selection and argument construction

The changes maintain consistency with the updated function signature and add valuable test coverage for the ACM HCP use case.

controllers/mustgather/template.go (1)

49-55: LGTM! Constants are well-defined.

The environment variable names and image type constants follow clear naming conventions and align with the CRD enum values.

Comment on lines 69 to 80
// The namespace of the hosted cluster to collect the must-gather for.
// This field is required if mustGatherImage='acm_hcp'.
// +kubebuilder:validation:Optional
HostedClusterNamespace string `json:"hostedClusterNamespace,omitempty"`

// The name of the hosted cluster to collect the must-gather for.
// This field is required if mustGatherImage='acm_hcp'.
// +kubebuilder:validation:Optional
HostedClusterName string `json:"hostedClusterName,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think that we'll ever need to add additional options specific to HCP must gathers? I'm not close-enough to the functionality to know if we will or not.

If so - should we put it into its own subfield to keep the api cleaner?

So it would be something like HostedClusterOptions that's a HostedClusterOptions type, which would then contain Namespace and Name string fields.

It would end up looking like this:

spec:
  mustGatherImage: acm_hcp
  hostedClusterOptions:
    Namespace: my_hcp_namespace
    Name: my_cluster_name

Comment on lines 74 to 77
// The name of the hosted cluster to collect the must-gather for.
// This field is required if mustGatherImage='acm_hcp'.
// +kubebuilder:validation:Optional
HostedClusterName string `json:"hostedClusterName,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally - do we actually need the name here? Could we, given a namespace, infer the name somehow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could, but that would make the code between HCP and classic diverge more

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this after diving a bit deeper into hypershift there's actually not an enforced 1:1 relationship with where the hosted cluster namespace is and the name of the hosted cluster - meaning you can put multiple hosted cluster CRs into a single namespace. So I'm now +1 to keeping this. It's always easier to make something optional later than it is to make it required later, as that change shouldn't break anything that's already relying on that functionality.

Thanks for looking!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
controllers/mustgather/mustgather_controller.go (1)

110-114: Use the request-scoped ctx instead of context.TODO().

This issue was previously flagged: the call to r.ManageError uses context.TODO(), which loses request-scoped cancellation/deadline and logging correlation. Replace with the ctx parameter passed to Reconcile.

Apply this diff:

-	return r.ManageError(context.TODO(), instance, err)
+	return r.ManageError(ctx, instance, err)
🧹 Nitpick comments (2)
controllers/mustgather/mustgather_controller_test.go (1)

1192-1271: Consider adding a test case for the Audit constraint with ACM HCP image.

The test suite covers the hosted cluster field requirements, but there's no test case verifying that Audit must be false when MustGatherImage is "acm_hcp" (as enforced by the XValidation rule in the API). Adding this test case would improve coverage and ensure the validation logic matches the API constraints.

Add a test case like:

{
	name: "ACM HCP image should be invalid with audit enabled",
	mustGather: &mustgatherv1alpha1.MustGather{
		Spec: mustgatherv1alpha1.MustGatherSpec{
			MustGatherImage:        "acm_hcp",
			HostedClusterNamespace: "test-ns",
			HostedClusterName:      "test-cluster",
			Audit:                  true,
		},
	},
	expectedValid: false,
	expectedError: errors.New("cannot collect audit logs when using the acm_hcp must gather image"),
},
controllers/mustgather/template.go (1)

167-180: Add error handling for missing environment variables.

If the environment variables DEFAULT_MUST_GATHER_IMAGE or ACM_HCP_MUST_GATHER_IMAGE are not set, containerImage will be an empty string, leading to a pod creation failure with an obscure error. Consider returning an error or using fallback defaults to fail fast with a clear message.

Apply this pattern:

 func getGatherContainer(audit bool, timeout time.Duration, mustGatherImage, hostedClusterNamespace, hostedClusterName string) corev1.Container {
 	var commandBinary string
 	if audit {
 		commandBinary = gatherCommandBinaryAudit
 	} else {
 		commandBinary = gatherCommandBinaryNoAudit
 	}
 
 	commandArgs := ""
 	containerImage := strings.TrimSpace(os.Getenv(defaultMustGatherImageEnv))
+	if containerImage == "" {
+		// Fallback or panic with clear error message
+		panic(fmt.Sprintf("environment variable %s is not set", defaultMustGatherImageEnv))
+	}
 	if mustGatherImage == acmHcpMustGatherImage {
 		containerImage = strings.TrimSpace(os.Getenv(acmHcpMustGatherImageEnv))
+		if containerImage == "" {
+			panic(fmt.Sprintf("environment variable %s is not set", acmHcpMustGatherImageEnv))
+		}
 		commandArgs = fmt.Sprintf("--hosted-cluster-namespace=%s --hosted-cluster-name=%s", hostedClusterNamespace, hostedClusterName)
 	}

Alternatively, change getGatherContainer to return (corev1.Container, error) and propagate the error up through getJobTemplate.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 39bb966 and 4dc694a.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • api/v1alpha1/mustgather_types.go (2 hunks)
  • bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml (1 hunks)
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml (1 hunks)
  • controllers/mustgather/mustgather_controller.go (2 hunks)
  • controllers/mustgather/mustgather_controller_test.go (1 hunks)
  • controllers/mustgather/template.go (4 hunks)
  • controllers/mustgather/template_test.go (2 hunks)
  • deploy/99_must-gather-operator.Deployment.yaml (1 hunks)
  • deploy/crds/operator.openshift.io_mustgathers.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • deploy/crds/operator.openshift.io_mustgathers.yaml
  • README.md
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml
  • deploy/99_must-gather-operator.Deployment.yaml
🔇 Additional comments (5)
controllers/mustgather/mustgather_controller.go (1)

365-379: LGTM! Defensive validation complements CRD-level checks.

The validation logic correctly enforces that ACM HCP must-gathers require hosted cluster context and cannot collect audit logs. While these constraints are also enforced by CRD XValidation rules, this runtime check provides an additional safety layer and clearer error messages.

controllers/mustgather/template_test.go (1)

45-123: LGTM! Test coverage appropriately expanded for ACM HCP support.

The test updates correctly:

  • Expand test cases to include HCP scenarios with hosted cluster parameters
  • Set both must-gather image environment variables for test isolation
  • Verify container image selection based on mustGatherImage value
  • Assert that hosted-cluster arguments are properly included in the command

The test coverage ensures the new functionality works as expected.

api/v1alpha1/mustgather_types.go (1)

28-80: LGTM! API changes correctly implement ACM HCP support with proper validation.

The additions are well-structured:

  • XValidation rules enforce cross-field constraints at the API layer (acm_hcp requires hosted cluster fields, forbids audit, and non-acm_hcp forbids hosted fields)
  • New fields include appropriate kubebuilder markers (Optional, Enum, default values)
  • Default value "default" for MustGatherImage ensures backward compatibility
  • Field documentation clearly explains the requirements and usage

This addresses the past review comment requesting struct-level XValidation to encode HCP invariants in the type system.

bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml (1)

159-179: LGTM! CRD schema correctly reflects the new API fields.

The OpenAPI schema additions properly define:

  • mustGatherImage with enum constraint and default value matching the API type
  • hostedClusterName and hostedClusterNamespace with clear descriptions
  • Field descriptions that reference the acm_hcp requirement

The schema changes align with the Go type definitions in api/v1alpha1/mustgather_types.go.

controllers/mustgather/template.go (1)

50-54: LGTM! Constants appropriately defined for image selection.

The new constants clearly identify:

  • Environment variable names for must-gather images
  • Image type identifiers matching the API enum values

This makes the code self-documenting and maintainable.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 29, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
controllers/mustgather/mustgather_controller.go (1)

109-113: Use the request-scoped ctx instead of context.TODO().

This issue was previously flagged. Using context.TODO() loses request-scoped cancellation/deadline and logging correlation.

Apply this diff:

-		return r.ManageError(context.TODO(), instance, err)
+		return r.ManageError(ctx, instance, err)
🧹 Nitpick comments (1)
controllers/mustgather/template.go (1)

199-204: Add defensive nil check for hostedClusterOptions before dereferencing.

While the controller's IsValid method ensures hostedClusterOptions is non-nil when mustGatherImage is acm_hcp, defensive programming suggests adding an explicit nil check here to prevent panics if the validation is bypassed or modified in the future.

Apply this diff:

 	commandArgs := ""
 	containerImage := strings.TrimSpace(os.Getenv(defaultMustGatherImageEnv))
 	if mustGatherImage == acmHcpMustGatherImage {
 		containerImage = strings.TrimSpace(os.Getenv(acmHcpMustGatherImageEnv))
+		if hostedClusterOptions == nil {
+			// This should not happen if IsValid is called, but adding defensive check
+			log.Error(nil, "hostedClusterOptions is nil for acm_hcp image")
+			return corev1.Container{}
+		}
 		commandArgs = fmt.Sprintf("--hosted-cluster-namespace=%s --hosted-cluster-name=%s", hostedClusterOptions.HostedClusterNamespace, hostedClusterOptions.HostedClusterName)
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc694a and 697a9b1.

📒 Files selected for processing (11)
  • README.md (1 hunks)
  • api/v1alpha1/mustgather_types.go (2 hunks)
  • api/v1alpha1/zz_generated.deepcopy.go (2 hunks)
  • bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml (1 hunks)
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml (1 hunks)
  • controllers/mustgather/mustgather_controller.go (2 hunks)
  • controllers/mustgather/mustgather_controller_test.go (1 hunks)
  • controllers/mustgather/template.go (5 hunks)
  • controllers/mustgather/template_test.go (3 hunks)
  • deploy/99_must-gather-operator.Deployment.yaml (1 hunks)
  • deploy/crds/operator.openshift.io_mustgathers.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • deploy/99_must-gather-operator.Deployment.yaml
  • README.md
  • bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml
🔇 Additional comments (10)
api/v1alpha1/zz_generated.deepcopy.go (1)

12-25: LGTM! Auto-generated deepcopy code is correct.

The deepcopy implementation for HostedClusterOptions and the integration into MustGatherSpec.DeepCopyInto follow standard controller-runtime patterns and are correctly generated.

Also applies to: 102-106

controllers/mustgather/mustgather_controller_test.go (1)

1150-1235: LGTM! Comprehensive test coverage for IsValid validation logic.

The test cases thoroughly cover all validation scenarios:

  • Default image without hosted cluster options
  • ACM HCP with valid configuration
  • ACM HCP with missing required fields (name, namespace, or entire options struct)

The table-driven test approach is clean and maintainable.

controllers/mustgather/mustgather_controller.go (2)

334-353: LGTM! IsValid validation logic is correct.

The validation logic properly enforces the ACM HCP constraints:

  • Requires HostedClusterOptions to be non-nil when using acm_hcp image
  • Requires both HostedClusterNamespace and HostedClusterName to be set
  • Enforces that audit must be disabled for acm_hcp mode

334-353: No duplicate IsValid implementations found.

The verification confirms only one IsValid method declaration exists in the controller file at line 334. The original concern about duplicate implementations causing compilation errors is unfounded. The review comment should be disregarded.

Likely an incorrect or invalid review comment.

controllers/mustgather/template_test.go (1)

79-167: LGTM! Test updates correctly reflect the new signature and behavior.

The tests properly:

  • Set up both DEFAULT_MUST_GATHER_IMAGE and ACM_HCP_MUST_GATHER_IMAGE environment variables
  • Pass the new mustGatherImage and hostedClusterOptions parameters
  • Verify the container uses the correct image
  • Validate that ACM HCP command arguments are correctly appended
api/v1alpha1/mustgather_types.go (2)

28-30: LGTM! XValidation rules comprehensively enforce ACM HCP constraints.

The struct-level validations ensure:

  1. When mustGatherImage is acm_hcp, the hosted cluster namespace and name are required
  2. Audit must be false for acm_hcp mode
  3. Hosted cluster fields are only valid when using acm_hcp image

This prevents invalid configurations at the API layer before reaching the controller.


68-92: LGTM! New fields are well-structured with proper validation.

The MustGatherImage enum and HostedClusterOptions type are properly defined with:

  • Clear enum values and default for MustGatherImage
  • Required validation for HostedClusterOptions fields
  • Good inline documentation
deploy/crds/operator.openshift.io_mustgathers.yaml (1)

48-75: LGTM! CRD schema correctly reflects the Go type definitions.

The CRD updates are consistent with api/v1alpha1/mustgather_types.go:

  • hostedClusterOptions fields match the HostedClusterOptions struct
  • mustGatherImage enum and default match the Go field tags
  • X-kubernetes-validations match the XValidation rules

Also applies to: 233-243

controllers/mustgather/template.go (2)

182-217: LGTM! Image selection and command argument logic is correct.

The implementation properly:

  • Selects the container image based on mustGatherImage value
  • Constructs ACM HCP-specific command arguments with namespace and cluster name
  • Maintains backward compatibility with the default image flow

200-204: Environment variables are properly configured in the deployment manifest.

The DEFAULT_MUST_GATHER_IMAGE and ACM_HCP_MUST_GATHER_IMAGE environment variables are both explicitly set with non-empty values in the deployment manifest (deploy/99_must-gather-operator.Deployment.yaml):

  • DEFAULT_MUST_GATHER_IMAGE: quay.io/openshift/origin-must-gather:latest
  • ACM_HCP_MUST_GATHER_IMAGE: registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8

The code will receive these values when os.Getenv() is called, and containerImage will not be empty. The concern raised in the original review comment is not applicable.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
deploy/99_must-gather-operator.Deployment.yaml (1)

45-46: Consider version management strategy for the ACM HCP must-gather image.

The image reference is pinned to v2.8. If ACM releases newer versions with fixes or features, this hardcoded value will require manual updates across deployments.

api/v1alpha1/mustgather_types.go (1)

68-75: Clarify image selection in documentation.

The comments reference specific registry URLs, but the actual image selection happens via environment variables (DEFAULT_MUST_GATHER_IMAGE and ACM_HCP_MUST_GATHER_IMAGE). Consider updating the comments to reference the environment variables rather than hardcoding URLs, which improves maintainability.

Example:

 // The image to use for the must-gather pods.
 // This can currently only be one of two values:
-// 'default' for most cases including OCP and Managed OpenShift, which will use quay.io/openshift/must-gather
-// 'acm_hcp' for HCP must gathers, which will use registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8
+// 'default' for most cases including OCP and Managed OpenShift, which will use the image from DEFAULT_MUST_GATHER_IMAGE env var
+// 'acm_hcp' for HCP must gathers, which will use the image from ACM_HCP_MUST_GATHER_IMAGE env var
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 697a9b1 and c1779a2.

📒 Files selected for processing (11)
  • README.md (1 hunks)
  • api/v1alpha1/mustgather_types.go (2 hunks)
  • api/v1alpha1/zz_generated.deepcopy.go (2 hunks)
  • bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml (1 hunks)
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml (1 hunks)
  • controllers/mustgather/mustgather_controller.go (2 hunks)
  • controllers/mustgather/mustgather_controller_test.go (1 hunks)
  • controllers/mustgather/template.go (5 hunks)
  • controllers/mustgather/template_test.go (3 hunks)
  • deploy/99_must-gather-operator.Deployment.yaml (1 hunks)
  • deploy/crds/operator.openshift.io_mustgathers.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • bundle/manifests/tech-preview/support-log-gather-operator.clusterserviceversion.yaml
  • controllers/mustgather/template_test.go
  • controllers/mustgather/template.go
  • README.md
  • controllers/mustgather/mustgather_controller_test.go
  • bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml
  • deploy/crds/operator.openshift.io_mustgathers.yaml
🔇 Additional comments (5)
api/v1alpha1/mustgather_types.go (2)

28-30: LGTM! XValidation rules correctly enforce HCP invariants.

The struct-level validations properly enforce that:

  • When mustGatherImage is acm_hcp, hosted cluster fields are required and audit must be false
  • When mustGatherImage is not acm_hcp, hosted cluster fields must be absent

This prevents invalid configurations at the API layer.


82-92: LGTM! HostedClusterOptions struct is well-defined.

The required validations on both fields align with the XValidation rules at the spec level and ensure that hosted cluster information is complete when specified.

controllers/mustgather/mustgather_controller.go (2)

98-102: LGTM! Validation is correctly placed before finalizer handling.

Early validation prevents invalid MustGather resources from entering the reconciliation flow, ensuring clean error handling.


299-318: No issues found. The review comment is factually incorrect on multiple counts.

The script results definitively show:

  1. acmHcpMustGatherImage constant IS defined in controllers/mustgather/template.go:55 as "acm_hcp". The concern that it "is not visible in the provided code" is incorrect.

  2. Only ONE IsValid method exists in the controller file (mustgather_controller.go:299-318). The concern about a "duplicate IsValid implementation within this file" is incorrect. (The IsValid method found in vendored code is from an external dependency, not a duplicate within the controller.)

The validation logic may or may not duplicate XValidation rules in the API types, but the factual errors regarding the constant definition and duplicate method implementation invalidate the core premises of this review comment.

Likely an incorrect or invalid review comment.

api/v1alpha1/zz_generated.deepcopy.go (1)

12-25: LGTM! Generated deepcopy methods are correct.

The auto-generated deepcopy logic properly handles the new HostedClusterOptions type and its integration into MustGatherSpec, including appropriate nil checks for the pointer field.

Also applies to: 119-123

@rolandmkunkel
Copy link
Author

Thanks @iamkirkbater, I've incorporated your suggestions

@rolandmkunkel
Copy link
Author

/retest

@iamkirkbater iamkirkbater requested review from shivprakashmuley and swghosh and removed request for iamkirkbater November 5, 2025 14:20
- name: DEFAULT_MUST_GATHER_IMAGE
value: quay.io/openshift/origin-must-gather:latest
- name: ACM_HCP_MUST_GATHER_IMAGE
value: registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know a ton about this image - should this be pointing at quay.io for OCP deployments?

@TrilokGeer @vaidehi411 @swghosh @shivprakashmuley - any thoughts here? Would love to lean on your expertise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are typically placeholder images, which get replaced with acceptable production grade images (i.e. generally registry.redhat.io ones) once building through our ART pipelines.

- name: DEFAULT_MUST_GATHER_IMAGE
value: quay.io/openshift/origin-must-gather:latest
- name: ACM_HCP_MUST_GATHER_IMAGE
value: registry.redhat.io/multicluster-engine/must-gather-rhel9:v2.8

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about this image here.

@iamkirkbater
Copy link

Overall I think this looks good - I'd love to get a review from someone on the must gather team though, especially with the open questions about using the registry.redhat.io images and whether that's something that would work on all OCP environments they need to account for. @TrilokGeer @vaidehi411 @swghosh @shivprakashmuley if any of you could take a look and make sure this aligns with your expectations going forward?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 24, 2025

@rolandmkunkel: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/validate-boilerplate c1779a2 link false /test validate-boilerplate
ci/prow/e2e-gcp-operator c1779a2 link true /test e2e-gcp-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@rolandmkunkel
Copy link
Author

We have decided to perform the must-gather locally in CAD without MGO instead, so I'm no longer going to move forward with this PR. I'm still happy to work with you in order to get this merged, if this contribution is still helpful for you.
@TrilokGeer @vaidehi411 @swghosh @shivprakashmuley

@swghosh
Copy link
Member

swghosh commented Nov 28, 2025

@rolandmkunkel thank you for your PR! We're looking into the specifics and security aspects of implementing custom must-gather images with this operator, which is still in an early evaluation phase. Likely, a custom must-gather image with scripts to collect data specific to HostedCluster(s) should help with this specific use case, IMHO. Also, given that we'd really appreciate if you or folks from HCM team be part of the feedback loop for us in this.

/cc @shivprakashmuley @iamkirkbater

@openshift-ci openshift-ci bot requested a review from iamkirkbater November 28, 2025 07:45
@swghosh
Copy link
Member

swghosh commented Jan 8, 2026

@rolandmkunkel @iamkirkbater Could either of you please help read openshift/enhancements#1906 enhancement and let us know if the specific use case being tackled through this PR can be covered through support for custom must-gather images too, thanks in advance!

cc: @TrilokGeer @shivprakashmuley

@rolandmkunkel
Copy link
Author

@swghosh Yes, I believe the enhancement covers this use case as well. As stated previously, we currently do not require this implementation at this time. I'm going to go ahead and close this PR. Thank you for following this up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants