
MG-128: Add/Update e2e tests #311

Merged
openshift-merge-bot[bot] merged 10 commits into openshift:master from praveencodes:MG-128
Dec 24, 2025

Conversation

@praveencodes
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 5, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Nov 5, 2025

@praveencodes: This pull request references MG-128 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.


In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai bot commented Nov 5, 2025

Walkthrough

Add a Ginkgo-based e2e test framework: new Makefile e2e target and docs, comprehensive e2e tests and helpers, RBAC test manifests, dynamic resource loader and kube client utilities; migrate must-gather examples to a spec.uploadTarget.sftp shape; remove legacy OSDE2E artifacts and adjust dependencies.

Changes

Cohort / File(s) Summary
Makefile & e2e docs
Makefile, test/e2e/README.md
Add E2E_TIMEOUT (default 1h) and test-e2e target running go test with -timeout, -count 1, -v, -p 1, -tags e2e, plus Ginkgo flags; README updated to run tests via KUBECONFIG=... make test-e2e.
New Ginkgo e2e suite & runner
test/e2e/must_gather_operator_test.go, test/e2e/must_gather_operator_runner_test.go
Add comprehensive Ginkgo tests (RBAC, impersonation, MustGather lifecycle, Job/Pod checks) and runner with BeforeSuite, dynamic JUnit path, config setup, and package/tag switched to e2e.
Removed legacy e2e artifacts
test/e2e/must_gather_operator_tests.go, test/e2e/Dockerfile, test/e2e/e2e-template.yml, test/must-gather.yaml
Delete old OSDE2E test source, e2e Dockerfile, OpenShift job template, and a removed test manifest.
e2e test library & helpers
test/library/kube_client.go, test/library/dynamic_resources.go, test/library/utils.go
Add kubeconfig/client helpers, DynamicResourceLoader (CreateFromFile/DeleteFromFile, REST mapping, subject namespace updates), and namespace/event utilities for tests.
e2e testdata RBAC manifests
test/e2e/testdata/serviceaccount.yaml, test/e2e/testdata/serviceaccount-clusterrole.yaml, test/e2e/testdata/serviceaccount-clusterrole-binding.yaml, test/e2e/testdata/nonadmin-clusterrole.yaml, test/e2e/testdata/nonadmin-clusterrole-binding.yaml
Add ServiceAccount, ClusterRole(s), and ClusterRoleBinding(s) used by the new e2e tests.
MustGather examples & README updates
examples/*.yaml (e.g. examples/mustgather_*.yaml), README.md
Move caseID and caseManagementAccountSecretRef from top-level spec into spec.uploadTarget with type: SFTP and nested sftp block across examples; update apiVersion in one example; add proxy noProxy where applicable; README examples adjusted.
Dependency & boilerplate changes
go.mod, boilerplate/update.cfg
Remove OSDE2E-related dependencies (e.g., osde2e-common, e2e-framework entries), add github.com/stretchr/testify v1.10.0 and github.com/pmezard/go-difflib v1.0.0; remove openshift/golang-osd-e2e entry from boilerplate config.
Ignored artifacts
.gitignore
Add test/e2e/test-run-results/ to .gitignore (two locations).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


@praveencodes praveencodes marked this pull request as ready for review November 6, 2025 04:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 6, 2025
@openshift-ci openshift-ci bot requested review from Tafhim and bng0y November 6, 2025 04:34
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/must_gather_operator_runner_test.go (1)

20-20: Update the outdated comment.

The comment still references "osde2e" but should be updated to reflect the new e2e test context.

Apply this diff:

-// Test entrypoint. osde2e runs this as a test suite on test pod.
+// Test entrypoint. e2e framework runs this as a test suite on test pod.
🧹 Nitpick comments (1)
test/e2e/must_gather_operator_tests.go (1)

882-921: Remove or uncomment the commented test cases.

Two test cases are commented out (lines 882-915 and 917-921). If these tests are not yet ready, consider using ginkgo.PIt() for pending tests or remove them entirely to improve code cleanliness.

If the tests are pending implementation:

  • Keep the ginkgo.PIt() marker on line 917 (already present)
  • Either uncomment lines 882-915 or remove them if they won't be implemented

If the tests are obsolete:

  • Remove lines 882-921 entirely
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 985c9e2 and 03a5eb7.

📒 Files selected for processing (12)
  • Makefile (1 hunks)
  • README.md (3 hunks)
  • examples/mustgather_basic.yaml (1 hunks)
  • examples/mustgather_full.yaml (1 hunks)
  • examples/mustgather_non_internal_user.yaml (1 hunks)
  • examples/mustgather_proxy.yaml (1 hunks)
  • examples/mustgather_retain_resources.yaml (1 hunks)
  • examples/mustgather_timeout.yaml (1 hunks)
  • go.mod (1 hunks)
  • test/e2e/must_gather_operator_runner_test.go (1 hunks)
  • test/e2e/must_gather_operator_tests.go (5 hunks)
  • test/must-gather.yaml (1 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile

[warning] 15-15: Missing required phony target "all"

(minphony)


[warning] 15-15: Missing required phony target "clean"

(minphony)


[warning] 15-15: Missing required phony target "test"

(minphony)

🔇 Additional comments (26)
test/e2e/must_gather_operator_runner_test.go (1)

2-4: LGTM! Build tag and package name updated correctly.

The migration from osde2e to e2e is consistent and aligns with the PR objectives.

examples/mustgather_basic.yaml (1)

8-13: LGTM! Consistent with the API restructuring.

The uploadTarget structure is consistent with other examples and test files in this PR.

examples/mustgather_timeout.yaml (1)

8-14: LGTM! Consistent restructuring while preserving other fields.

The uploadTarget structure is consistent, and the mustGatherTimeout field correctly remains at the spec level.

examples/mustgather_proxy.yaml (1)

8-17: LGTM! Consistent restructuring with proxy configuration.

The uploadTarget structure is consistent with other examples. The proxyConfig includes the standard noProxy field, which is appropriate for excluding certain hosts from proxying.

examples/mustgather_full.yaml (1)

8-13: LGTM! Consistent with the API restructuring.

The uploadTarget structure matches the pattern applied across all MustGather examples.

README.md (4)

14-20: LGTM! Documentation updated to reflect the new API structure.

The example correctly shows the uploadTarget.sftp structure consistent with the code changes.


22-22: Verify secret name reference consistency.

The text mentions caseManagementCreds secret, but the examples and the secret creation command (Line 93) use case-management-creds (with hyphens). Ensure the reference is accurate.

Apply this diff if the hyphenated form is correct:

-This request will collect the standard must-gather info and upload it to case `#02527285` using the credentials found in the `caseManagementCreds` secret.
+This request will collect the standard must-gather info and upload it to case `#02527285` using the credentials found in the `case-management-creds` secret.

35-42: LGTM! Full example documentation updated correctly.

The example-mustgather-full documentation reflects the new uploadTarget structure and correctly shows the audit field configuration.


56-66: LGTM! Proxy example documentation updated correctly.

The proxy configuration example correctly shows both the new uploadTarget structure and the proxy settings.

go.mod (1)

16-16: No action required—dependency promotion is justified.

The codebase directly imports k8s.io/apiextensions-apiserver in test/e2e/must_gather_operator_tests.go, confirming that promoting this dependency from indirect to direct in go.mod is appropriate.

examples/mustgather_non_internal_user.yaml (1)

8-14: Consistent API restructuring.

The uploadTarget.sftp structure is consistent with the changes in other example files. The internalUser field has been moved under the sftp block, which maintains logical grouping of SFTP-related configuration.

test/e2e/must_gather_operator_tests.go (11)

1-5: LGTM! Proper e2e test build tags.

The build tags and package name correctly isolate these e2e tests from unit tests, allowing selective execution via the -tags e2e build flag.


94-164: LGTM! Comprehensive permission validation.

This test thoroughly validates that the required service accounts exist and have the necessary permissions for job and secret management. The granular checks help catch RBAC misconfigurations early.


166-223: LGTM! Well-structured job creation test.

The test properly verifies the MustGather-to-Job lifecycle, including service account propagation, container configuration, and cleanup behavior.


225-261: LGTM! Good negative test case.

Testing failure scenarios with non-existent service accounts validates the operator's error handling and helps ensure graceful degradation.


263-356: LGTM! Comprehensive lifecycle test.

This test validates the complete MustGather workflow from creation through status updates to completion, with appropriate status logging for debugging.


358-400: LGTM! Thorough permission verification.

The SubjectAccessReview-based permission checks provide concrete validation of RBAC configuration without relying on actual resource operations.


402-623: LGTM! Thorough node log collection test.

This test comprehensively validates node log collection capabilities, including permission checks, pod lifecycle, and event monitoring for errors.


625-880: LGTM! Comprehensive CRD collection test.

The test thoroughly validates CRD and custom resource collection, including creating test CRs, verifying permissions, and ensuring non-destructive collection.


924-944: LGTM! Useful helper functions.

Both helper functions are well-implemented:

  • getNodeRoles() provides clean role extraction from node labels
  • boolPtr() is a common utility for converting bool to pointer

24-24: Dependency correctly declared as direct.

The verification confirms that k8s.io/apiextensions-apiserver v0.33.0 is properly declared as a direct dependency in go.mod (no // indirect marker present). The import of apiextensionsv1 in the test file is properly supported by the dependency declaration.


176-177: No issues found — the ServiceAccountName field is correctly defined and used.

The verification confirms that ServiceAccountName exists in the MustGatherSpec struct (api/v1alpha1/mustgather_types.go, line 32) and the test code at lines 176-177 uses it with the correct Go field name syntax.

Makefile (2)

11-14: LGTM! Reasonable e2e test configuration.

The 1-hour timeout is appropriate for end-to-end tests, and the platform-specific label filter allows selective test execution based on cloud provider.


16-26: LGTM! Properly configured e2e test target.

The test target correctly:

  • Uses the e2e build tag to select only e2e tests
  • Sets appropriate timeout and verbosity flags
  • Enables node event visibility for debugging
  • Runs tests serially (-p 1) to avoid resource conflicts
  • Uses -count 1 to disable test caching
examples/mustgather_retain_resources.yaml (2)

8-13: YAML configuration correctly aligns with the API definition.

The nested uploadTarget.sftp structure is properly defined in the CRD schema:

  • UploadTarget field in MustGatherSpec is typed as *UploadTargetSpec
  • UploadTargetSpec contains Type field (required, enum SFTP) and SFTP field (*SFTPSpec)
  • SFTPSpec defines CaseID (required string) and CaseManagementAccountSecretRef (required), both present in the YAML configuration

All required fields are provided and use correct field names matching the JSON tags.


1-1: The review comment is incorrect.

The verification shows that the codebase contains no references to managed.openshift.io/v1alpha1 anywhere. All examples, tests, and documentation already use operator.openshift.io/v1alpha1 throughout. The file examples/mustgather_retain_resources.yaml is already in the correct state with the new API version on line 1. The migration from the old API version is already complete—there is no actual breaking change to address in the current codebase state.

Likely an incorrect or invalid review comment.

Comment on lines 9 to 14
uploadTarget:
  type: SFTP
  sftp:
    caseID: '02527285'
    caseManagementAccountSecretRef:
      name: case-management-creds

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify CRD schema and controller support for the new uploadTarget structure.

This structural change moves authentication fields under uploadTarget.sftp. Ensure that:

  1. The CRD schema has been updated to support this new nested structure
  2. The controller/reconciler code has been updated to read from uploadTarget.sftp instead of the top-level spec fields
  3. There's a migration strategy for existing MustGather resources using the old structure

Run the following script to verify the implementation:


🏁 Script executed:

#!/bin/bash
# Description: Verify CRD schema and controller support for uploadTarget structure

# Check CRD for uploadTarget definition
echo "=== Checking CRD schema ==="
fd -e yaml -e yml 'crd|mustgather' --exec cat {} | rg -A 20 'uploadTarget|sftp' | head -50

# Check controller/reconciler for uploadTarget usage
echo -e "\n=== Checking controller implementation ==="
rg -n --type=go 'uploadTarget|UploadTarget' -g '!vendor/' -C 3

Length of output: 20995


Verify whether the API restructuring provides a backward-compatible migration path for existing MustGather resources.

The review comment correctly identifies a backward-compatibility issue. Moving a field in the API schema is a breaking change, and this PR restructures the upload credentials from top-level spec fields into a nested uploadTarget.sftp structure. Based on the script output:

  • The controller reads from instance.Spec.UploadTarget.SFTP (lines 159-160 in mustgather_controller.go), not from deprecated top-level fields
  • No conversion logic or backward-compatibility code is present to handle existing resources using the old flat structure
  • The UploadTarget field is optional, so new resources work, but this does NOT help existing MustGather resources that currently have credentials at the top level

While v1alpha1 is pre-release API, if deployed instances exist in the wild, this change will silently break them. The PR should either:

  1. Include controller logic to read from both old top-level fields and new nested structure temporarily
  2. Document this as a breaking change requiring manual resource migration
  3. Add conversion webhooks if supporting multiple versions

@codecov-commenter

codecov-commenter commented Nov 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.98%. Comparing base (7d9c76d) to head (8791f83).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #311   +/-   ##
=======================================
  Coverage   70.98%   70.98%           
=======================================
  Files           7        7           
  Lines         479      479           
=======================================
  Hits          340      340           
  Misses        136      136           
  Partials        3        3           

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
test/e2e/must_gather_operator_tests.go (2)

624-626: Replace hardcoded sleep with polling.

The 30-second sleep makes the test unnecessarily slow and may still be insufficient in some environments. Poll the container state instead.

Apply this diff:

-		ginkgo.By("Verifying gather container is collecting data")
-		time.Sleep(30 * time.Second) // Allow time for data collection
-
+		ginkgo.By("Verifying gather container is collecting data")
+		Eventually(func() bool {
+			pod := &corev1.Pod{}
+			err := nonAdminClient.Get(testCtx, client.ObjectKey{
+				Name:      gatherPod.Name,
+				Namespace: testNamespace,
+			}, pod)
+			if err != nil {
+				return false
+			}
+			for _, cs := range pod.Status.ContainerStatuses {
+				if cs.Name == "gather" && (cs.State.Running != nil || cs.State.Terminated != nil) {
+					return true
+				}
+			}
+			return false
+		}).WithTimeout(2*time.Minute).WithPolling(5*time.Second).Should(BeTrue(),
+			"Gather container should be running or terminated")
+

797-798: Replace hardcoded sleep with polling for events.

The 45-second sleep unnecessarily slows down the test. Poll for relevant events or container status instead.

Apply this diff:

-		ginkgo.By("Checking for permission denied errors in events")
-		time.Sleep(45 * time.Second) // Allow time for potential errors
-
+		ginkgo.By("Checking for permission denied errors in events")
+		// Wait for container to start and potentially encounter permission issues
+		Eventually(func() corev1.PodPhase {
+			pod := &corev1.Pod{}
+			_ = adminClient.Get(testCtx, client.ObjectKey{
+				Name:      gatherPod.Name,
+				Namespace: testNamespace,
+			}, pod)
+			return pod.Status.Phase
+		}).WithTimeout(2*time.Minute).WithPolling(5*time.Second).Should(
+			Or(Equal(corev1.PodRunning), Equal(corev1.PodSucceeded), Equal(corev1.PodFailed)),
+			"Pod should reach a terminal or running state")
+

📥 Commits

Reviewing files that changed from the base of the PR and between 03a5eb7 and 7f1f989.

📒 Files selected for processing (1)
  • test/e2e/must_gather_operator_tests.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/must_gather_operator_tests.go

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1


📥 Commits

Reviewing files that changed from the base of the PR and between 7f1f989 and e58eb28.

📒 Files selected for processing (1)
  • test/e2e/must_gather_operator_tests.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/must_gather_operator_tests.go
🔇 Additional comments (2)
test/e2e/must_gather_operator_tests.go (2)

51-961: E2E suite structure and coverage look strong

The ordered suite, context breakdown (admin setup, non-admin flows, SA permissions, security, cleanup), and mix of SAR-based checks plus impersonated controller-runtime clients give you very thorough coverage of the RBAC and MustGather lifecycle from both admin and non-admin perspectives. Assertions and Eventually/Consistently polling patterns are appropriate for e2e and should keep flakiness manageable.


963-1017: Helper orchestration and cleanup utilities are well factored

The helper functions (createNonAdminClient, verifyOperatorDeployment, createTestNamespace, cleanupTestResources) keep the core specs readable and centralize cluster mutations. Labels like test: "non-admin-must-gather" and the final foreground namespace/clusterrole/clusterrolebinding deletes make it much easier to reason about test isolation and to clean up after runs.

Also applies to: 1237-1260

Comment on lines 1019 to 1070
func createNonAdminClusterRole() {
	cr := &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{
			Name: nonAdminCRRoleName,
			Labels: map[string]string{
				"test": "non-admin-must-gather",
			},
		},
		Rules: []rbacv1.PolicyRule{
			{
				APIGroups: []string{"operator.openshift.io"},
				Resources: []string{"mustgathers"},
				Verbs:     []string{"*"},
			},
			{
				APIGroups: []string{"operator.openshift.io"},
				Resources: []string{"mustgathers/status"},
				Verbs:     []string{"get", "list", "watch"},
			},
			{
				APIGroups: []string{"batch"},
				Resources: []string{"jobs"},
				Verbs:     []string{"get", "list", "watch"},
			},
			{
				APIGroups: []string{""},
				Resources: []string{"pods"},
				Verbs:     []string{"get", "list", "watch"},
			},
			{
				APIGroups: []string{""},
				Resources: []string{"pods/log"},
				Verbs:     []string{"get", "list"},
			},
			{
				APIGroups: []string{""},
				Resources: []string{"events"},
				Verbs:     []string{"get", "list"},
			},
		},
	}

	err := adminClient.Create(testCtx, cr)
	if apierrors.IsAlreadyExists(err) {
		// Update if already exists
		_ = adminClient.Update(testCtx, cr)
	} else {
		Expect(err).NotTo(HaveOccurred(), "Failed to create ClusterRole for non-admin")
	}

	ginkgo.GinkgoWriter.Printf("✅ Created ClusterRole: %s\n", nonAdminCRRoleName)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Fix RBAC Create/Update logic to be truly idempotent

In createNonAdminClusterRole, createNonAdminClusterRoleBinding, and createSARBACForMustGather, the pattern:

  • Create(...)
  • if apierrors.IsAlreadyExists(err) { _ = adminClient.Update(ctx, obj) }

is problematic. When Create returns AlreadyExists, the in-memory object you pass to Update has no resourceVersion, so Update will typically fail server-side, and that error is ignored. This means re-running the tests against a cluster where these RBAC objects already exist will not reliably reconcile them to the desired spec, which can lead to confusing, non‑reproducible behavior.

A safer pattern is:

  • On AlreadyExists, Get the existing resource, mutate its fields to match the desired spec (labels, rules, subjects, etc.), then Update that existing object and assert on the error.

Conceptual example for one of them:

err := adminClient.Create(testCtx, cr)
if apierrors.IsAlreadyExists(err) {
    existing := &rbacv1.ClusterRole{}
    err = adminClient.Get(testCtx, client.ObjectKey{Name: cr.Name}, existing)
    Expect(err).NotTo(HaveOccurred(), "Failed to get existing ClusterRole")

    existing.Labels = cr.Labels
    existing.Rules = cr.Rules

    err = adminClient.Update(testCtx, existing)
}
Expect(err).NotTo(HaveOccurred(), "Failed to ensure ClusterRole for non-admin")

Applying this pattern to all three helpers will make the RBAC setup idempotent and more predictable across repeated e2e runs.

Please double-check the Update behavior for your controller-runtime/k8s versions, but in general Kubernetes requires a populated metadata.resourceVersion on updates, which the newly-constructed objects here won’t have after a failed Create.

Also applies to: 1072-1103, 1126-1193


I'll verify this review by examining the actual code patterns and confirming the Kubernetes/controller-runtime behavior.


🏁 Script executed:

#!/bin/bash
# Examine the three functions mentioned in the review
rg -n "func createNonAdminClusterRole|func createNonAdminClusterRoleBinding|func createSARBACForMustGather" test/e2e/must_gather_operator_tests.go -A 60

Length of output: 5119


🌐 Web query:

controller-runtime Update resourceVersion requirement Kubernetes

💡 Result:

Short answer: Kubernetes uses metadata.resourceVersion for optimistic concurrency; Update requests must include the object's current resourceVersion (so the API can reject stale updates). controller-runtime's client.Update sends whatever resourceVersion is on the object you pass, so you must update an object that carries the latest resourceVersion (or handle 409 conflicts by retrying). Use Get → modify → Update, or use RetryOnConflict + controllerutil.CreateOrUpdate, or switch to Patch/Server-Side-Apply when appropriate. [1][2][3]

Helpful references:

  • explanation of resourceVersion and optimistic locking. [1]
  • note that Update requires a resourceVersion from a current object and will be rejected on mismatch. [2]
  • common pattern: RetryOnConflict / CreateOrUpdate to handle "the object has been modified" errors. [3]

Fix RBAC Create/Update pattern in three test setup functions—missing resourceVersion causes silent failures

In createNonAdminClusterRole() (line 1061–1067), createNonAdminClusterRoleBinding() (line 1094–1099), and createSARBACForMustGather() (line 1156–1161 and 1185–1186), the pattern of calling Create() followed by _ = adminClient.Update(ctx, obj) on AlreadyExists error is broken. The newly constructed objects lack metadata.resourceVersion, which Kubernetes requires for updates. The server-side update will fail, but the error is silently ignored (via _ blank assignment), causing the functions to appear successful while the objects remain unreconciled to the desired spec. This breaks idempotency across repeated test runs.

Fix by retrieving the existing resource before update:

err := adminClient.Create(testCtx, cr)
if apierrors.IsAlreadyExists(err) {
    existing := &rbacv1.ClusterRole{}
    err = adminClient.Get(testCtx, client.ObjectKey{Name: cr.Name}, existing)
    Expect(err).NotTo(HaveOccurred(), "Failed to get existing ClusterRole")

    existing.Labels = cr.Labels
    existing.Rules = cr.Rules

    err = adminClient.Update(testCtx, existing)
}
Expect(err).NotTo(HaveOccurred(), "Failed to ensure ClusterRole for non-admin")

Apply this pattern to all three helpers.

🤖 Prompt for AI Agents
In test/e2e/must_gather_operator_tests.go around lines 1019–1070 (and similarly
the helpers at ~1094–1099 and ~1156–1186), the Create-if-exists Update pattern
is broken because the newly constructed objects lack metadata.resourceVersion,
so Update silently fails; fix by: after Create returns AlreadyExists, call
adminClient.Get to load the existing resource into a local variable, copy/merge
the desired fields (Labels, Rules, Subjects, RoleRef, etc.) from the desired
object into that existing object, then call adminClient.Update on the fetched
object and assert the error (Expect(err).NotTo(HaveOccurred(...))). Apply this
same Get-merge-Update pattern to createNonAdminClusterRole(),
createNonAdminClusterRoleBinding(), and createSARBACForMustGather() so updates
succeed and are idempotent across runs.

@praveencodes
Contributor Author

/cc @swghosh @shivprakashmuley
As I removed the direct dependency github.com/openshift/osde2e-common and some indirect dependencies, I had to run go mod vendor and push the files in the vendor directory. The verify-deps check passes after that.

@swghosh
Copy link
Member

swghosh commented Nov 25, 2025

/test e2e-gcp-operator

@shivprakashmuley
Contributor

@praveencodes can you please check the e2e-gcp-operator job failures.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/must_gather_operator_runner_test.go (1)

1-27: Create the JUnit results directory and align its path with .gitignore

Right now:

  • testResultsDirectory is "./test-run-results", but .gitignore ignores test/e2e/test-run-results/, and
  • There is no os.MkdirAll call before Ginkgo tries to write the JUnit report.

This can lead to flaky/broken e2e runs (if the directory doesn’t exist) and to unignored artifacts in git status.

Consider something along these lines:

-const (
-	testResultsDirectory = "./test-run-results"
+const (
+	testResultsDirectory = "test/e2e/test-run-results"
@@
 func TestMustGatherOperator(t *testing.T) {
@@
-	if _, ok := os.LookupEnv("DISABLE_JUNIT_REPORT"); !ok {
-		reporterConfig.JUnitReport = filepath.Join(testResultsDirectory, jUnitOutputFilename)
-	}
+	if _, ok := os.LookupEnv("DISABLE_JUNIT_REPORT"); !ok {
+		if err := os.MkdirAll(testResultsDirectory, 0755); err != nil {
+			t.Fatalf("failed to create test results directory %q: %v", testResultsDirectory, err)
+		}
+		reporterConfig.JUnitReport = filepath.Join(testResultsDirectory, jUnitOutputFilename)
+	}

This keeps artifacts under test/e2e/, makes the .gitignore entry accurate, and ensures JUnit report generation doesn’t fail due to a missing directory.

🧹 Nitpick comments (1)
.gitignore (1)

90-90: Align ignored test results path with the runner’s output directory

This ignores test/e2e/test-run-results/, but the e2e runner currently writes JUnit results under ./test-run-results (see testResultsDirectory in test/e2e/must_gather_operator_runner_test.go), so those artifacts will still show up in git status. I’d either:

  • Move the output under test/e2e/test-run-results in the runner (keeps artifacts near the tests and makes this pattern correct), or
  • Change this entry to ignore the actual output path (e.g., test-run-results/ at repo root).

📥 Commits

Reviewing files that changed from the base of the PR and between c8caf91 and c88241c.

📒 Files selected for processing (7)
  • .gitignore (1 hunks)
  • Makefile (1 hunks)
  • boilerplate/update.cfg (0 hunks)
  • test/e2e/Dockerfile (0 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/e2e-template.yml (0 hunks)
  • test/e2e/must_gather_operator_runner_test.go (2 hunks)
💤 Files with no reviewable changes (3)
  • boilerplate/update.cfg
  • test/e2e/e2e-template.yml
  • test/e2e/Dockerfile
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/README.md
  • test/e2e/must_gather_operator_runner_test.go
  • Makefile
🪛 checkmake (0.2.2)
Makefile

[warning] 12-12: Missing required phony target "all"

(minphony)


[warning] 12-12: Missing required phony target "clean"

(minphony)


[warning] 12-12: Missing required phony target "test"

(minphony)

🪛 LanguageTool
test/e2e/README.md

[style] ~3-~3: Consider a more expressive alternative.
Context: ...ty is not breaking using e2e tests. To do this, following steps are recommended: ...

(DO_ACHIEVE)

🔇 Additional comments (2)
Makefile (1)

10-22: E2E test target wiring looks solid

The test-e2e target (timeout, single run, -p 1, -tags e2e, and Ginkgo flags after ./test/e2e) is correctly structured and should integrate cleanly with the new e2e suite.

test/e2e/README.md (1)

3-22: Docs correctly reflect the new e2e workflow

The updated steps (KUBECONFIG=... make test-e2e and optional DISABLE_JUNIT_REPORT) line up with the Makefile target and the test runner behavior; nothing concerning here.

@praveencodes
Copy link
Contributor Author

@praveencodes can you please check e2e--gcp-operator job failures.

Fixed the JUnit report generation issue.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 3, 2025
Comment on lines 77 to 89
ginkgo.By("STEP 2: Admin creates test namespace")
createTestNamespace()

ginkgo.By("STEP 3: Admin defines privileges for MustGather CRs")
createNonAdminClusterRole()

ginkgo.By("STEP 4: Admin binds the non-admin user to the ClusterRole")
createNonAdminClusterRoleBinding()

ginkgo.By("STEP 5: Admin provides a ServiceAccount for execution")
createServiceAccount()
createSARBACForMustGather()

Copy link
Member


any specific reason for using functions here? I couldn't find repeated use of these functions elsewhere / do we plan to reuse them later?

Copy link
Contributor Author


These functions are called only once, in the BeforeAll hook of the Describe block. I used them for better readability and isolation.

})
})

ginkgo.Context("Non-Admin User MustGather Collection Flow", func() {
Copy link
Member


Suggested change
ginkgo.Context("Non-Admin User MustGather Collection Flow", func() {
ginkgo.Context("Non-Admin User", func() {

}
})

ginkgo.It("non-admin user should successfully submit MustGather CR with SA", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ginkgo.It("non-admin user should successfully submit MustGather CR with SA", func() {
ginkgo.It("should successfully create MustGather CR", func() {

Resources: []string{"jobs", "cronjobs"},
Verbs: []string{"get", "list"},
},
// Add more resources as needed for must-gather
Copy link
Member


yeah good catch, we need a li'l more than this to actually run a full gather. refer: #264

ginkgo.By("Waiting for operator to create Job (non-admin user perspective)")
job := &batchv1.Job{}
Eventually(func() error {
err := nonAdminClient.Get(testCtx, client.ObjectKey{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a slightly questionable choice to let the non-admin user get the pods and the jobs; for a test we can, though user-specific scenarios can vary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a real cluster admin would at least be hesitant to share the pod logs with the non-admin user, considering the Job pod will have cluster-wide sensitive info.

Copy link
Contributor Author


As per this document, a non-admin user can get/list/watch the jobs and pod logs, hence I added those tests.

For the non-admin user:
Must be able to create/get/list/watch MustGather CRs in the target namespace.
Likely needs get/list/watch on Jobs/Pods (and logs) in that namespace to monitor progress and fetch artifacts.
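That rule set can be sketched as data. This is a self-contained sketch, not the operator's actual RBAC: the `PolicyRule` struct is a local stand-in for `rbacv1.PolicyRule`, and the API group name is an assumption.

```go
package main

import "fmt"

// PolicyRule is a local stand-in for rbacv1.PolicyRule; only the fields
// needed for this sketch are included.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// nonAdminRules sketches the minimal non-admin permissions described above:
// full lifecycle on MustGather CRs plus read-only access to Jobs, Pods, and
// pod logs. The group name "managed.openshift.io" is an assumption.
func nonAdminRules() []PolicyRule {
	return []PolicyRule{
		{APIGroups: []string{"managed.openshift.io"}, Resources: []string{"mustgathers"},
			Verbs: []string{"create", "get", "list", "watch"}},
		{APIGroups: []string{"batch"}, Resources: []string{"jobs"},
			Verbs: []string{"get", "list", "watch"}},
		{APIGroups: []string{""}, Resources: []string{"pods", "pods/log"},
			Verbs: []string{"get", "list", "watch"}},
	}
}

func main() {
	for _, r := range nonAdminRules() {
		fmt.Printf("%v %v %v\n", r.APIGroups, r.Resources, r.Verbs)
	}
}
```

Listing verbs explicitly like this (instead of `["*"]`) also keeps the test manifests aligned with least privilege.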

fetchedMG.Status.Status, fetchedMG.Status.Completed, fetchedMG.Status.Reason)
})

ginkgo.It("non-admin user should be able to monitor MustGather progress", func() {
Copy link
Member


instead we should look at improving the progress status before GA and let the non-admin user poll the conditions in the MustGather CR to stay aware of the running state.

Copy link
Contributor Author


Ok. @shivprakashmuley What's your suggestion on this?

Copy link
Contributor


I think what we are trying to achieve with the test "non-admin user should be able to monitor MustGather progress" is that a non-admin should be able to see the CR status. If we want to simulate a non-admin monitoring what is happening w.r.t. must-gather collection, then we can poll conditions. But then again, we don't update anything in conditions other than ReconcileSuccess.

For now we can rename this testcase to "non-admin user should be able to get MustGather status" and in future add another testcase to monitor the must-gather collection step by step through conditions.

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (13)
test/e2e/testdata/nonadmin-clusterrole-binding.yaml (1)

4-4: Consider consistent naming for clarity.

The binding name must-gather-cr-binding doesn't follow the "nonadmin" naming pattern used in the ClusterRole (must-gather-nonadmin-clusterrole) and subject (must-gather-nonadmin-user). For consistency and clarity, consider renaming to must-gather-nonadmin-cr-binding.

Apply this diff:

-  name: must-gather-cr-binding
+  name: must-gather-nonadmin-cr-binding
test/library/utils.go (3)

35-37: Use consistent context throughout the function.

The function uses context.TODO() for polling but context.Background() for the namespace creation call. Consider using the same context throughout, or preferably use d.context (the context field stored in DynamicResourceLoader) for consistency and proper cancellation support.

Apply this diff:

-	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
+	if err := wait.PollUntilContextTimeout(d.context, 1*time.Second, 30*time.Second, true, func(ctx context.Context) (bool, error) {
 		var err error
-		got, err = d.KubeClient.CoreV1().Namespaces().Create(context.Background(), namespace, metav1.CreateOptions{})
+		got, err = d.KubeClient.CoreV1().Namespaces().Create(ctx, namespace, metav1.CreateOptions{})

60-63: Use consistent context throughout the function.

Similar to CreateTestingNS, this function uses context.TODO() for polling but context.Background() for API calls. Consider using d.context consistently.

Apply this diff:

-	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
+	if err := wait.PollUntilContextTimeout(d.context, 1*time.Second, 30*time.Second, true, func(ctx context.Context) (bool, error) {
 		// Poll until namespace is deleted
 		_, err := d.KubeClient.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})

76-76: Use consistent context.

Use d.context instead of context.TODO() for consistency with the struct's context field.

Apply this diff:

-	events, err := d.KubeClient.CoreV1().Events(name).List(context.TODO(), metav1.ListOptions{})
+	events, err := d.KubeClient.CoreV1().Events(name).List(d.context, metav1.ListOptions{})
test/library/kube_client.go (2)

17-29: Consider simplifying error handling.

Since GetConfigForTest already calls require.NoError (line 39), the error check and require on line 23 is redundant. The test will abort in GetConfigForTest if there's an error. You could simplify this by removing the intermediate error variable and require call.

Apply this diff:

 func NewClientsConfigForTest(t *testing.T) (kubernetes.Interface, dynamic.Interface) {
 	config, err := GetConfigForTest(t)
-	if err == nil {
-		t.Logf("Found configuration for host %v.\n", config.Host)
-	}
-
-	require.NoError(t, err)
+	// GetConfigForTest already requires no error, so config is always valid here
+	t.Logf("Found configuration for host %v.\n", config.Host)
+	
 	kubeClient, err := kubernetes.NewForConfig(config)
 	require.NoError(t, err)

39-40: Simplify return after require.NoError.

Since require.NoError will abort the test on error, returning err is redundant. Consider returning just config, nil.

Apply this diff:

 	require.NoError(t, err)
-	return config, err
+	return config, nil
test/e2e/must_gather_operator_runner_test.go (1)

63-63: Consider aligning suite name with project name.

The suite name "support-log-gather e2e suite" doesn't clearly indicate this is the must-gather-operator e2e suite. Consider using a name that matches the project for clarity in test reports, such as "must-gather-operator e2e suite".

Apply this diff:

-	RunSpecs(t, "support-log-gather e2e suite", suiteConfig, reporterConfig)
+	RunSpecs(t, "must-gather-operator e2e suite", suiteConfig, reporterConfig)
test/library/dynamic_resources.go (3)

77-77: Silently ignoring error from SetNestedSlice.

The error from unstructured.SetNestedSlice is being discarded. While failures here are rare, silently ignoring the error could mask issues during test debugging.

Consider logging if the operation fails:

-	_ = unstructured.SetNestedSlice(obj.Object, subjects, "subjects")
+	if err := unstructured.SetNestedSlice(obj.Object, subjects, "subjects"); err != nil {
+		// Best-effort: log and continue; the test will fail later with a
+		// more specific error if the subjects were not updated.
+		d.t.Logf("failed to set subjects on %s: %v", obj.GetName(), err)
+	}

122-131: Inconsistent context usage in delete operation.

Line 125 uses context.TODO() instead of d.context which is stored in the struct. This inconsistency could cause issues if context cancellation is expected to propagate.

 func (d DynamicResourceLoader) DeleteFromFile(assetFunc func(name string) ([]byte, error), filename string, overrideNamespace string) {
 	d.t.Logf("Deleting resource %v\n", filename)
 	deleteFunc := func(t *testing.T, unstructured *unstructured.Unstructured, dynamicResourceInterface dynamic.ResourceInterface) {
-		err := dynamicResourceInterface.Delete(context.TODO(), unstructured.GetName(), metav1.DeleteOptions{})
+		err := dynamicResourceInterface.Delete(d.context, unstructured.GetName(), metav1.DeleteOptions{})
 		d.noErrorSkipNotExisting(err)
 	}

133-142: Same inconsistent context usage in create operation.

Line 136 uses context.TODO() instead of d.context. Apply the same fix for consistency.

 func (d DynamicResourceLoader) CreateFromFile(assetFunc func(name string) ([]byte, error), filename string, overrideNamespace string) {
 	d.t.Logf("Creating resource %v\n", filename)
 	createFunc := func(t *testing.T, unstructured *unstructured.Unstructured, dynamicResourceInterface dynamic.ResourceInterface) {
-		_, err := dynamicResourceInterface.Create(context.TODO(), unstructured, metav1.CreateOptions{})
+		_, err := dynamicResourceInterface.Create(d.context, unstructured, metav1.CreateOptions{})
 		d.noErrorSkipExists(err)
 	}
test/e2e/must_gather_operator_test.go (3)

138-153: Potential test flakiness: AfterEach cleanup may timeout if CR deletion is slow.

The Eventually block waits up to 2 minutes for CR deletion, but if the operator is under load or the deletion cascades through many resources, this could cause intermittent failures. Consider whether 2 minutes is sufficient for all scenarios, especially when Jobs and Pods need cascading deletion.

The timeout appears reasonable for most cases, but consider adding a brief log message if cleanup takes longer than expected to aid debugging flaky tests.


547-552: String matching for security validation is fragile.

Checking for "forbidden" and "denied" substrings in event messages is locale-dependent and may miss variations. This approach could produce false negatives if the message format changes or uses different wording.

Consider checking the event Reason field against known security-related reasons (e.g., FailedCreate, FailedScheduling) rather than substring matching on messages, which is more stable across Kubernetes versions.
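A sketch of Reason-based matching: the `event` struct is a local stand-in for `corev1.Event`, and the set of reasons is an assumption that would need to be adjusted to whatever the controller actually emits.

```go
package main

import "fmt"

// event is a local stand-in for corev1.Event, keeping only the fields used.
type event struct {
	Reason  string
	Message string
}

// securityReasons lists event reasons that typically indicate an RBAC or
// admission denial; the exact set is an assumption and should be tuned to
// the reasons observed from the cluster under test.
var securityReasons = map[string]bool{
	"FailedCreate":     true,
	"FailedScheduling": true,
}

// hasSecurityDenial matches on the stable Reason field instead of doing
// locale- and wording-sensitive substring checks on Message.
func hasSecurityDenial(events []event) bool {
	for _, e := range events {
		if securityReasons[e.Reason] {
			return true
		}
	}
	return false
}

func main() {
	evs := []event{
		{Reason: "Scheduled"},
		{Reason: "FailedCreate", Message: "pods is forbidden"},
	}
	fmt.Println(hasSecurityDenial(evs))
}
```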


697-716: Helper uses nonAdminClient directly, coupling it to package-level state.

The createMustGatherCR function uses the package-level nonAdminClient variable directly rather than accepting it as a parameter. This reduces flexibility and makes the function harder to test in isolation or reuse with different clients.

Consider passing the client as a parameter for better flexibility:

-func createMustGatherCR(name, namespace, serviceAccountName string, retainResources bool) *mustgatherv1alpha1.MustGather {
+func createMustGatherCR(c client.Client, name, namespace, serviceAccountName string, retainResources bool) *mustgatherv1alpha1.MustGather {
 	mg := &mustgatherv1alpha1.MustGather{
 		// ...
 	}
 
-	err := nonAdminClient.Create(testCtx, mg)
+	err := c.Create(testCtx, mg)
 	Expect(err).NotTo(HaveOccurred(), "Failed to create MustGather CR")
 
 	return mg
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef1b31 and d552982.

📒 Files selected for processing (12)
  • go.mod (2 hunks)
  • test/e2e/must_gather_operator_runner_test.go (1 hunks)
  • test/e2e/must_gather_operator_test.go (1 hunks)
  • test/e2e/must_gather_operator_tests.go (0 hunks)
  • test/e2e/testdata/nonadmin-clusterrole-binding.yaml (1 hunks)
  • test/e2e/testdata/nonadmin-clusterrole.yaml (1 hunks)
  • test/e2e/testdata/serviceaccount-clusterrole-binding.yaml (1 hunks)
  • test/e2e/testdata/serviceaccount-clusterrole.yaml (1 hunks)
  • test/e2e/testdata/serviceaccount.yaml (1 hunks)
  • test/library/dynamic_resources.go (1 hunks)
  • test/library/kube_client.go (1 hunks)
  • test/library/utils.go (1 hunks)
💤 Files with no reviewable changes (1)
  • test/e2e/must_gather_operator_tests.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/testdata/nonadmin-clusterrole-binding.yaml
  • go.mod
  • test/e2e/testdata/serviceaccount-clusterrole.yaml
  • test/library/dynamic_resources.go
  • test/library/utils.go
  • test/library/kube_client.go
  • test/e2e/must_gather_operator_test.go
  • test/e2e/must_gather_operator_runner_test.go
  • test/e2e/testdata/serviceaccount-clusterrole-binding.yaml
  • test/e2e/testdata/serviceaccount.yaml
  • test/e2e/testdata/nonadmin-clusterrole.yaml
🧬 Code graph analysis (4)
test/library/dynamic_resources.go (2)
test/library/kube_client.go (1)
  • NewClientsConfigForTest (17-29)
version/version.go (1)
  • Version (4-4)
test/library/utils.go (1)
test/library/dynamic_resources.go (1)
  • DynamicResourceLoader (24-30)
test/e2e/must_gather_operator_test.go (2)
api/v1alpha1/groupversion_info.go (1)
  • AddToScheme (35-35)
api/v1alpha1/mustgather_types.go (3)
  • MustGather (193-199)
  • MustGatherSpec (28-65)
  • MustGatherList (204-208)
test/e2e/must_gather_operator_runner_test.go (1)
test/library/dynamic_resources.go (2)
  • DynamicResourceLoader (24-30)
  • NewDynamicResourceLoader (34-42)
🪛 Checkov (3.2.334)
test/e2e/testdata/nonadmin-clusterrole.yaml

[medium] 1-25: Minimize wildcard use in Roles and ClusterRoles

(CKV_K8S_49)

🔇 Additional comments (12)
go.mod (1)

14-14: Addition of testify for e2e test assertions looks good.

The addition of github.com/stretchr/testify v1.10.0 is appropriate for Go testing and aligns with the PR's shift to Ginkgo-based e2e testing. The version is current and well-maintained. The accompanying indirect dependency github.com/pmezard/go-difflib (line 61) is a transitive dependency from testify and is expected. Verification confirms that removed dependencies (osde2e-common and e2e-framework) are completely removed from the codebase with no lingering imports.

test/e2e/testdata/serviceaccount.yaml (1)

1-8: LGTM!

The ServiceAccount manifest is correctly structured for dynamic e2e test scenarios with runtime namespace assignment.

test/e2e/testdata/serviceaccount-clusterrole.yaml (1)

1-31: LGTM!

The ClusterRole correctly defines read-only permissions appropriate for non-admin e2e test scenarios. The resource and verb combinations are suitable for must-gather operations.

test/e2e/testdata/nonadmin-clusterrole.yaml (1)

8-10: Verify the need for wildcard permissions on mustgathers.

The ClusterRole grants wildcard verbs ["*"] on mustgathers resources, providing full control (create, update, delete, patch, etc.). While this may be intentional for comprehensive e2e testing, consider whether specific verbs (e.g., ["get", "list", "watch", "create", "delete"]) would be sufficient. Explicitly listing required verbs follows the principle of least privilege and reduces potential security risks.

Based on static analysis hints (CKV_K8S_49).

If the wildcard is necessary for test coverage, consider adding a comment explaining why full permissions are required.

test/library/kube_client.go (1)

33-33: InsecureSkipTLSVerify is acceptable for e2e tests.

Setting InsecureSkipTLSVerify: true is appropriate for test environments where certificate validation may not be configured. This is a common pattern in e2e testing.

test/e2e/must_gather_operator_runner_test.go (2)

33-40: LGTM!

The test directory resolution correctly handles both CI and local development environments.


42-53: LGTM!

The cluster name extraction is simple and effective for test logging. Note that the returned name includes the port (e.g., api.cluster.example.com:6443), which is acceptable for identification purposes.

test/library/dynamic_resources.go (1)

24-42: LGTM - Clean struct design and constructor.

The DynamicResourceLoader struct properly encapsulates the required clients and test context. The constructor delegates client creation to NewClientsConfigForTest, maintaining separation of concerns.

test/e2e/must_gather_operator_test.go (4)

661-673: LGTM - Clean impersonation client creation.

The helper properly creates an impersonated client by copying the admin config and setting the Impersonate field. Good separation of concerns.


675-695: LGTM - Operator deployment verification is thorough.

The function validates both namespace existence and deployment readiness, with clear logging for debugging.


52-60: No action needed. The variables loader and ns are properly declared as package-level variables in test/e2e/must_gather_operator_runner_test.go and are correctly accessible throughout the package, including in test/e2e/must_gather_operator_test.go. This is standard Go package-scope behavior and will not cause compilation errors.


461-485: The test assertion is correct. When a Job references a non-existent ServiceAccount, Kubernetes deterministically prevents pod creation, leaving job.Status.Active at 0. The Consistently() check over 30 seconds appropriately verifies this behavior. The test properly validates that Job creation succeeds while pod scheduling fails due to the missing ServiceAccount.

Comment on lines +11 to +13
subjects:
- kind: ServiceAccount
name: must-gather-serviceaccount
Copy link


⚠️ Potential issue | 🔴 Critical

Add required namespace field to ServiceAccount subject.

ServiceAccount subjects in ClusterRoleBinding must specify a namespace. Without it, the binding will fail to apply.

Apply this diff to fix:

 subjects:
   - kind: ServiceAccount
     name: must-gather-serviceaccount
+    namespace: <namespace-name>

Note: Based on test/e2e/testdata/serviceaccount.yaml line 5, the namespace is set at runtime. Ensure consistency in how you apply both manifests.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
subjects:
- kind: ServiceAccount
name: must-gather-serviceaccount
subjects:
- kind: ServiceAccount
name: must-gather-serviceaccount
namespace: <namespace-name>
🤖 Prompt for AI Agents
In test/e2e/testdata/serviceaccount-clusterrole-binding.yaml around lines 11-13,
the ClusterRoleBinding subject for the ServiceAccount is missing the required
namespace field; add a namespace entry to the subject (e.g., namespace:
<must-gather-namespace>) so the subject becomes fully qualified, and ensure the
value matches the namespace used at runtime in
test/e2e/testdata/serviceaccount.yaml (line 5) or the template/variable used to
inject that namespace.

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
test/library/kube_client.go (2)

29-38: Redundant error return and duplicate logging.

GetConfigForTest calls require.NoError(t, err) at line 37, which means the function will never return a non-nil error (it will fail the test instead). The returned err at line 38 is always nil, making the error return value misleading.

Additionally, the host is logged here at line 34, and also in NewClientsConfigForTest at line 18, resulting in duplicate log messages.

Consider either:

  1. Removing the error return value since it's always nil after require.NoError, or
  2. Not using require.NoError and letting callers handle the error.
 func GetConfigForTest(t TestingT) (*rest.Config, error) {
 	loader := clientcmd.NewDefaultClientConfigLoadingRules()
 	clientConfig := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loader, &clientcmd.ConfigOverrides{ClusterInfo: api.Cluster{InsecureSkipTLSVerify: true}})
 	config, err := clientConfig.ClientConfig()
-	if err == nil {
-		t.Logf("Found configuration for host %v.\n", config.Host)
-	}
-
 	require.NoError(t, err)
+	t.Logf("Found configuration for host %v.\n", config.Host)
 	return config, err
 }

15-27: Duplicate host logging with GetConfigForTest.

Both GetConfigForTest (line 34) and NewClientsConfigForTest (line 18) log the host information, resulting in duplicate log output. Since GetConfigForTest always logs on success (after require.NoError), the logging here is redundant.

 func NewClientsConfigForTest(t TestingT) (kubernetes.Interface, dynamic.Interface) {
 	config, err := GetConfigForTest(t)
-	if err == nil {
-		t.Logf("Found configuration for host %v.\n", config.Host)
-	}
-
 	require.NoError(t, err)
 	kubeClient, err := kubernetes.NewForConfig(config)
test/e2e/must_gather_operator_test.go (2)

675-695: Local variable shadows package-level ns.

At line 677, ns := &corev1.Namespace{} creates a local variable that shadows the package-level ns declared in must_gather_operator_runner_test.go. This works in this context since you're only using it to fetch the namespace, but it could cause confusion.

 func verifyOperatorDeployment() {
 	ginkgo.By("Checking must-gather-operator namespace exists")
-	ns := &corev1.Namespace{}
-	err := adminClient.Get(testCtx, client.ObjectKey{Name: operatorNamespace}, ns)
+	operatorNS := &corev1.Namespace{}
+	err := adminClient.Get(testCtx, client.ObjectKey{Name: operatorNamespace}, operatorNS)
 	Expect(err).NotTo(HaveOccurred(), "must-gather-operator namespace should exist")

138-154: Silent error handling in AfterEach cleanup.

At line 141, _ = adminClient.Delete(testCtx, mustGatherCR) silently ignores errors. While cleanup errors are often non-critical, completely ignoring them could mask legitimate issues during test debugging.

Consider logging the error if deletion fails unexpectedly:

 		ginkgo.By("Cleaning up MustGather CR")
-		_ = adminClient.Delete(testCtx, mustGatherCR)
+		if err := adminClient.Delete(testCtx, mustGatherCR); err != nil && !apierrors.IsNotFound(err) {
+			ginkgo.GinkgoWriter.Printf("Warning: failed to delete MustGather CR: %v\n", err)
+		}
test/library/dynamic_resources.go (1)

128-148: Uses context.TODO() instead of stored context.

The DynamicResourceLoader stores a context in d.context (line 34), but DeleteFromFile (line 131) and CreateFromFile (line 142) use context.TODO() instead. This bypasses any context cancellation or timeout that might be set on the stored context.

 func (d DynamicResourceLoader) DeleteFromFile(assetFunc func(name string) ([]byte, error), filename string, overrideNamespace string) {
 	d.t.Logf("Deleting resource %v\n", filename)
 	deleteFunc := func(t TestingT, unstructured *unstructured.Unstructured, dynamicResourceInterface dynamic.ResourceInterface) {
-		err := dynamicResourceInterface.Delete(context.TODO(), unstructured.GetName(), metav1.DeleteOptions{})
+		err := dynamicResourceInterface.Delete(d.context, unstructured.GetName(), metav1.DeleteOptions{})
 		d.noErrorSkipNotExisting(err)
 	}

 	d.do(deleteFunc, assetFunc, filename, overrideNamespace)
 	d.t.Logf("Resource %v deleted\n", filename)
 }

 func (d DynamicResourceLoader) CreateFromFile(assetFunc func(name string) ([]byte, error), filename string, overrideNamespace string) {
 	d.t.Logf("Creating resource %v\n", filename)
 	createFunc := func(t TestingT, unstructured *unstructured.Unstructured, dynamicResourceInterface dynamic.ResourceInterface) {
-		_, err := dynamicResourceInterface.Create(context.TODO(), unstructured, metav1.CreateOptions{})
+		_, err := dynamicResourceInterface.Create(d.context, unstructured, metav1.CreateOptions{})
 		d.noErrorSkipExists(err)
 	}

 	d.do(createFunc, assetFunc, filename, overrideNamespace)
 	d.t.Logf("Resource %v created\n", filename)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d552982 and 94de487.

📒 Files selected for processing (5)
  • test/e2e/must_gather_operator_runner_test.go (1 hunks)
  • test/e2e/must_gather_operator_test.go (1 hunks)
  • test/library/dynamic_resources.go (1 hunks)
  • test/library/kube_client.go (1 hunks)
  • test/library/utils.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/library/utils.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/library/kube_client.go
  • test/e2e/must_gather_operator_test.go
  • test/e2e/must_gather_operator_runner_test.go
  • test/library/dynamic_resources.go
🧬 Code graph analysis (4)
test/library/kube_client.go (1)
test/library/dynamic_resources.go (1)
  • TestingT (24-28)
test/e2e/must_gather_operator_test.go (2)
api/v1alpha1/groupversion_info.go (1)
  • AddToScheme (35-35)
api/v1alpha1/mustgather_types.go (3)
  • MustGather (193-199)
  • MustGatherSpec (28-65)
  • MustGatherList (204-208)
test/e2e/must_gather_operator_runner_test.go (1)
test/library/dynamic_resources.go (2)
  • DynamicResourceLoader (30-36)
  • NewDynamicResourceLoader (40-48)
test/library/dynamic_resources.go (2)
test/library/kube_client.go (1)
  • NewClientsConfigForTest (15-27)
version/version.go (1)
  • Version (4-4)
🔇 Additional comments (6)
test/e2e/must_gather_operator_runner_test.go (2)

66-77: LGTM - BeforeSuite correctly uses GinkgoT().

The loader is correctly initialized with GinkgoT() at line 75, which properly integrates with the Ginkgo test framework for error reporting. The setup follows a reasonable pattern for e2e test initialization.


26-31: No action needed. The test suite correctly uses ginkgo.Ordered on the Describe block (line 71 of must_gather_operator_test.go), which guarantees that BeforeAll executes before all subsequent specs and that all tests run in the defined order. This is the proper Ginkgo v2 pattern for managing shared state across tests. The cross-file package-level ns variable is safe because test ordering is explicitly controlled.

test/e2e/must_gather_operator_test.go (2)

71-109: Comprehensive test setup looks good.

The BeforeAll setup correctly:

  • Initializes admin config and client
  • Verifies operator deployment before running tests
  • Creates a dedicated test namespace
  • Sets up RBAC resources (ClusterRole, ClusterRoleBinding, ServiceAccount)
  • Creates a non-admin impersonated client

The ordered test execution with ginkgo.Ordered ensures proper sequencing.


237-292: Thorough Job verification tests.

The test properly validates:

  • Job creation with correct ServiceAccount
  • Container specifications including gather container
  • Output volume presence for artifacts

The debugging output when non-admin can't see the Job (lines 249-259) is helpful for troubleshooting RBAC issues.

test/library/dynamic_resources.go (2)

23-48: Well-designed abstraction for test utilities.

The TestingT interface provides a clean abstraction that works with both *testing.T and GinkgoT(). The DynamicResourceLoader struct encapsulates all necessary clients and context for dynamic resource operations, making it easy to use in e2e tests.


86-126: Core resource loading logic is solid.

The do() method correctly:

  • Loads and decodes YAML/JSON assets
  • Resolves GVK to REST mapping via discovery
  • Handles namespace-scoped vs cluster-scoped resources appropriately
  • Applies namespace overrides where needed
  • Properly updates subject namespaces for ClusterRoleBindings (line 119)

@swghosh
Copy link
Member

swghosh commented Dec 22, 2025

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Dec 22, 2025
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 94de487 and 8791f83.

📒 Files selected for processing (4)
  • test/e2e/must_gather_operator_runner_test.go
  • test/e2e/must_gather_operator_test.go
  • test/library/utils.go
  • test/must-gather.yaml
💤 Files with no reviewable changes (1)
  • test/must-gather.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/must_gather_operator_test.go
  • test/library/utils.go
  • test/e2e/must_gather_operator_runner_test.go
🧬 Code graph analysis (3)
test/e2e/must_gather_operator_test.go (2)
api/v1alpha1/groupversion_info.go (1)
  • AddToScheme (35-35)
api/v1alpha1/mustgather_types.go (3)
  • MustGather (193-199)
  • MustGatherSpec (28-65)
  • MustGatherList (204-208)
test/library/utils.go (1)
test/library/dynamic_resources.go (1)
  • DynamicResourceLoader (30-36)
test/e2e/must_gather_operator_runner_test.go (1)
test/library/dynamic_resources.go (2)
  • DynamicResourceLoader (30-36)
  • NewDynamicResourceLoader (40-48)

Comment on lines +50 to +58
var _ = BeforeSuite(func() {
	var err error
	cfg, err = config.GetConfig()
	Expect(err).NotTo(HaveOccurred())

	By("creating dynamic resources client")
	loader = library.NewDynamicResourceLoader(context.TODO(), GinkgoT())
})

⚠️ Potential issue | 🟠 Major

Pass a proper context instead of context.TODO() to the loader.

Line 56 passes context.TODO() to NewDynamicResourceLoader, which becomes the d.context field used throughout the loader's methods (including the utilities in test/library/utils.go). This prevents proper test cancellation and cleanup. Use a test-scoped context that can be cancelled when the suite terminates.

🔎 Proposed fix

Add a suite-level context variable and use it:

 var (
 	loader library.DynamicResourceLoader
 	cfg    *rest.Config
 	ns     *corev1.Namespace
+	suiteCtx context.Context
+	suiteCancel context.CancelFunc
 )
 
 var _ = BeforeSuite(func() {
 	var err error
 	cfg, err = config.GetConfig()
 	Expect(err).NotTo(HaveOccurred())
 
 	By("creating dynamic resources client")
-	loader = library.NewDynamicResourceLoader(context.TODO(), GinkgoT())
+	suiteCtx, suiteCancel = context.WithCancel(context.Background())
+	loader = library.NewDynamicResourceLoader(suiteCtx, GinkgoT())
+})
 
+var _ = AfterSuite(func() {
+	if suiteCancel != nil {
+		suiteCancel()
+	}
 })

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In test/e2e/must_gather_operator_runner_test.go around lines 50 to 58, the
loader is created with context.TODO() which prevents proper cancellation and
cleanup; replace that with a suite-level cancellable context: declare a
package/suite-level context and cancel function (e.g., suiteCtx, suiteCancel),
initialize them in BeforeSuite using context.WithCancel(context.Background())
and pass suiteCtx to NewDynamicResourceLoader, and ensure you call suiteCancel()
in AfterSuite to cancel and clean up tests.

Comment on lines +18 to +46
func (d DynamicResourceLoader) CreateTestNS(namespacePrefix string, noSuffix bool) (*corev1.Namespace, error) {
	namespace := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Labels: map[string]string{
				"e2e-test": "support-log-gather",
			},
		},
	}

	if noSuffix {
		namespace.Name = namespacePrefix
	} else {
		namespace.GenerateName = fmt.Sprintf("%v-", namespacePrefix)
	}

	var got *corev1.Namespace
	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
		var err error
		got, err = d.KubeClient.CoreV1().Namespaces().Create(context.Background(), namespace, metav1.CreateOptions{})
		if err != nil {
			log.Printf("Error creating namespace: %v", err)
			return false, nil
		}
		return true, nil
	}); err != nil {
		return nil, err
	}
	return got, nil
}

⚠️ Potential issue | 🟠 Major

Use d.context instead of context.TODO() and context.Background().

Both the polling loop (line 34) and the Create call (line 36) ignore d.context, preventing proper cancellation and timeout propagation from the test framework. If the test is cancelled or times out, this function will continue polling and creating resources, potentially causing resource leaks.

🔎 Proposed fix
-	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
+	if err := wait.PollUntilContextTimeout(d.context, 1*time.Second, 30*time.Second, true, func(ctx context.Context) (bool, error) {
 		var err error
-		got, err = d.KubeClient.CoreV1().Namespaces().Create(context.Background(), namespace, metav1.CreateOptions{})
+		got, err = d.KubeClient.CoreV1().Namespaces().Create(ctx, namespace, metav1.CreateOptions{})
 		if err != nil {
 			log.Printf("Error creating namespace: %v", err)
 			return false, nil
 		}
 		return true, nil
 	}); err != nil {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (d DynamicResourceLoader) CreateTestNS(namespacePrefix string, noSuffix bool) (*corev1.Namespace, error) {
	namespace := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Labels: map[string]string{
				"e2e-test": "support-log-gather",
			},
		},
	}
	if noSuffix {
		namespace.Name = namespacePrefix
	} else {
		namespace.GenerateName = fmt.Sprintf("%v-", namespacePrefix)
	}
	var got *corev1.Namespace
	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
		var err error
		got, err = d.KubeClient.CoreV1().Namespaces().Create(context.Background(), namespace, metav1.CreateOptions{})
		if err != nil {
			log.Printf("Error creating namespace: %v", err)
			return false, nil
		}
		return true, nil
	}); err != nil {
		return nil, err
	}
	return got, nil
}

func (d DynamicResourceLoader) CreateTestNS(namespacePrefix string, noSuffix bool) (*corev1.Namespace, error) {
	namespace := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Labels: map[string]string{
				"e2e-test": "support-log-gather",
			},
		},
	}
	if noSuffix {
		namespace.Name = namespacePrefix
	} else {
		namespace.GenerateName = fmt.Sprintf("%v-", namespacePrefix)
	}
	var got *corev1.Namespace
	if err := wait.PollUntilContextTimeout(d.context, 1*time.Second, 30*time.Second, true, func(ctx context.Context) (bool, error) {
		var err error
		got, err = d.KubeClient.CoreV1().Namespaces().Create(ctx, namespace, metav1.CreateOptions{})
		if err != nil {
			log.Printf("Error creating namespace: %v", err)
			return false, nil
		}
		return true, nil
	}); err != nil {
		return nil, err
	}
	return got, nil
}
🤖 Prompt for AI Agents
In test/library/utils.go around lines 18 to 46, the polling loop and Create call
use context.TODO() and context.Background(), ignoring the loader's d.context
which prevents proper cancellation/timeouts; replace context.TODO() passed to
wait.PollUntilContextTimeout with d.context and replace context.Background()
passed to Namespaces().Create with d.context so the poll and create operations
respect the test framework's cancellation and timeout; ensure the function
returns any context-derived errors from the poll instead of swallowing them.

Comment on lines +48 to +71
func (d DynamicResourceLoader) DeleteTestingNS(name string, shouldDumpEvents func() bool) error {
	ctx := context.Background()
	if shouldDumpEvents() {
		d.DumpEventsInNamespace(name)
	}

	err := d.KubeClient.CoreV1().Namespaces().Delete(ctx, name, metav1.DeleteOptions{})
	if err != nil {
		log.Printf("Error deleting namespace %v, err: %v", name, err)
	}

	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
		// Poll until namespace is deleted
		_, err := d.KubeClient.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})
		if err != nil && k8serrors.IsNotFound(err) {
			return true, nil
		}
		return false, nil
	}); err != nil {
		log.Printf("Timed out after 30s waiting for namespace %v to become deleted", name)
		return err
	}
	return nil
}

⚠️ Potential issue | 🟠 Major

Use d.context instead of context.Background() and context.TODO().

This function creates a new context.Background() (line 49) and uses context.TODO() in polling (line 59), ignoring d.context. This prevents proper cancellation and cleanup when tests are terminated, potentially leaving namespaces in terminating state.

🔎 Proposed fix
 func (d DynamicResourceLoader) DeleteTestingNS(name string, shouldDumpEvents func() bool) error {
-	ctx := context.Background()
 	if shouldDumpEvents() {
 		d.DumpEventsInNamespace(name)
 	}
 
-	err := d.KubeClient.CoreV1().Namespaces().Delete(ctx, name, metav1.DeleteOptions{})
+	err := d.KubeClient.CoreV1().Namespaces().Delete(d.context, name, metav1.DeleteOptions{})
 	if err != nil {
 		log.Printf("Error deleting namespace %v, err: %v", name, err)
 	}
 
-	if err := wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, 30*time.Second, true, func(context.Context) (bool, error) {
+	if err := wait.PollUntilContextTimeout(d.context, 1*time.Second, 30*time.Second, true, func(ctx context.Context) (bool, error) {
 		// Poll until namespace is deleted
-		_, err := d.KubeClient.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})
+		_, err := d.KubeClient.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})
 		if err != nil && k8serrors.IsNotFound(err) {
 			return true, nil
 		}
 		return false, nil
 	}); err != nil {
🤖 Prompt for AI Agents
In test/library/utils.go around lines 48 to 71, the function creates a new
context.Background() and uses context.TODO(), ignoring the loader's d.context
which prevents proper cancellation; replace both context.Background() and
context.TODO() with d.context (e.g., set ctx := d.context at the top) and use
that ctx when calling Namespaces().Delete and as the context argument to
wait.PollUntilContextTimeout so the operations respect cancellation and test
cleanup.

Comment on lines +73 to +84
func (d DynamicResourceLoader) DumpEventsInNamespace(name string) {
	log.Printf("Dumping events in namespace %s...", name)
	events, err := d.KubeClient.CoreV1().Events(name).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Printf("Error listing events in namespace %s: %v", name, err)
		return
	}

	for _, e := range events.Items {
		log.Printf("At %v - event for %v %v: %v %v: %v", e.FirstTimestamp, e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Source, e.Reason, e.Message)
	}
}

⚠️ Potential issue | 🟡 Minor

Use d.context instead of context.TODO() for consistency.

While this is a best-effort logging operation, using d.context ensures the event listing respects test cancellation and avoids unnecessary API calls when the test framework is shutting down.

🔎 Proposed fix
 func (d DynamicResourceLoader) DumpEventsInNamespace(name string) {
 	log.Printf("Dumping events in namespace %s...", name)
-	events, err := d.KubeClient.CoreV1().Events(name).List(context.TODO(), metav1.ListOptions{})
+	events, err := d.KubeClient.CoreV1().Events(name).List(d.context, metav1.ListOptions{})
 	if err != nil {
 		log.Printf("Error listing events in namespace %s: %v", name, err)
 		return
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (d DynamicResourceLoader) DumpEventsInNamespace(name string) {
	log.Printf("Dumping events in namespace %s...", name)
	events, err := d.KubeClient.CoreV1().Events(name).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Printf("Error listing events in namespace %s: %v", name, err)
		return
	}
	for _, e := range events.Items {
		log.Printf("At %v - event for %v %v: %v %v: %v", e.FirstTimestamp, e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Source, e.Reason, e.Message)
	}
}

func (d DynamicResourceLoader) DumpEventsInNamespace(name string) {
	log.Printf("Dumping events in namespace %s...", name)
	events, err := d.KubeClient.CoreV1().Events(name).List(d.context, metav1.ListOptions{})
	if err != nil {
		log.Printf("Error listing events in namespace %s: %v", name, err)
		return
	}
	for _, e := range events.Items {
		log.Printf("At %v - event for %v %v: %v %v: %v", e.FirstTimestamp, e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Source, e.Reason, e.Message)
	}
}
🤖 Prompt for AI Agents
In test/library/utils.go around lines 73 to 84, the call to context.TODO() when
listing events should use d.context so the call respects test cancellation;
change List(context.TODO(), ...) to List(d.context, ...) (ensure d.context is a
valid context.Context on the DynamicResourceLoader struct).

Member

@swghosh swghosh left a comment


generally, looking great except one small nit.

(also, I didn't 👀 through the dynamic_resource_loader, on the assumption it was copied as-is from https://github.com/openshift/cert-manager-operator/tree/master/test/library)

I'm only concerned that the serviceaccount-clusterrole will be unable to collect a full must-gather during the gather Job (cc: @shivprakashmuley to evaluate if that's blocking/not for this PR; or follow-up in #264 or similar).

Thank you for the great work and pulling this through @praveencodes!

Comment on lines +581 to +605
ginkgo.It("should fail when non-admin user tries to use privileged ServiceAccount", func() {
	mustGatherName := fmt.Sprintf("test-privileged-sa-%d", time.Now().UnixNano())

	ginkgo.By("Attempting to create MustGather with non-existent privileged SA")
	mg = createMustGatherCR(mustGatherName, ns.Name, "cluster-admin-sa", false) // SA doesn't exist

	ginkgo.By("Verifying Job is created but Pod fails due to missing SA")
	job := &batchv1.Job{}
	Eventually(func() error {
		return adminClient.Get(testCtx, client.ObjectKey{
			Name:      mustGatherName,
			Namespace: ns.Name,
		}, job)
	}).WithTimeout(2 * time.Minute).WithPolling(5 * time.Second).Should(Succeed())

	// Job should exist but pods won't be created due to missing SA
	Consistently(func() int32 {
		_ = adminClient.Get(testCtx, client.ObjectKey{
			Name:      mustGatherName,
			Namespace: ns.Name,
		}, job)
		return job.Status.Active
	}).WithTimeout(30*time.Second).WithPolling(5*time.Second).Should(Equal(int32(0)),
		"Job should not have active pods due to missing ServiceAccount")
})
Member


While this is a good test to have ("check that an absent ServiceAccount actually causes a failure"),
do you think it is misleadingly worded as a "privileged ServiceAccount" check?

Comment on lines +633 to +638
for _, container := range gatherPod.Spec.Containers {
	if container.SecurityContext != nil && container.SecurityContext.Privileged != nil {
		Expect(*container.SecurityContext.Privileged).To(BeFalse(),
			"Container should not run in privileged mode")
	}
}
Member


This is a sufficient check, I'd say.
Or if you want to take it further, you can check that the SCC/PSA annotations on the pod are not "privileged".

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 23, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: praveencodes, swghosh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Dec 23, 2025

@praveencodes: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/validate-boilerplate 8791f83 link false /test validate-boilerplate

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@shivprakashmuley
Contributor

/label no-qe
as E2E is not user facing

@openshift-ci openshift-ci bot added the no-qe Indicates that QE signoff is not required. label Dec 24, 2025
@shivprakashmuley
Contributor

/verified

@openshift-ci-robot

@shivprakashmuley: The /verified command must be used with one of the following actions: by, later, remove, or bypass. See https://docs.ci.openshift.org/docs/architecture/jira/#premerge-verification for more information.

Details

In response to this:

/verified

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@shivprakashmuley
Contributor

/verified bypass

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Dec 24, 2025
@openshift-ci-robot

@shivprakashmuley: The verified label has been added.

Details

In response to this:

/verified bypass

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 130bc8a into openshift:master Dec 24, 2025
9 of 10 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. no-qe Indicates that QE signoff is not required. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. verified Signifies that the PR passed pre-merge verification criteria
