Migrating user_certs_test to ote #1993

Closed
gangwgr wants to merge 2 commits into openshift:main from gangwgr:user_certs_test

Conversation

@gangwgr gangwgr commented Jan 8, 2026

Migrating user_certs_test to ote

## Summary
  Migrates the `TestNamedCertificates` test suite to the OTE (OpenShift Tests Extension) Ginkgo framework while maintaining dual-compatibility with traditional Go tests.

  **Critical Performance Fix**: Initial OTE migration used `DescribeTable` with `BeforeAll/AfterAll`, which incorrectly triggered 19 separate API server rollouts (one per subtest), taking ~10 hours. Fixed by using a simple `g.It()` wrapper that calls the shared test function once, reducing test time to ~34 minutes.

  ## Changes

  ### Added `test/e2e/user_certs.go` (+420 lines)
  - Add Ginkgo test wrapper for OTE test discovery:
    - Simple `g.It()` wrapper that calls shared function once
    - **Avoids performance issues** with `DescribeTable` approach
  - Extract test logic into shared function with `testing.TB` interface:
    - `testNamedCertificates(t testing.TB)` - Tests all 19 named certificate scenarios
  - Add helper functions for certificate testing:
    - `newTestCertInfo()` - Creates test certificate info
    - `createTLSSecret()` - Creates TLS secrets
    - `deleteSecret()` - Deletes secrets
    - `updateAPIServerClusterConfigSpec()` - Updates API server config
    - `removeNamedCertificatesBySecretName()` - Cleanup helper
    - `serialNumberOfCertificateFromSecretOrFail()` - Gets cert serial numbers
    - `getReturnedCertSerialNumber()` - Verifies returned certificates
    - `getKubernetesServiceClusterIPOrFail()` - Gets service IP
    - `getExternalAPIServiceHostNameOrFail()` - Gets external hostname
    - `getInternalAPIServiceHostNameOrFail()` - Gets internal hostname

  ### Modified `test/e2e/user_certs_test.go` (+6/-397 = -391 net)
  - **Keep test runner** that calls shared function:
    ```go
    func TestNamedCertificates(t *testing.T) {
        testNamedCertificates(t)
    }
    ```
  - Remove all test implementation logic (moved to `user_certs.go`)

  ### Modified `test/library/library.go` (+1/-1)
  - Change `GenerateNameForTest` signature from `*testing.T` to `testing.TB`
  - Enables use in both standard Go tests and Ginkgo

  ### Modified `test/library/crypto.go` (+2/-2)
  - Change `NewServerCertificate` signature from `*testing.T` to `testing.TB`
  - Change `NewCertificateAuthorityCertificate` signature from `*testing.T` to `testing.TB`
  - Enables use in both standard Go tests and Ginkgo

  ### Modified `test/e2e/serviceaccountissuer.go` (+9/-6)
  - Minor refactoring for consistency with OTE pattern

  ### Modified `test/e2e/serviceaccountissuer_test.go` (+36/-6)
  - Updated test structure for OTE compatibility

  **Net change**: +30 lines (479 added - 449 removed across test files)

  ## Performance Fix

  **Problem**: Initial migration used `g.Context` with `g.Ordered`, `BeforeAll`/`AfterAll`, and `DescribeTable` with 19 entries. This caused:
  - 19 separate API server rollouts (one per `Entry`)
  - Each rollout: ~32 minutes
  - Total time: ~10 hours

  **Solution**: Simplified to a single `g.It()` that calls `testNamedCertificates()` once:
  ```go
  g.It("[Operator][Serial][Timeout:40m] TestNamedCertificates", func() {
      testNamedCertificates(g.GinkgoTB())
  })
  ```

  **Result**:
  - 1 API server rollout: ~32 minutes
  - All 19 test cases: ~2 minutes
  - Total time: ~34 minutes (94% reduction)
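
  As a quick check of these figures, here is a back-of-the-envelope calculation. It uses only the numbers quoted in this description and assumes the 19 rollouts ran strictly one after another:

  ```latex
  \begin{aligned}
  T_{\text{before}} &\approx 19 \times 32\ \text{min} = 608\ \text{min} \approx 10.1\ \text{h} \\
  T_{\text{after}}  &\approx 32\ \text{min} + 2\ \text{min} = 34\ \text{min} \\
  \text{reduction}  &= 1 - \frac{34}{608} \approx 0.944 \approx 94\%
  \end{aligned}
  ```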

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2026

coderabbitai Bot commented Jan 8, 2026

Walkthrough

Refactors test helpers to accept testing.TB, adds a new end-to-end test that verifies API server named TLS certificates, updates TestNamedCertificates to call a shared helper, and removes an import and a wait-for-stabilization call from the service-account-issuer test.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Test library signature updates: test/library/crypto.go, test/library/library.go | Changed signatures to accept testing.TB instead of *testing.T for NewServerCertificate, NewCertificateAuthorityCertificate, and GenerateNameForTest. No logic changes. |
| New E2E test: test/e2e/user_certs.go | Added a Ginkgo/Gomega e2e test that generates CA and server certs, creates TLS secrets in openshift-config, configures API server named-cert entries, maps hostnames to expected certificate serials, polls the API server to verify presented certs, and cleans up secrets. |
| Test entry and minor refactor: test/e2e/user_certs_test.go, test/e2e/serviceaccountissuer.go | TestNamedCertificates now delegates to testNamedCertificates. Removed testlibraryapi import alias and the call to wait for API server stabilization in the service-account-issuer test; updated test description string. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2026
@gangwgr gangwgr force-pushed the user_certs_test branch 2 times, most recently from d5eda9f to 68f850b Compare January 8, 2026 10:05

gangwgr commented Jan 8, 2026

/test e2e-aws-ovn-serial-1of2

gangwgr commented Jan 9, 2026

/test e2e-gcp-operator-serial-ote

@gangwgr gangwgr changed the title from "[WIP]Migrating user_certs_test to ote" to "Migrating user_certs_test to ote" Jan 9, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 9, 2026

gangwgr commented Jan 9, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 9, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 12, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 12, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 13, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 13, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 14, 2026

/test e2e-gcp-operator-serial-ote

gangwgr commented Jan 15, 2026

/test e2e-gcp-operator-serial-ote

@gangwgr gangwgr force-pushed the user_certs_test branch 4 times, most recently from aef3a35 to 5a1225b Compare January 16, 2026 04:24
@wangke19 (Contributor):

Please update the PR summary so it lists the per-file changes; you can see my PR openshift/service-ca-operator#305 for an example, such as:
Modified in test/e2e/e2e_test.go (+6/-129 = -123 net)

@wangke19 (Contributor):

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2026

openshift-ci Bot commented Jan 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gangwgr, wangke19

err = pollForOperandIssuer(t, kubeClient, []string{"https://kubernetes.default.svc"})
require.NoError(t, err, "pollForOperandIssuer failed")
// Wait for API server to stabilize after configuration change
testlibraryapi.WaitForAPIServerToStabilizeOnTheSameRevision(t, kubeClient.Pods(operatorclient.TargetNamespace))

Comment thread test/e2e/user_certs.go Outdated
)

var _ = g.Describe("[sig-api-machinery] kube-apiserver operator", func() {
g.It("[Operator][Serial][Timeout:40m] TestNamedCertificates", func() {

@gangwgr (Contributor Author):

I kept the same name as the Go test.

Comment thread test/e2e/user_certs.go Outdated
)

var _ = g.Describe("[sig-api-machinery] kube-apiserver operator", func() {
g.It("[Operator][Serial][Timeout:40m] TestNamedCertificates", func() {

Contributor:

I think that this (`It`) should also describe the purpose.

Comment thread test/e2e/user_certs.go
for _, tc := range testCases {
tc := tc // capture range variable

// Use t.Run if available (*testing.T), otherwise run directly (Ginkgo's testing.TB)

Contributor:

This is interesting.

Normally we would:

- set up the test
- then run the cases as subtests

Hmm, should we do that?
What is the benefit of having subtests vs. one single test?
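
For readers following this thread, here is a minimal sketch of the pattern under discussion: shared setup runs once, each case runs as a subtest when a real `*testing.T` is available, and otherwise runs inline. This is not the PR's code. `testCase`, `runCases`, and `recordingTB` are hypothetical names, and `recordingTB` only stands in for the kind of `testing.TB` value a Ginkgo wrapper would pass.

```go
package main

import (
	"fmt"
	"testing"
)

// testCase is a hypothetical stand-in for one named-certificate scenario.
type testCase struct {
	name, got, want string
}

// runCases assumes the expensive shared setup has already happened once, then
// checks every case. Subtests are used only when t is a real *testing.T.
func runCases(t testing.TB, cases []testCase) {
	for _, tc := range cases {
		check := func(t testing.TB) {
			if tc.got != tc.want {
				t.Errorf("%s: got %q, want %q", tc.name, tc.got, tc.want)
			}
		}
		if tt, ok := t.(*testing.T); ok {
			// go test path: each case is reported as its own subtest.
			tt.Run(tc.name, func(st *testing.T) { check(st) })
			continue
		}
		// Non-*testing.T path: no subtests, so log the case name instead.
		t.Logf("running case %s", tc.name)
		check(t)
	}
}

// recordingTB is a hypothetical testing.TB used only for this demo. Embedding
// the interface satisfies its unexported method; only Logf and Errorf are
// overridden, so calling any other method would panic.
type recordingTB struct {
	testing.TB
	logs, errs []string
}

func (r *recordingTB) Logf(format string, args ...any) {
	r.logs = append(r.logs, fmt.Sprintf(format, args...))
}

func (r *recordingTB) Errorf(format string, args ...any) {
	r.errs = append(r.errs, fmt.Sprintf(format, args...))
}

func main() {
	rec := &recordingTB{}
	runCases(rec, []testCase{
		{name: "match", got: "1", want: "1"},
		{name: "mismatch", got: "1", want: "2"},
	})
	fmt.Println("logged cases:", len(rec.logs), "failures:", len(rec.errs))
}
```

The trade-off raised above shows up here: the subtest branch gives per-case reporting and `-run` filtering under `go test`, while the inline branch reports everything under a single test name.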

Contributor:

@gangwgr @wangke19 what do you think about ^

p0lyn0mial commented Jan 16, 2026

/hold

@wangke19 i'm still reviewing the PR :)

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2026
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2026

openshift-ci Bot commented Jan 16, 2026

New changes are detected. LGTM label has been removed.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/e2e/user_certs.go`:
- Around line 228-245: The poll callback passed to wait.PollImmediate must not
call require.NoError (which calls t.FailNow); in both branches replace the
require.NoError inside the callback with returning (false, err) when
getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName) fails, and
compare/return the boolean only otherwise; then after wait.PollImmediate
completes, call require.NoError(t, err, "test case %s failed", tc.name) (or
require.NoError(t, err) in the non-Ginkgo path) to fail the test with the poll
error—update the callback in both the non-Ginkgo and Ginkgo branches referencing
getReturnedCertSerialNumber and wait.PollImmediate accordingly.
🧹 Nitpick comments (2)
test/e2e/user_certs.go (2)

261-265: Consider adding MinVersion to TLS config for defense-in-depth.

The static analyzer flagged the missing MinVersion. While this is test code connecting to a trusted internal API server with InsecureSkipVerify: true (appropriate for certificate inspection), adding MinVersion: tls.VersionTLS12 would be a defense-in-depth measure.

♻️ Optional improvement
 	tlsConf := &tls.Config{
 		VerifyPeerCertificate: verifyPeerCertificate,
 		ServerName:            serverName,
 		InsecureSkipVerify:    true,
+		MinVersion:            tls.VersionTLS12,
 	}

317-317: Use corev1.TLSCertKey constant for consistency.

The hardcoded "tls.crt" string works but is inconsistent with createTLSSecret which uses the corev1.TLSCertKey constant on line 301.

♻️ Optional improvement
-	certificates, err := cert.ParseCertsPEM(secret.Data["tls.crt"])
+	certificates, err := cert.ParseCertsPEM(secret.Data[corev1.TLSCertKey])
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1225b and 35b4311.

📒 Files selected for processing (2)
  • test/e2e/serviceaccountissuer.go
  • test/e2e/user_certs.go
🧰 Additional context used
🪛 ast-grep (0.40.5)
test/e2e/user_certs.go

[warning] 260-264: `MinVersion` is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported and insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add `MinVersion: tls.VersionTLS13` to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
VerifyPeerCertificate: verifyPeerCertificate,
ServerName: serverName,
InsecureSkipVerify: true,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

🔇 Additional comments (5)
test/e2e/serviceaccountissuer.go (1)

29-29: Verify the timeout removal is intentional.

The [Timeout:30m] tag was removed from the test description. Ensure this is intentional and that the test framework or CI configuration provides adequate timeout handling to prevent indefinitely hanging tests.

test/e2e/user_certs.go (4)

34-38: LGTM!

The Ginkgo wrapper is well-structured with a descriptive test name and appropriate 40-minute timeout given the expected ~34-minute runtime.


40-113: LGTM!

The test setup properly initializes clients, creates test certificates with appropriate cleanup via defer, and retrieves baseline certificate serial numbers for comparison. Good use of assert.NoError in cleanup defers vs require.NoError for critical setup steps.


278-304: LGTM!

Secret management helpers are well-implemented with proper error handling and use of standard Kubernetes constants.


353-420: LGTM!

API hostname retrieval and test certificate helpers are clean and follow consistent patterns with appropriate error handling and logging.

Comment thread test/e2e/user_certs.go
Comment on lines +228 to +245
err := wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
// connect to apiserver using a custom ServerName and examine the returned certificate's
// serial number to determine if the expected serving certificate was returned.
serialNumber, err := getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName)
require.NoError(t, err)
return tc.expectedSerialNumber == serialNumber, nil
})
require.NoError(t, err)
})
} else {
// Ginkgo path - no subtests available
t.Logf("Running test case: %s", tc.name)
err = wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
serialNumber, err := getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName)
require.NoError(t, err)
return tc.expectedSerialNumber == serialNumber, nil
})
require.NoError(t, err, "test case %s failed", tc.name)

⚠️ Potential issue | 🟠 Major

Avoid require.NoError inside poll callbacks.

Using require.NoError inside wait.PollImmediate callbacks is problematic because require calls t.FailNow(), which uses runtime.Goexit(). This can cause unexpected behavior when called from goroutines or callbacks. Instead, return the error from the callback and handle it after the poll completes.

🐛 Proposed fix
 			err := wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
 				// connect to apiserver using a custom ServerName and examine the returned certificate's
 				// serial number to determine if the expected serving certificate was returned.
 				serialNumber, err := getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName)
-				require.NoError(t, err)
+				if err != nil {
+					return false, nil // retry on error
+				}
 				return tc.expectedSerialNumber == serialNumber, nil
 			})
 			require.NoError(t, err)

Apply the same fix to the Ginkgo path (lines 240-244):

 		err = wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
 			serialNumber, err := getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName)
-			require.NoError(t, err)
+			if err != nil {
+				return false, nil // retry on error
+			}
 			return tc.expectedSerialNumber == serialNumber, nil
 		})
 		require.NoError(t, err, "test case %s failed", tc.name)
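
The difference between returning `(false, nil)` and `(false, err)` from a poll callback can be seen in a small standard-library sketch. `pollImmediate` below is a hypothetical, simplified stand-in for `wait.PollImmediate`, written only for this illustration, and `flakyLookup` simulates a lookup that fails a few times before succeeding.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timed out waiting for the condition")

// pollImmediate calls cond right away and then once per interval until cond
// reports done, returns an error, or the timeout elapses.
func pollImmediate(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err // a callback error aborts the poll immediately
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// flakyLookup returns a lookup that fails for the first `failures` calls and
// then returns serial, plus a pointer to its call counter.
func flakyLookup(failures int, serial string) (func() (string, error), *int) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("connection refused")
		}
		return serial, nil
	}, &calls
}

// waitForSerial polls lookup until it returns want. With retryOnError set, a
// lookup failure means "not ready yet"; otherwise it ends the poll.
func waitForSerial(lookup func() (string, error), want string, retryOnError bool) error {
	return pollImmediate(time.Millisecond, 200*time.Millisecond, func() (bool, error) {
		got, err := lookup()
		if err != nil {
			if retryOnError {
				return false, nil
			}
			return false, err
		}
		return got == want, nil
	})
}

func main() {
	lookup, calls := flakyLookup(2, "42")
	err := waitForSerial(lookup, "42", true)
	fmt.Println("retry on error:", err, "calls:", *calls)

	lookup, calls = flakyLookup(2, "42")
	err = waitForSerial(lookup, "42", false)
	fmt.Println("abort on error:", err, "calls:", *calls)
}
```

In both variants the failure decision is made after the poll returns, in the test goroutine, rather than inside the callback.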
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
err := wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
// connect to apiserver using a custom ServerName and examine the returned certificate's
// serial number to determine if the expected serving certificate was returned.
serialNumber, err := getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName)
if err != nil {
return false, nil // retry on error
}
return tc.expectedSerialNumber == serialNumber, nil
})
require.NoError(t, err)
})
} else {
// Ginkgo path - no subtests available
t.Logf("Running test case: %s", tc.name)
err = wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
serialNumber, err := getReturnedCertSerialNumber(kubeConfig.Host, tc.serverName)
if err != nil {
return false, nil // retry on error
}
return tc.expectedSerialNumber == serialNumber, nil
})
require.NoError(t, err, "test case %s failed", tc.name)

gangwgr commented Jan 16, 2026

@openshift-ci openshift-ci Bot closed this Jan 16, 2026

openshift-ci Bot commented Jan 16, 2026

@gangwgr: Closed this PR.

Details

In response to this:

/close
for https://redhat-internal.slack.com/archives/CC3CZCQHM/p1768565818656259?thread_ts=1768456327.979689&cid=CC3CZCQHM

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
