Skip to content

Conversation

@mtulio
Copy link
Collaborator

@mtulio mtulio commented Nov 26, 2025

Summary

This PR addresses OCP 4.20+ compatibility issues where the kubernetes/conformance suite was removed from openshift-tests, requiring dynamic extraction from the k8s-tests-ext binary. Additionally, it refactors the client architecture to improve code organization and maintainability.

Problem

Starting with OCP 4.20, the kubernetes/conformance suite is no longer available in openshift-tests. This causes the kube-conformance plugin to fail when trying to run conformance tests. Additionally, the client initialization code was scattered across multiple packages, making it harder to maintain and test.

Solution

1. Dynamic Test Extraction (OCP 4.20+ Support)

Added an init container (extract-k8s-conformance-tests) to the kube-conformance plugin that:

  1. Retrieves the hyperkube image from the cluster release
  2. Extracts the k8s-tests-ext.gz binary
  3. Runs the binary to list all tests and filters conformance tests
  4. Saves the test list to /tmp/shared/k8s-conformance-tests.list
  5. The tests container uses this list via the entrypoint script

2. Suite Name Enforcement by Cluster Version

Added logic to automatically detect cluster version and set the appropriate suite name:

  • OCP 4.19 and earlier: Uses kubernetes/conformance (default)
  • OCP 4.20+: Uses kubernetes/conformance/parallel (new parallel suite)

Implementation:

  • Added setSuiteName() function in pkg/run/run.go that:
    • Detects cluster version using the ClusterVersion API
    • Parses version to determine if >= 4.20
    • Sets suiteNameKubernetesConformance ConfigMap key accordingly
  • Plugin consumes this via DEFAULT_SUITE_NAME environment variable

3. Client Architecture Refactoring

Consolidated client creation into a unified Client struct:

Before:

kclient, sclient, err := client.CreateClients()
restConfig, err := client.CreateRestConfig()

After:

cli, err := client.NewClient()
// Access via cli.KClient, cli.SClient, cli.RestConfig

Benefits:

  • Single source of truth for client initialization
  • Easier to pass clients between functions
  • Better testability with comprehensive unit tests
  • Reduced code duplication

Files Updated:

  • pkg/client/client.go - New unified Client struct
  • pkg/client/client_test.go - Comprehensive unit tests (24 test scenarios)
  • pkg/run/run.go - Updated to use new client architecture
  • pkg/run/validations.go - Updated function signatures
  • pkg/status/status.go - Updated to use new client
  • pkg/destroy/destroy.go - Updated to use new client
  • pkg/cmd/adm/e2ed/taintNode.go - Updated to use new client

Changes

Core Implementation Files

data/templates/plugins/openshift-kube-conformance.yaml

  • Added extract-k8s-conformance-tests init container (73 insertions)
  • Extracts tests using jq with grep fallback for compatibility
  • Gracefully handles failures by falling back to default suite
  • Consumes DEFAULT_SUITE_NAME from ConfigMap for version-specific suite selection
  • Fixed yamllint issues (line length, string continuation)

pkg/run/run.go

  • Added setSuiteName() function to detect cluster version and set suite name
  • Updated to use new unified Client struct
  • Enhanced dry-run mode to print ConfigMaps as JSON
  • Fixed nil pointer dereference by correcting variable shadowing
  • Added suite name to plugin configuration via ConfigMap

pkg/client/client.go

  • Refactored to use unified Client struct with KClient, SClient, and RestConfig
  • Replaced CreateClients() with NewClient() for better encapsulation
  • Made createRestConfig() private for better encapsulation

pkg/client/client_test.go (NEW)

  • Added comprehensive unit tests for client creation
  • 24 test scenarios covering success and error paths
  • Tests for environment variable priority and configuration fallback

Supporting Files

pkg/run/validations.go

  • Updated function signatures to accept RestConfig as parameter
  • Removed dependency on client package for config creation

pkg/status/status.go

  • Updated to use new Client struct
  • Fixed missing assignment bug in client initialization

pkg/destroy/destroy.go

  • Updated to use NewClient() instead of CreateClients()

pkg/cmd/adm/e2ed/taintNode.go

  • Updated to use new client architecture

.containerignore / .gitignore

  • Added patterns for better build artifact management

Backward Compatibility

  • OCP 4.19 and earlier: Falls back to kubernetes/conformance suite
  • OCP 4.20+: Uses kubernetes/conformance/parallel suite dynamically
  • Graceful degradation: If version detection or test extraction fails, uses default suite
  • Client refactoring: Fully backward compatible, no API changes to external consumers

Related PRs

Testing

Validation Performed

  • ✅ All tests pass: make test (30 packages)
  • ✅ Go vet clean: make vet
  • ✅ YAML linting passes: yamllint
  • ✅ Build successful: make build
  • ✅ Client unit tests: 24 test scenarios covering all code paths
  • ✅ Dry-run mode tested with verbose output

CI Validation

  • Validated with CI rehearsal job on OCP 4.21 cluster
  • All linter checks passing (go_lint, yaml)

Migration Notes

For developers updating code that uses the client package:

Before:

kclient, sclient, err := client.CreateClients()
if err != nil {
    return err
}

After:

cli, err := client.NewClient()
if err != nil {
    return err
}
// Use cli.KClient, cli.SClient, cli.RestConfig

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2025
@mtulio mtulio force-pushed the spike-420-issues branch 3 times, most recently from 4b8e2df to 13ab6eb Compare December 1, 2025 18:46
mtulio added a commit to mtulio/provider-certification-plugins that referenced this pull request Dec 1, 2025
This commit implements two critical fixes for OCP 4.20+ compatibility:

1. **Failure propagation mechanism**
   - Detects when blocker plugins fail and stops dependent plugins
   - Prevents wasted execution time on tests that can't succeed
   - Exception: artifacts collector (plugin 99) always runs
   - Uses plugin ID check for cleaner, more maintainable logic
   - Location: openshift-tests-plugin/pkg/plugin/plugin.go:716

2. **Kubernetes conformance test extraction support**
   - Consumes extracted test list from init container
   - Checks for /tmp/shared/k8s-conformance-tests.list
   - Falls back to default suite if extraction unavailable
   - Works with init container defined in opct repository
   - Location: openshift-tests-plugin/plugin/entrypoint-tests.sh:59-76

Failure propagation chain:
- 05 (upgrade) → blocks → 10 (kube-conformance)
- 10 (kube-conformance) → blocks → 20 (conformance-validated)
- 20 (conformance-validated) → blocks → 80 (replay)
- 80 (replay) → blocks → 99 (artifacts-collector)
- Plugin 99 (artifacts collector) always runs regardless of failures

Related: redhat-openshift-ecosystem/opct#183

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@mtulio mtulio changed the title fix: resolve CI failures for OCP 4.20+ clusters fix: extract k8s conformance tests for OCP 4.20+ Dec 1, 2025
@mtulio mtulio changed the title fix: extract k8s conformance tests for OCP 4.20+ OCPBUGS-66219: fix: extract k8s conformance tests for OCP 4.20+ Dec 1, 2025
@mtulio mtulio added the kind/bug Categorizes issue or PR as related to a bug. label Dec 1, 2025
@mtulio mtulio marked this pull request as ready for review December 1, 2025 20:17
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2025
@openshift-ci openshift-ci bot requested review from jcpowermac and jinhli December 1, 2025 20:17
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2025
mtulio added a commit to mtulio/provider-certification-tool that referenced this pull request Dec 2, 2025
Fixed linter issues found in CI checks:

**YAML linting (data/templates/plugins/openshift-kube-conformance.yaml)**:
- Fixed line-length error on line 55 (93 > 80 characters)
- Split echo message across two lines for better readability

**Go linting (pkg/client/client_test.go)**:
- Fixed errcheck warnings by handling return values from os.Setenv/os.Unsetenv
- Used blank identifier (_) to explicitly ignore errors in test setup/cleanup
- These are intentional in test contexts where environment setup failures
  would be caught by subsequent test assertions

Validation:
- ✅ yamllint passes for openshift-kube-conformance.yaml
- ✅ make test - all tests passing
- ✅ make vet - no issues found
- ✅ go test ./pkg/client - all client tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@mtulio mtulio added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Dec 2, 2025
@mtulio mtulio removed the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Dec 4, 2025
@mtulio
Copy link
Collaborator Author

mtulio commented Dec 4, 2025

PR validated by openshift/release#71865 (comment)

@jcpowermac
Copy link
Collaborator

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@openshift-ci
Copy link

openshift-ci bot commented Dec 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcpowermac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 5, 2025
openshift-merge-bot bot pushed a commit to redhat-openshift-ecosystem/provider-certification-plugins that referenced this pull request Dec 5, 2025
…and failure propagation (#82)

## Summary

This PR implements two critical fixes for OCP 4.20+ compatibility and
improved CI efficiency:
1. Failure propagation mechanism to stop dependent plugins when
prerequisites fail
2. Support for consuming extracted Kubernetes conformance tests from
init container

## Changes

### 1. Failure Propagation (plugin.go)

**Problem**: When a conformance plugin fails, subsequent dependent
plugins continue running and waste execution time on tests that cannot
succeed.

**Solution**: Added failure detection in `RunDependencyWaiter()` that:
- Checks if blocker plugin status is "failed"
- Returns error to stop dependent plugin execution
- Exception: Plugin 99 (artifacts-collector) always runs to collect logs

**Code change** (`openshift-tests-plugin/pkg/plugin/plugin.go:714-719`):
```go
// Check if the blocker plugin failed and propagate failure to dependent plugins
// Exception: artifacts collector (99-openshift-artifacts-collector) should always run
if pStatusBlocker.Status == "failed" && p.ID() != PluginId99 {
    log.Errorf("Blocker plugin[%s] failed. Propagating failure to dependent plugin[%s]", pluginBlocker, p.Name())
    return fmt.Errorf("blocker plugin %s failed, stopping execution of dependent plugin %s", pluginBlocker, p.Name())
}
```

**Failure propagation chain**:
- 05 (upgrade) → blocks → 10 (kube-conformance)
- 10 (kube-conformance) → blocks → 20 (conformance-validated)
- 20 (conformance-validated) → blocks → 80 (replay)
- 80 (replay) → blocks → 99 (artifacts-collector)

### 2. Kubernetes Conformance Test Extraction (entrypoint-tests.sh)

**Problem**: OCP 4.20+ removed the `kubernetes/conformance` suite from
openshift-tests.

**Solution**: Added logic to consume extracted test list from init
container:
- Checks for `/tmp/shared/k8s-conformance-tests.list` (created by init
container in opct repo)
- Uses extracted tests if available and non-empty
- Falls back to default suite if extraction fails or file is missing

**Code change**
(`openshift-tests-plugin/plugin/entrypoint-tests.sh:59-76`):
```bash
# Check if we have extracted k8s conformance tests from OTE
K8S_CONFORMANCE_LIST="/tmp/shared/k8s-conformance-tests.list"
if [[ "${PLUGIN_NAME:-}" == "openshift-kube-conformance" ]] && [[ -f "${K8S_CONFORMANCE_LIST}" ]]; then
    TEST_COUNT=$(wc -l < "${K8S_CONFORMANCE_LIST}")
    if [[ $TEST_COUNT -gt 0 ]]; then
        echo "Using extracted Kubernetes conformance tests from OTE (${TEST_COUNT} tests)"
        cp "${K8S_CONFORMANCE_LIST}" "${CTRL_SUITE_LIST}"
        echo "Tests extracted from k8s-tests-ext binary" > ${CTRL_SUITE_LIST}.log
    else
        # Fallback to default suite
    fi
fi
```

## Benefits

1. **Improved CI efficiency**: Stops wasted execution time on dependent
tests
2. **Clearer failure signals**: Failed prerequisites immediately
propagate
3. **OCP 4.20+ support**: Works with extracted conformance tests
4. **Backward compatible**: Falls back gracefully on older versions
5. **Cleaner code**: Uses plugin ID instead of name/alias checks

## Related PRs

- OPCT PR: redhat-openshift-ecosystem/opct#183
  - Adds init container to extract k8s conformance tests from OTE

## Testing

Validated with CI rehearsal jobs on OCP 4.20+ clusters.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
mtulio and others added 3 commits December 5, 2025 15:53
Added init container to dynamically extract Kubernetes conformance tests
from the OTE k8s-tests-ext binary to support OCP 4.20+ where the
kubernetes/conformance suite was removed from openshift-tests.

Changes:
- Added extract-k8s-conformance-tests init container
- Extracts k8s-tests-ext.gz from hyperkube image
- Filters conformance tests using jq or grep as fallback
- Saves test list to /tmp/shared/k8s-conformance-tests.list
- Tests container uses extracted list via entrypoint-tests.sh
- Gracefully falls back to default suite if extraction fails

This ensures the kubernetes/conformance suite continues to work on
OCP 4.20+ clusters while maintaining backward compatibility with
earlier versions.

Related: redhat-openshift-ecosystem/provider-certification-plugins#82

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fixed linter issues found in CI checks:

**YAML linting (data/templates/plugins/openshift-kube-conformance.yaml)**:
- Fixed line-length error on line 55 (93 > 80 characters)
- Split echo message across two lines for better readability

**Go linting (pkg/client/client_test.go)**:
- Fixed errcheck warnings by handling return values from os.Setenv/os.Unsetenv
- Used blank identifier (_) to explicitly ignore errors in test setup/cleanup
- These are intentional in test contexts where environment setup failures
  would be caught by subsequent test assertions

Validation:
- ✅ yamllint passes for openshift-kube-conformance.yaml
- ✅ make test - all tests passing
- ✅ make vet - no issues found
- ✅ go test ./pkg/client - all client tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@openshift-ci
Copy link

openshift-ci bot commented Dec 5, 2025

New changes are detected. LGTM label has been removed.

@mtulio
Copy link
Collaborator Author

mtulio commented Dec 5, 2025

/lgtm /approve

rebased after conflict, merging after approvals.

@mtulio mtulio merged commit 2be4d9c into redhat-openshift-ecosystem:main Dec 5, 2025
16 of 17 checks passed
@mtulio mtulio deleted the spike-420-issues branch December 5, 2025 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants