OTA-1580: Further tests for `oc adm upgrade status` #30109

petr-muller · 2025-08-13T14:31:07Z

This is a cleaned-up version of #30100, which it supersedes.

The PR adds a programmatic model of `oc adm upgrade status` outputs, based on
something that vaguely resembles a recursive descent parser. The model itself
is somewhat similar to the page object pattern used in web application testing:
instead of checking the raw output, tests interact with a programmatic model
of it.
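A minimal sketch of the page-object idea, not the PR's actual parser: it assumes a simplified section format (a `= Control Plane =` header followed by `Key: value` lines) and models only two fields. The names `parser`, `parseControlPlane`, `controlPlaneStatus` and `outputModel` are hypothetical. Each `parseX` method consumes the lines of one grammar element, and any error means the output has an unexpected layout.

```go
package main

import (
	"fmt"
	"strings"
)

// controlPlaneStatus is a hypothetical, much-reduced page object for the
// control plane section; field names here are illustrative only.
type controlPlaneStatus struct {
	Assessment string
	Completion string
}

// outputModel is the root of the model built from one output snapshot.
type outputModel struct {
	ControlPlane *controlPlaneStatus
}

// parser holds the output lines and a cursor; each parseX method consumes
// the lines of one grammar element, in recursive-descent style.
type parser struct {
	lines []string
	pos   int
}

// peek returns the current line without consuming it.
func (p *parser) peek() (string, bool) {
	if p.pos >= len(p.lines) {
		return "", false
	}
	return p.lines[p.pos], true
}

// parseKeyValue consumes one "Key: value" line.
func (p *parser) parseKeyValue() (string, string, error) {
	line, _ := p.peek()
	key, value, found := strings.Cut(line, ":")
	if !found {
		return "", "", fmt.Errorf("line %d: expected 'Key: value', got %q", p.pos+1, line)
	}
	p.pos++
	return strings.TrimSpace(key), strings.TrimSpace(value), nil
}

// parseControlPlane consumes the section header, then key/value lines up to
// a blank line or the end of the output.
func (p *parser) parseControlPlane() (*controlPlaneStatus, error) {
	if header, ok := p.peek(); !ok || header != "= Control Plane =" {
		return nil, fmt.Errorf("line %d: expected control plane section header", p.pos+1)
	}
	p.pos++
	cp := &controlPlaneStatus{}
	for {
		line, ok := p.peek()
		if !ok || line == "" {
			break
		}
		key, value, err := p.parseKeyValue()
		if err != nil {
			return nil, err
		}
		switch key {
		case "Assessment":
			cp.Assessment = value
		case "Completion":
			cp.Completion = value
		}
	}
	if cp.Assessment == "" {
		return nil, fmt.Errorf("control plane section has no Assessment")
	}
	return cp, nil
}

// parseOutput builds the model; any error means an unexpected layout.
func parseOutput(out string) (*outputModel, error) {
	p := &parser{lines: strings.Split(strings.TrimRight(out, "\n"), "\n")}
	cp, err := p.parseControlPlane()
	if err != nil {
		return nil, err
	}
	return &outputModel{ControlPlane: cp}, nil
}

func main() {
	model, err := parseOutput("= Control Plane =\nAssessment:      Progressing\nCompletion:      97%\n")
	if err != nil {
		fmt.Println("layout error:", err)
		return
	}
	// Tests query model fields instead of matching raw text.
	fmt.Println(model.ControlPlane.Assessment, model.ControlPlane.Completion)
}
```

The payoff of this design is that a test asks `model.ControlPlane.Assessment` rather than grepping, so a layout change breaks the parser in one place instead of every test.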

Every successfully captured output snapshot is parsed into a programmatic model,
and that parsing itself serves as a test (any valid output should be possible to model).

These models are then checked by four new tests:

Test for control plane section content
Test for worker section content
Test for health section content
Test for consistent update lifecycle reporting over time


…tputs
This is more complicated than I wanted, but here we are. It is something like a recursive descent parser of `oc adm upgrade status` outputs, parsing them into a programmatic model that the tests can interact with (similar to how page objects work in web application testing).

This adds a test that builds the programmatic model ("page object") for each successfully collected `oc adm upgrade status` output. This serves as a basic layout test (anything that cannot be parsed into a model is likely a bad output) and as a foundation for further tests, which can base their checks on the model instead of depending on the textual output.


openshift-ci-robot · 2025-08-13T14:31:11Z

@petr-muller: This pull request references OTA-1580 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set.


petr-muller · 2025-08-13T14:32:59Z

/cc @wking @hongkailiu


petr-muller · 2025-08-13T18:54:20Z

/test ?

openshift-ci · 2025-08-13T18:54:36Z

@petr-muller: The following commands are available to trigger required jobs:

/test e2e-aws-jenkins

/test e2e-aws-ovn-fips

/test e2e-aws-ovn-image-registry

/test e2e-aws-ovn-microshift

/test e2e-aws-ovn-microshift-serial

/test e2e-aws-ovn-serial-1of2

/test e2e-aws-ovn-serial-2of2

/test e2e-gcp-ovn

/test e2e-gcp-ovn-builds

/test e2e-gcp-ovn-image-ecosystem

/test e2e-gcp-ovn-upgrade

/test e2e-metal-ipi-ovn-ipv6

/test e2e-vsphere-ovn

/test e2e-vsphere-ovn-upi

/test images

/test lint

/test okd-scos-images

/test unit

/test verify

/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agnostic-ovn-cmd

/test e2e-aws-csi

/test e2e-aws-disruptive

/test e2e-aws-etcd-certrotation

/test e2e-aws-etcd-recovery

/test e2e-aws-ovn

/test e2e-aws-ovn-cgroupsv2

/test e2e-aws-ovn-edge-zones

/test e2e-aws-ovn-etcd-scaling

/test e2e-aws-ovn-kube-apiserver-rollout

/test e2e-aws-ovn-kubevirt

/test e2e-aws-ovn-serial-ipsec

/test e2e-aws-ovn-serial-publicnet-1of2

/test e2e-aws-ovn-serial-publicnet-2of2

/test e2e-aws-ovn-single-node

/test e2e-aws-ovn-single-node-serial

/test e2e-aws-ovn-single-node-techpreview

/test e2e-aws-ovn-single-node-techpreview-serial

/test e2e-aws-ovn-single-node-upgrade

/test e2e-aws-ovn-upgrade

/test e2e-aws-ovn-upgrade-rollback

/test e2e-aws-ovn-upi

/test e2e-aws-proxy

/test e2e-azure

/test e2e-azure-ovn-etcd-scaling

/test e2e-azure-ovn-upgrade

/test e2e-baremetalds-kubevirt

/test e2e-external-aws

/test e2e-external-aws-ccm

/test e2e-external-vsphere-ccm

/test e2e-gcp-csi

/test e2e-gcp-disruptive

/test e2e-gcp-fips-serial-1of2

/test e2e-gcp-fips-serial-2of2

/test e2e-gcp-ovn-etcd-scaling

/test e2e-gcp-ovn-rt-upgrade

/test e2e-gcp-ovn-techpreview

/test e2e-gcp-ovn-techpreview-serial-1of2

/test e2e-gcp-ovn-techpreview-serial-2of2

/test e2e-gcp-ovn-usernamespace

/test e2e-hypershift-conformance

/test e2e-metal-ipi-ovn

/test e2e-metal-ipi-ovn-bgp-virt-dualstack

/test e2e-metal-ipi-ovn-bgp-virt-dualstack-techpreview

/test e2e-metal-ipi-ovn-dualstack

/test e2e-metal-ipi-ovn-dualstack-bgp

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

/test e2e-metal-ipi-ovn-dualstack-local-gateway

/test e2e-metal-ipi-ovn-kube-apiserver-rollout

/test e2e-metal-ipi-serial-1of2

/test e2e-metal-ipi-serial-2of2

/test e2e-metal-ipi-serial-ovn-ipv6-1of2

/test e2e-metal-ipi-serial-ovn-ipv6-2of2

/test e2e-metal-ipi-virtualmedia

/test e2e-metal-ovn-single-node-live-iso

/test e2e-metal-ovn-single-node-with-worker-live-iso

/test e2e-metal-ovn-two-node-arbiter

/test e2e-metal-ovn-two-node-fencing

/test e2e-openstack-ovn

/test e2e-openstack-serial

/test e2e-vsphere-ovn-dualstack-primaryv6

/test e2e-vsphere-ovn-etcd-scaling

/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-origin-main-e2e-agnostic-ovn-cmd

pull-ci-openshift-origin-main-e2e-aws-csi

pull-ci-openshift-origin-main-e2e-aws-disruptive

pull-ci-openshift-origin-main-e2e-aws-ovn

pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2

pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones

pull-ci-openshift-origin-main-e2e-aws-ovn-fips

pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout

pull-ci-openshift-origin-main-e2e-aws-ovn-microshift

pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial

pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2

pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2

pull-ci-openshift-origin-main-e2e-aws-ovn-single-node

pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial

pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade

pull-ci-openshift-origin-main-e2e-aws-ovn-upgrade

pull-ci-openshift-origin-main-e2e-aws-proxy

pull-ci-openshift-origin-main-e2e-azure

pull-ci-openshift-origin-main-e2e-gcp-csi

pull-ci-openshift-origin-main-e2e-gcp-ovn

pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade

pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview

pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-1of2

pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2

pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade

pull-ci-openshift-origin-main-e2e-hypershift-conformance

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-local-gateway

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout

pull-ci-openshift-origin-main-e2e-metal-ipi-serial-1of2

pull-ci-openshift-origin-main-e2e-metal-ipi-serial-2of2

pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6-1of2

pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6-2of2

pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia

pull-ci-openshift-origin-main-e2e-openstack-ovn

pull-ci-openshift-origin-main-e2e-vsphere-ovn

pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi

pull-ci-openshift-origin-main-images

pull-ci-openshift-origin-main-lint

pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn

pull-ci-openshift-origin-main-okd-scos-images

pull-ci-openshift-origin-main-unit

pull-ci-openshift-origin-main-verify

pull-ci-openshift-origin-main-verify-deps


petr-muller · 2025-08-13T19:31:38Z

/test e2e-azure-ovn-upgrade


hongkailiu

There are still lots of details I need to catch up on and understand.
But please do not block on my review comments: they are mainly questions I collected while reading the code.

hongkailiu · 2025-08-13T20:44:06Z

pkg/monitortests/cli/adm_upgrade/status/monitortest.go

 	var total int
-	for when, observed := range w.ocAdmUpgradeStatus {
+	for _, snap := range w.ocAdmUpgradeStatus {
 		total++


nit:
(this is code I introduced)
We can use len(w.ocAdmUpgradeStatus), which works for both maps and slices, instead of counting the elements.
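A small illustration of the nit, using a hypothetical `snapshot` type: the built-in `len` returns the element count of both slices and maps, so the manual counter adds nothing.

```go
package main

import "fmt"

// snapshot is a placeholder for whatever a captured output looks like.
type snapshot struct{ out string }

// countByLoop mirrors the original pattern: count elements manually.
func countByLoop(snaps []snapshot) int {
	var total int
	for range snaps {
		total++
	}
	return total
}

func main() {
	asSlice := []snapshot{{"a"}, {"b"}, {"c"}}
	asMap := map[string]snapshot{"t1": {"a"}, "t2": {"b"}, "t3": {"c"}}
	// len works directly on both slices and maps.
	fmt.Println(countByLoop(asSlice), len(asSlice), len(asMap)) // 3 3 3
}
```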

hongkailiu · 2025-08-13T21:28:41Z

pkg/monitortests/cli/adm_upgrade/status/outputmodel.go

+	"strings"
+)
+
+type ControlPlaneStatus struct {


nit:
ControlPlaneStatus, WorkersStatus, and Health could be unexported; they are unlikely to be used outside the admupgradestatus package.

hongkailiu · 2025-08-13T21:57:02Z

pkg/monitortests/cli/adm_upgrade/status/outputmodel.go

+	var getMessage func() (string, error)
+	if strings.HasPrefix(line, "Message: ") {
+		getMessage = p.parseHealthMessage
+		health.Detailed = true


Isn't health.Detailed true all the time?
The status command we execute in the test uses --details=all.

hongkailiu · 2025-08-14T01:07:33Z

pkg/monitortests/cli/adm_upgrade/status/monitortest.go

 	}

+	if total == 0 {
+		noFailures.SkipMessage = &junitapi.SkipMessage{


Should we Fail or Skip here?
Do we have a case in CI that justifies total==0?

hongkailiu · 2025-08-14T01:08:16Z

pkg/monitortests/cli/adm_upgrade/status/monitortest.go

+
 	// Zero failures is too strict for at least SNO clusters
-	p := (len(failures) / total) * 100
+	p := (float32(len(failures)) / float32(total)) * 100


Thanks for catching this. My bad.
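Why the original line was wrong: with `int` operands Go performs integer division, so `len(failures) / total` is 0 whenever some but not all snapshots fail, and the percentage can only ever be 0 or 100. The function names below are hypothetical; the float version mirrors the fix in the diff. Both assume the caller has already handled `total == 0`.

```go
package main

import "fmt"

// failurePercentInt reproduces the original bug: integer division truncates
// to 0 unless every snapshot failed. It panics if total == 0.
func failurePercentInt(failures, total int) int {
	return (failures / total) * 100
}

// failurePercent converts to floating point before dividing, as the fix does.
// With total == 0 it would return NaN or +Inf rather than panicking.
func failurePercent(failures, total int) float32 {
	return (float32(failures) / float32(total)) * 100
}

func main() {
	fmt.Println(failurePercentInt(1, 4)) // 0
	fmt.Println(failurePercent(1, 4))    // 25
}
```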

hongkailiu · 2025-08-14T01:28:52Z

pkg/monitortests/cli/adm_upgrade/status/monitortest.go

+		if err != nil {
+			return false, fmt.Errorf("failed to get cluster version: %w", err)
 		}
+		return len(cv.Status.History) > len(w.initialClusterVersion.Status.History), nil


This relies on cv.Status.History (as of when the collection is done) to tell whether the test is an upgrade test.
If an upgrade test failed to refresh cv.Status.History for any reason, the test result might be misleading.

I understand that my way might be even worse.
28dc69b#diff-840f994ffd52dd53189c8e78b470a8c93a1d6a7cbaf7eac9a5e83c5e16deec7cR74-R77

Ideally, the framework should tell us if a test is doing a cluster upgrade or not.
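A minimal sketch of the detection being discussed, using stand-in types rather than the real ClusterVersion API types (only a `History` slice is modeled, newest entry first; the version strings are illustrative). It shows why a stale read matters: if the final ClusterVersion is fetched before a new history entry appears, the check reports no update.

```go
package main

import "fmt"

// updateHistory and clusterVersion are minimal stand-ins for the
// ClusterVersion API types; only the fields this check needs are modeled.
type updateHistory struct {
	Version string
	State   string
}

type clusterVersion struct {
	History []updateHistory
}

// updateObserved reports whether the cluster gained new history entries
// between the initial and the final ClusterVersion.
func updateObserved(initial, final clusterVersion) bool {
	return len(final.History) > len(initial.History)
}

func main() {
	initial := clusterVersion{History: []updateHistory{{Version: "4.19.0", State: "Completed"}}}
	// History is newest-first; a new entry is prepended when an update starts.
	final := clusterVersion{History: append([]updateHistory{{Version: "4.20.0", State: "Partial"}}, initial.History...)}
	fmt.Println(updateObserved(initial, initial), updateObserved(initial, final)) // false true
}
```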

hongkailiu · 2025-08-14T01:37:43Z

pkg/monitortests/cli/adm_upgrade/status/monitortest.go

+		// and we do not need to skip
+		expectedLayout.SkipMessage = nil
+
+		if observed.out == "" {


I feel this should cause the parser to return an error, and then fall into the observed.err != nil case.

openshift-trt · 2025-08-14T04:17:57Z

Job Failure Risk Analysis for sha: 5db0df8

Job Name	Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2	Medium
    [sig-instrumentation] Metrics should grab all metrics from kubelet /metrics/resource endpoint [Suite:openshift/conformance/parallel] [Suite:k8s]
    This test has passed 96.52% of 2071 runs on release 4.20 [Overall] in the last week.
    Open Bugs: e2e-aws-ovn-edge-zones is unstable; Kubelet metrics endpoint test regressed
pull-ci-openshift-origin-main-e2e-hypershift-conformance	Medium
    [sig-sippy] infrastructure should work
    This test has passed 87.53% of 3738 runs on release 4.20 [Overall] in the last week.

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 5db0df8

"[sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status control plane section is consistent" [Total: 33, Pass: 33, Fail: 0, Flake: 0]
"[sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status health section is consistent" [Total: 33, Pass: 33, Fail: 0, Flake: 0]
"[sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status output has expected layout" [Total: 33, Pass: 33, Fail: 0, Flake: 0]
"[sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status snapshots reflect the cluster upgrade lifecycle" [Total: 33, Pass: 33, Fail: 0, Flake: 0]
"[sig-cli][OCPFeatureGate:UpgradeStatus] oc adm upgrade status workers section is consistent" [Total: 33, Pass: 33, Fail: 0, Flake: 0]

wking

Job analysis has everything passing 100%.

/lgtm

openshift-ci · 2025-08-14T05:27:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking


Needs approval from an approver in each of these files:

~~pkg/monitortests/cli/adm_upgrade/OWNERS~~ [petr-muller,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wking · 2025-08-14T05:28:25Z

In case any of the failed jobs are blockers:

/retest-required

openshift-ci · 2025-08-14T08:12:47Z

@petr-muller: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-gcp-ovn-techpreview-serial-2of2	`5db0df8`	link	false	`/test e2e-gcp-ovn-techpreview-serial-2of2`
ci/prow/okd-scos-e2e-aws-ovn	`5db0df8`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/e2e-gcp-ovn-techpreview	`5db0df8`	link	false	`/test e2e-gcp-ovn-techpreview`
ci/prow/e2e-aws-ovn-edge-zones	`5db0df8`	link	false	`/test e2e-aws-ovn-edge-zones`
ci/prow/e2e-aws-ovn-single-node	`5db0df8`	link	false	`/test e2e-aws-ovn-single-node`
ci/prow/e2e-hypershift-conformance	`5db0df8`	link	false	`/test e2e-hypershift-conformance`
ci/prow/e2e-metal-ipi-ovn-dualstack	`5db0df8`	link	false	`/test e2e-metal-ipi-ovn-dualstack`
ci/prow/e2e-metal-ipi-virtualmedia	`5db0df8`	link	false	`/test e2e-metal-ipi-virtualmedia`
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway	`5db0df8`	link	false	`/test e2e-metal-ipi-ovn-dualstack-local-gateway`
ci/prow/e2e-aws-ovn-cgroupsv2	`5db0df8`	link	false	`/test e2e-aws-ovn-cgroupsv2`
ci/prow/e2e-aws-disruptive	`5db0df8`	link	false	`/test e2e-aws-disruptive`
ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout	`5db0df8`	link	false	`/test e2e-metal-ipi-ovn-kube-apiserver-rollout`
ci/prow/e2e-aws-ovn-single-node-upgrade	`5db0df8`	link	false	`/test e2e-aws-ovn-single-node-upgrade`


openshift-bot · 2025-08-14T11:03:31Z

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.20.0-202508140915.p0.g2557f36.assembly.stream.el9.
All builds following this will include this PR.

hongkailiu and others added 10 commits August 13, 2025 13:21

Refactor the noFailures case

0231e7b

upgrade status CLI monitortest: use slice instead of map

bd0e775

Some tests will want to walk the snapshots in a timewise order so it is more practical to maintain them in a slice.

upgrade status CLI monitortest: snapshot --details=all

87f9fc9

The output with the most information is more useful than the one with less.

upgrade status CLI monitortest: control plane section test

2b11ffb

Checks the control plane section using the programmatic model.

upgrade status CLI monitortest: worker section test

cc074d6

Checks the worker section using the programmatic model.

upgrade status CLI monitortest: health section test

0e82fae

Checks the health section using the programmatic model.

upgrade status CLI monitortest: save initial ClusterVersion

41649ba

upgrade status CLI monitortest: add update lifecycle check

461503c

Add a test that the reported cluster update state is consistent across all snapshots and goes through the expected update stages
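One way such a lifecycle check could look, as a sketch rather than the PR's implementation: each time-ordered snapshot is reduced to a hypothetical boolean "update in progress" flag, and the flags must form at most one contiguous run, present exactly when an update was expected (for example, decided by comparing the initial and final ClusterVersion history).

```go
package main

import (
	"errors"
	"fmt"
)

// checkLifecycle takes one flag per time-ordered snapshot, true when that
// snapshot reports an update in progress. It expects at most one contiguous
// run of true values, and that run to exist exactly when expectUpdate is set.
func checkLifecycle(updating []bool, expectUpdate bool) error {
	starts, seen := 0, false
	for i, u := range updating {
		if u && (i == 0 || !updating[i-1]) {
			starts++ // a new run of "updating" snapshots begins here
		}
		seen = seen || u
	}
	switch {
	case starts > 1:
		return fmt.Errorf("update reported as starting %d times, expected once", starts)
	case expectUpdate && !seen:
		return errors.New("update expected but never reported")
	case !expectUpdate && seen:
		return errors.New("update reported but none expected")
	}
	return nil
}

func main() {
	fmt.Println(checkLifecycle([]bool{false, true, true, false}, true))        // <nil>
	fmt.Println(checkLifecycle([]bool{false, true, false, true, false}, true)) // update reported as starting 2 times, expected once
}
```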

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 13, 2025

petr-muller mentioned this pull request Aug 13, 2025

WIP: OTA-1580: Further tests for oc adm upgrade status' #30100

Closed

petr-muller changed the title ~~OTA-1580: Further tests for oc adm upgrade status~~ OTA-1580: Further tests for `oc adm upgrade status` Aug 13, 2025

openshift-ci bot requested review from PratikMahajan, hongkailiu and wking August 13, 2025 14:33

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 13, 2025

petr-muller added 11 commits August 13, 2025 18:03

upgrade status CLI monitortest: test parsing outputs with no alerts w…

218f6f4

…arning

upgrade status CLI monitortest: test noFailures check

af44db0

upgrade status CLI monitortest: fix noFailures check

f496379

upgrade status CLI monitortest: test expectedLayout check

221e31b

upgrade status CLI monitortest: test controlPlane check

fe67027

upgrade status CLI monitortest: fixes in controlPlane check

36d4917

upgrade status CLI monitortest: fix naming conflicts

cc663f6

upgrade status CLI monitortest: test workers check

1c080bc

upgrade status CLI monitortest: fix workers check

53dc37a

upgrade status CLI monitortest: test health check

a26c815

upgrade status CLI monitortest: fix health check

4e2c9a4

petr-muller added 3 commits August 13, 2025 20:52

upgrade status CLI monitortest: make updateLifecycle better testable

9bd4080

upgrade status CLI monitortest: test updateLifecycle check

4b4ff4d

upgrade status CLI monitortest: fix updateLifecycle check

a20d7f1

petr-muller added 3 commits August 14, 2025 00:51

upgrade status CLI monitortest: relax controlPlane check

3f71eec

`oc adm upgrade status` renders operator messages that contain line breaks poorly; we can tolerate this for now but will fix it in the future

upgrade status CLI monitortest: fix controlPlane check

76fcc37

Fixed a typo in a condition: for "nodes are not updated" we need to test `!cp.NodesUpdated`

upgrade status CLI monitortest: relax updateLifecycle check

5db0df8

MCO churn sometimes briefly tricks our code into thinking the cluster is updating; we need to tolerate this for now
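The commit does not say how the check was relaxed. One possible approach, shown only as a hypothetical sketch, is to debounce the per-snapshot "updating" flags before checking them: runs shorter than an assumed `minRun` are treated as churn and cleared.

```go
package main

import "fmt"

// dropBlips returns a copy of updating in which every run of consecutive
// true values shorter than minRun is replaced by false. minRun is an
// assumed tuning knob, not a value taken from the PR.
func dropBlips(updating []bool, minRun int) []bool {
	out := make([]bool, len(updating))
	copy(out, updating)
	for start := 0; start < len(out); {
		if !out[start] {
			start++
			continue
		}
		end := start
		for end < len(out) && out[end] {
			end++
		}
		if end-start < minRun {
			for i := start; i < end; i++ {
				out[i] = false // too short: treat as churn
			}
		}
		start = end
	}
	return out
}

func main() {
	observed := []bool{false, true, false, false, true, true, true, false}
	fmt.Println(dropBlips(observed, 2)) // [false false false false true true true false]
}
```

The trade-off is that a genuine update observed in fewer than `minRun` snapshots would also be discarded, so the threshold has to stay small relative to the snapshot interval.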

hongkailiu reviewed Aug 14, 2025


wking approved these changes Aug 14, 2025


openshift-ci bot assigned wking Aug 14, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2025

openshift-merge-bot bot merged commit 2557f36 into openshift:main Aug 14, 2025
34 of 47 checks passed

petr-muller deleted the ota-1580-03-all-tests branch August 14, 2025 16:32


5 participants
