-
Notifications
You must be signed in to change notification settings - Fork 4.8k
OTA-1580: Further tests for oc adm upgrade status
#30109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTA-1580: Further tests for oc adm upgrade status
#30109
Conversation
Some tests will want to walk the snapshots in a timewise order so it is more practical to maintain them in a slice.
The output with most information is more useful that the one with less.
…tputs This is more complicated than I wanted but here we are. It is something like a recursive descent parser of `oc adm upgrade status` outputs, parsing into a programmatic model that the tests can interact with (similar to how page objects work in web application testing)
This adds a test that for each successfuly collected `oc adm upgrade status` output builds the programmatic model ("page objedct"). This serves as a basic layout test (anything that cannot be parsed into a model is likely a bad output) and also a foundation for further tests that can use the model as a basis for their checks instead of depending on the textual output.
Checks the control plane section using the programmatic model.
Checks the worker section using the programmatic model.
Checks the health section using the programmatic model.
Add a test that the reported cluter update state is consistent over all snapshot and goes through expected update stages
|
@petr-muller: This pull request references OTA-1580 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Further tests for oc adm upgrade statusoc adm upgrade status
|
/cc @wking @hongkailiu |
|
/test ? |
|
@petr-muller: The following commands are available to trigger required jobs: The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test e2e-azure-ovn-upgrade |
`oc adm upgrade status` emits operators with linebreaks in messages in a poor way which we can tolerate for now but will fix in the future
Fixed a typo in a condition, for "nodes are not updated" we need to test `!cp.NodesUpdated`
MCO churn sometimes briefly tricks our code into thinking the cluster is updating, we need to tolerate for now
hongkailiu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still lots of details I need to catch up and understand.
But please do not block on my review comments: they are mainly questions I collected while reading the code.
| var total int | ||
| for when, observed := range w.ocAdmUpgradeStatus { | ||
| for _, snap := range w.ocAdmUpgradeStatus { | ||
| total++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
(this is code introduced by me:)
We can use len(w.ocAdmUpgradeStatus) (which works for both map and slice) instead of counting the elements.
| "strings" | ||
| ) | ||
|
|
||
| type ControlPlaneStatus struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
ControlPlaneStatus, WorkersStatus, and Health could be private and are unlikely to be used out of the admupgradestatus pkg.
| var getMessage func() (string, error) | ||
| if strings.HasPrefix(line, "Message: ") { | ||
| getMessage = p.parseHealthMessage | ||
| health.Detailed = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isnt health.Detailed true all the time?
Because the status cmd we execute in the test is with --details=all.
| } | ||
|
|
||
| if total == 0 { | ||
| noFailures.SkipMessage = &junitapi.SkipMessage{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we Fail or Skip here?
Do we have a case in CI that justifies total==0?
|
|
||
| // Zero failures is too strict for at least SNO clusters | ||
| p := (len(failures) / total) * 100 | ||
| p := (float32(len(failures)) / float32(total)) * 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. My bad.
| if err != nil { | ||
| return false, fmt.Errorf("failed to get cluster version: %w", err) | ||
| } | ||
| return len(cv.Status.History) > len(w.initialClusterVersion.Status.History), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on cv.Status.History (when the collection is done) to tell if the test is an upgrade test.
If an upgrade test failed to refresh cv.Status.History for any reason, the testing result might be misleading.
I understand that my way might be even worse.
28dc69b#diff-840f994ffd52dd53189c8e78b470a8c93a1d6a7cbaf7eac9a5e83c5e16deec7cR74-R77
Ideally, the framework should tell us if a test is doing a cluster upgrade or not.
| // and we do not need to skip | ||
| expectedLayout.SkipMessage = nil | ||
|
|
||
| if observed.out == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this should cause the parser to error out and then be falling into the observed.err != nil case.
|
Job Failure Risk Analysis for sha: 5db0df8
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: 5db0df8
|
wking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Job analysis has everything passing 100%.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: petr-muller, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
In case any of the failed jobs are blockers: /retest-required |
|
@petr-muller: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
2557f36
into
openshift:main
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-tests |
This is a cleaned up content of #30100 which it supersedes.
The PR adds a programmatic model of
oc adm upgrade statusoutputs, based onsomething that vaguely resembles a recursive descent parser. The model itself
is somewhat similar to the page object pattern used in web application testing:
instead of tests checking their stuff over the raw output, they interact over
a programmatic model of the output.
Every successfully captured output shapshot is parsed into a programmatic model,
which itself serves as a test (any valid output should be possible to model).
These models are then checked by four new tests: