@wking wking commented Feb 28, 2022

We don't support rollbacks of any kind, let alone minor-version rollbacks. 4.(y-1) -> 4.y -> 4.(y-1) rollback jobs are failing across the board for 4.10 and before. Root-causing and potentially fixing the failures might be interesting, but because the behavior is not supported, and we have limited time to investigate and fix, just drop the jobs. If, in the future, we gain more time for investigation, we can restore these jobs. Details in the commit message with links to example runs for each job I'm dropping.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2022
@openshift-ci openshift-ci bot requested review from dgoodwin and xueqzhan February 28, 2022 20:43
In 4.11, [1], [2], and [3] all have recent passes, so I'm leaving them in.

In 4.10, [4] and [5] have recent passes, so I'm leaving them in.

Checking [6], both [7] and [8] update from 4.9 to 4.10 and start
heading back towards 4.9, but they hang a control-plane node on drain.
Same for the OVN flavor [9,10,11].  Since we don't support minor
rollbacks, or really rollbacks of any sort [12], I'm dropping these
jobs instead of root-causing the hang.

In 4.9, [13] and [14] have recent passes, so I'm leaving them in.  We
already dropped the other 4.8 -> 4.9 -> 4.8 rollback jobs back in
b3d04e5 (ci-operator/config/openshift/release: Drop 4.8 -> 4.9 ->
4.8 rollback jobs, 2021-09-27, openshift#22287).

In 4.8, [15] and [16] have recent passes, so I'm leaving them in.

4.7 -> 4.8 -> 4.7 rollback tests time out [17,18,19,20,21,22], without
the pretty e2e-interval chart to make identifying the stuck thing
easier.  But again, not supported, so dropping instead of sinking time
into root-causing.

In 4.7, [23] and [24] have recent passes, so I'm leaving them in.

4.6 -> 4.7 -> 4.6 rollback tests time out [25,26], so dropping them.

In 4.6, [27] has recent passes, so I'm leaving it in.

4.5 -> 4.6 -> 4.5 rollback tests [28,29] failed to build, but I've
been dropping all the failing 4.y minor rollback jobs from 4.10 on
down, so keeping these around to see whether subsequent runs build
and pass seems unlikely to be worth the effort.  Dropping them too.

4.5 is end-of-life [30], so I'm dropping 4.4 -> 4.5 -> 4.4 rollback
jobs without even looking to see if they're passing.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-e2e-aws-upgrade-rollback
[2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade-rollback
[3]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade-rollback
[4]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-e2e-aws-upgrade-rollback
[5]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade-rollback-oldest-supported
[6]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade-rollback
[7]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade-rollback/1497288333930270720
[8]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade-rollback/1498013325101895680
[9]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback
[10]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback/1497671569042837504
[11]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback/1498033961199210496
[12]: https://github.com/openshift/openshift-docs/blame/d4762f0f626a4dddb9d7330e63a3bb6cb73f5bb5/modules/update-upgrading-cli.adoc#L160-L162
[13]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-rollback
[14]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade-rollback-oldest-supported
[15]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-rollback
[16]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported
[17]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback
[18]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1496996649593999360
[19]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1497721649917595648
[20]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade-rollback
[21]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade-rollback/1497620733604401152
[22]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade-rollback/1497983125802717184
[23]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#periodic-ci-openshift-release-master-ci-4.7-e2e-aws-upgrade-rollback
[24]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-upgrade-rollback-oldest-supported
[25]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-aws-upgrade-rollback
[26]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-aws-upgrade-rollback/1497650430434349056
[27]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#periodic-ci-openshift-release-master-ci-4.6-e2e-aws-upgrade-rollback
[28]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-aws-upgrade-rollback
[29]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.6-upgrade-from-stable-4.5-e2e-aws-upgrade-rollback/1494388672508727296
[30]: https://access.redhat.com/support/policy/updates/openshift#dates
@wking wking force-pushed the drop-failing-rollback-jobs branch from 0a658b5 to 2d73374 Compare February 28, 2022 20:52

@LalatenduMohanty LalatenduMohanty left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 3, 2022

openshift-ci bot commented Mar 3, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, wking



openshift-ci bot commented Mar 3, 2022

@wking: all tests passed!


@openshift-merge-robot openshift-merge-robot merged commit 568cc73 into openshift:master Mar 3, 2022

openshift-ci bot commented Mar 3, 2022

@wking: Updated the following 2 configmaps:

  • ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-release-master__ci-4.10-upgrade-from-stable-4.9.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.10-upgrade-from-stable-4.9.yaml
    • key openshift-release-master__ci-4.5-upgrade-from-stable-4.4.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.5-upgrade-from-stable-4.4.yaml
    • key openshift-release-master__ci-4.6-upgrade-from-stable-4.5.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.6-upgrade-from-stable-4.5.yaml
    • key openshift-release-master__ci-4.7-upgrade-from-stable-4.6.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.7-upgrade-from-stable-4.6.yaml
    • key openshift-release-master__ci-4.8-upgrade-from-stable-4.7.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.8-upgrade-from-stable-4.7.yaml
  • job-config-master-periodics configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-release-master-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-master-periodics.yaml

@wking wking deleted the drop-failing-rollback-jobs branch March 3, 2022 05:30
wking added a commit to wking/openshift-release that referenced this pull request Oct 11, 2022
…1-upgrade-from-stable-4.10: Drop failing rollback jobs

Like 2d73374 (ci-operator/config/openshift/release: Drop failing
minor rollback tests, 2022-02-28, openshift#26629), but for the
4.10-to-4.11-to-4.10
rollbacks.  This time both the OVN and SDN rollback jobs are
perma-failing [1,2], and in both cases the issue is sticking on [3,4]:

  INFO: cluster upgrade is Progressing: Working towards 4.10.35: 614 of 773 done (79% complete), waiting on openshift-controller-manager

with that operator crash-looping on [5,6]:

  F1010 09:51:56.918590       1 cmd.go:138] open /var/run/configmaps/config/config.yaml: permission denied

I haven't dug in more deeply to try to understand that failure, but as
2d73374 points out:

> Since we don't support minor rollbacks, or really rollbacks of any
> sort [12], I'm dropping these jobs instead of root-causing the hang.
> ...
> [12]: https://github.com/openshift/openshift-docs/blame/d4762f0f626a4dddb9d7330e63a3bb6cb73f5bb5/modules/update-upgrading-cli.adoc#L160-L162

Since then, those docs have moved to [7], but the lack of rollback
support still stands.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade-rollback
[2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade-rollback
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade-rollback/1579338022623645696
[4]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade-rollback/1578454440359235584
[5]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade-rollback/1579338022623645696/artifacts/e2e-aws-ovn-upgrade-rollback/gather-extra/artifacts/pods/openshift-controller-manager-operator_openshift-controller-manager-operator-7fbc8cc67d-zbrv4_openshift-controller-manager-operator_previous.log
[6]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade-rollback/1578454440359235584/artifacts/e2e-aws-upgrade-rollback/gather-extra/artifacts/pods/openshift-controller-manager-operator_openshift-controller-manager-operator-7fbc8cc67d-s5pwz_openshift-controller-manager-operator_previous.log
[7]: https://github.com/openshift/openshift-docs/blob/7f87267bc69d65abd96e6b783100195c6b78549f/updating/updating-troubleshooting.adoc
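
The two log lines quoted in the commit message above can be picked apart mechanically when scanning many runs. The following is a minimal Python sketch, assuming the progress line always has the "Working towards ... N of M done" shape shown above and that the operator log uses the standard klog header (severity letter, month and day, time, thread ID, source file and line). The function names and returned field names are hypothetical, chosen for illustration.

```python
"""Sketch: parse the progress and crash lines quoted in the commit message.

Assumption: the regular expressions are fitted to the two sample lines
quoted above; other runs may format these lines differently.
"""
import re

PROGRESS_RE = re.compile(
    r"Working towards (?P<version>[^:\s]+): "
    r"(?P<done>\d+) of (?P<total>\d+) done \((?P<percent>\d+)% complete\)"
    r"(?:, waiting on (?P<waiting>.+))?$"
)

# klog header: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
KLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^:\s]+:\d+)\] (?P<message>.*)$"
)

SEVERITIES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_progress(line: str):
    """Return version, counts, and the operator being waited on, or None."""
    match = PROGRESS_RE.search(line.strip())
    if match is None:
        return None
    return {
        "version": match["version"],
        "done": int(match["done"]),
        "total": int(match["total"]),
        "percent": int(match["percent"]),
        "waiting": match["waiting"],  # None when no operator is named
    }


def parse_klog(line: str):
    """Return severity, source location, and message of a klog line, or None."""
    match = KLOG_RE.match(line.strip())
    if match is None:
        return None
    return {
        "severity": SEVERITIES[match["severity"]],
        "source": match["source"],
        "message": match["message"],
    }


if __name__ == "__main__":
    print(parse_progress(
        "INFO: cluster upgrade is Progressing: Working towards 4.10.35: "
        "614 of 773 done (79% complete), waiting on openshift-controller-manager"
    ))
    print(parse_klog(
        "F1010 09:51:56.918590       1 cmd.go:138] "
        "open /var/run/configmaps/config/config.yaml: permission denied"
    ))
```

Grouping runs by the `waiting` field and the fatal `message` would be one way to confirm that the OVN and SDN jobs are stuck on the same failure.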
openshift-merge-robot pushed a commit that referenced this pull request Oct 11, 2022
…1-upgrade-from-stable-4.10: Drop failing rollback jobs (#33005)
