
HOSTEDCP-966: require hypershift e2e for operators in control plane #39641

Closed
sjenning wants to merge 1 commit into openshift:master from sjenning:require-hypershift-e2e

Conversation

@sjenning
Contributor

@sjenning sjenning commented May 23, 2023

Hypershift e2e is now blocking on ci release stream.

We should block breaking changes from merging for operators/components running in the Hypershift control plane.

@sjenning sjenning changed the title require hypershift e2e for operators in control plane HOSTEDCP-966: require hypershift e2e for operators in control plane May 23, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 23, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 23, 2023

@sjenning: This pull request references HOSTEDCP-966 which is a valid jira issue.


In response to this:

hypershift e2e is now blocking on ci release stream

We should block breaking changes from merging and blocking release promotion

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@sjenning: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-machine-config-operator-master-e2e-hypershift | openshift/machine-config-operator | presubmit | Presubmit changed |
| pull-ci-openshift-machine-config-operator-release-4.14-e2e-hypershift | openshift/machine-config-operator | presubmit | Presubmit changed |
| pull-ci-openshift-machine-config-operator-release-4.15-e2e-hypershift | openshift/machine-config-operator | presubmit | Presubmit changed |
| pull-ci-openshift-etcd-openshift-4.14-e2e-hypershift | openshift/etcd | presubmit | Presubmit changed |
| pull-ci-openshift-etcd-openshift-4.15-e2e-hypershift | openshift/etcd | presubmit | Presubmit changed |
| pull-ci-openshift-cluster-storage-operator-master-e2e-hypershift-ovn-conformance | openshift/cluster-storage-operator | presubmit | Presubmit changed |
| pull-ci-openshift-cluster-storage-operator-release-4.14-e2e-hypershift-ovn-conformance | openshift/cluster-storage-operator | presubmit | Presubmit changed |
| pull-ci-openshift-cluster-storage-operator-release-4.15-e2e-hypershift-ovn-conformance | openshift/cluster-storage-operator | presubmit | Presubmit changed |

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 10 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 20 rehearsals
Comment: /pj-rehearse max to run up to 35 rehearsals
Comment: /pj-rehearse auto-ack to run up to 10 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci openshift-ci bot requested review from dobsonj and dusk125 May 23, 2023 16:09
@openshift-ci
Contributor

openshift-ci bot commented May 23, 2023

@sjenning: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-ci-robot
Contributor

openshift-ci-robot commented May 23, 2023

@sjenning: This pull request references HOSTEDCP-966 which is a valid jira issue.


In response to this:

Hypershift e2e is now blocking on ci release stream.

We should block breaking changes from merging for operators/components running in the Hypershift control plane.


@enxebre
Member

enxebre commented May 24, 2023

/lgtm
/approve
/hold
until this is solved openshift/cluster-ingress-operator#930 (comment)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 24, 2023
@sjenning
Contributor Author

cc @deads2k @cgwalters @jsafrane

@yuqi-zhang
Contributor

yuqi-zhang commented May 24, 2023

I think it makes sense from an MCO perspective, the main concern being how stable this would be.

Looking at the recent runs: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-machine-config-operator-master-e2e-hypershift

Seems about 50/50? I guess we can retest until passing, but it may be hard to determine whether a test failure is due to a flake (looks like it would happen a lot atm) vs actually failing.

Are the MCO tests reflective of the general pass rate, or are we an outlier? Are there plans to get the e2e test to, say, an 80% or 90% success rate? Then I think we would be a bit more confident in the signal and in having it required.

Alternatively, I am also fine with merging this, but I am concerned we will flake a lot and end up having to override if it becomes more noisy/brittle, which would defeat the purpose of making it required. Would the Hypershift team own and help us debug this test?
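
To put rough numbers on the stability concern above, here is a minimal sketch. The run history is hypothetical (it is not taken from the linked Prow job history), the 80% threshold is only the figure floated in this comment, and the retest estimate assumes every failure is an independent flake, which real failures may not be.

```python
def pass_rate(results):
    """Fraction of runs that passed; `results` is a list of booleans."""
    if not results:
        raise ValueError("no runs to evaluate")
    return sum(results) / len(results)


def stable_enough(results, threshold=0.8):
    """True when the observed pass rate meets the (assumed) threshold."""
    return pass_rate(results) >= threshold


def prob_pass_within(rate, attempts):
    """Chance that at least one of `attempts` runs passes,
    assuming independent runs with the same pass rate."""
    return 1 - (1 - rate) ** attempts


# Hypothetical history: 5 passes in 10 runs, i.e. the "about 50/50" case.
history = [True, False] * 5
rate = pass_rate(history)
print(f"pass rate: {rate:.0%}, stable enough: {stable_enough(history)}")
print(f"chance of a pass within 3 attempts: {prob_pass_within(rate, 3):.1%}")
```

Under these assumptions, a 50% job would usually go green within a few retests, but a genuine regression and a flake look identical on any single red run, which is the signal problem described above.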

@jsafrane
Contributor

The hypershift job in cluster-storage-operator has succeeded only once in the past 3 months. IMO that does not qualify it to be blocking.
In addition, the job output makes it very hard to see what went wrong: runs mostly end with "some cluster operators are not available" without any indication of which one (e.g. this link). We would need to check the artifacts to see that it's Ingress and that we can therefore override. Can you add some output to the test so that we can see the list of unavailable operators and their messages?
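
A minimal sketch of the kind of output requested above, assuming the test has access to a ClusterOperator list in the JSON shape returned by `oc get clusteroperators -o json`. The function name and the sample data (including the message text) are made up for illustration; this is not the hypershift e2e code.

```python
import json


def unavailable_operators(cluster_operators):
    """Return (name, message) for each operator whose Available
    condition is missing or not "True"."""
    out = []
    for co in cluster_operators.get("items", []):
        name = co.get("metadata", {}).get("name", "<unknown>")
        conditions = co.get("status", {}).get("conditions", [])
        available = next(
            (c for c in conditions if c.get("type") == "Available"), None
        )
        if available is None:
            out.append((name, "no Available condition reported"))
        elif available.get("status") != "True":
            out.append((name, available.get("message", "")))
    return out


# Hypothetical sample data, not taken from a real job run.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "ingress"},
     "status": {"conditions": [
       {"type": "Available", "status": "False",
        "message": "example: ingress controller is unavailable"}]}},
    {"metadata": {"name": "storage"},
     "status": {"conditions": [
       {"type": "Available", "status": "True", "message": ""}]}}
  ]
}
""")

for name, message in unavailable_operators(sample):
    print(f"{name}: {message}")
```

Printing the operator name next to its condition message would let a reviewer tell from the job log alone whether the failure belongs to their component.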

@sinnykumari
Contributor

Agree with Jerry's assessment. Considering our limited knowledge of the e2e-hypershift test, I am curious which team is responsible for troubleshooting future failures of this test?

@sjenning
Contributor Author

The hypershift team + TRT. It is a release blocking job so if it breaks, we'll know almost immediately and it will be high priority to fix it.

Note hypershift-e2e in the blocking jobs list
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.ci/release/4.14.0-0.ci-2023-05-24-220329

Making this required really just provides pre-merge protection for your teams so that you don't break at the release level.

@sinnykumari
Contributor

This really helps, thanks for the additional context Seth!

Contributor

@yuqi-zhang yuqi-zhang left a comment

Approving from the MCO context; should be good to go. Thanks, Seth!

@openshift-ci
Contributor

openshift-ci bot commented May 25, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: enxebre, sjenning, yuqi-zhang
Once this PR has been reviewed and has the lgtm label, please assign bertinatto, tjungblu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jsafrane
Contributor

I'm updating the actual e2e job used in the storage tests and making it blocking in #39783.

@sjenning
Contributor Author

Closing in favor of #39854.

@sjenning sjenning closed this May 30, 2023

Labels

do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.


6 participants