Skip to content

Conversation

@stbenjam
Copy link
Member

@stbenjam stbenjam commented Oct 1, 2020

Multiple times now, pull requests have merged with e2e-metal-ipi failing
with explicitly clear messages in the CI summary that the failure was
CNO and/or OVN related. We are working on renaming these jobs starting
with 4.7 to indicate they are ovn-ipv6, but it's been proven now that
optional jobs are meaningless.

This sets e2e-metal-ipi to be required to pass on both CNO and OVN
master. These jobs are now as stable as any other platform, so there's
no reason to leave them as optional. It's also the only IPv6 coverage
that exists in OpenShift today.

Multiple times now, pull requests have merged with e2e-metal-ipi failing
with explicitly clear messages in the CI summary that the failure was
CNO and/or OVN related. We are working on renaming these jobs starting
with 4.7 to indicate they are ovn-ipv6, but it's been proven now that
optional jobs are meaningless.

This sets e2e-metal-ipi to be required to pass on both CNO and OVN
master. These jobs are now as stable as any other platform, so there's
no reason to leave them as optional. It's also the only IPv6 coverage
that exists in OpenShift today.
@stbenjam
Copy link
Member Author

stbenjam commented Oct 1, 2020

/cc @trozet @danwinship @russellb

@hardys
Copy link
Contributor

hardys commented Oct 1, 2020

Note the e2e-metal-ipi needs openshift/cluster-network-operator#817 and openshift/ironic-image#107 to merge before it passes

@hardys
Copy link
Contributor

hardys commented Oct 2, 2020

/retest

@stbenjam
Copy link
Member Author

stbenjam commented Oct 2, 2020

/test pj-rehearse

Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't use build02 for your jobs, we found that a bunch of flakes in CI are due to using that build server:
#12386

Also don't put this in a commit message:
"but it's been proven now that
optional jobs are meaningless."

Additionally, I would make sure your job shows must gather logs extracted in the web browser, rather than making the user download the entire must gather. That would be a good way to enable people to easily check your logs. Here's an example:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/297/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn/1311479684466216960/artifacts/e2e-aws-ovn/

@trozet
Copy link
Contributor

trozet commented Oct 2, 2020

Also, before we require this. In case we switch back to ipv4 again, please change:
https://github.com/openshift-metal3/dev-scripts/blob/master/network.sh#L81

to be ovn-kubernetes as well.

@stbenjam
Copy link
Member Author

stbenjam commented Oct 2, 2020

I wouldn't use build02 for your jobs, we found that a bunch of flakes in CI are due to using that build server:
#12386

We haven't seen that.

Also don't put this in a commit message:
"but it's been proven now that
optional jobs are meaningless."

Well, it's true.

Additionally, I would make sure your job shows must gather logs extracted in the web browser, rather than making the user download the entire must gather. That would be a good way to enable people to easily check your logs. Here's an example:

You can file an issue in our jira project to take care of it, but this was not the reason no one looked. The multiple breakages including the first breakage at the end of 4.6 FF showed the failure reason clearly to be OVN highlighted in yellow in spyglass without having to dig into a must-gather.

gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/297/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn/1311479684466216960/artifacts/e2e-aws-ovn

@hardys
Copy link
Contributor

hardys commented Oct 5, 2020

Also, before we require this. In case we switch back to ipv4 again, please change:
https://github.com/openshift-metal3/dev-scripts/blob/master/network.sh#L81

to be ovn-kubernetes as well.

I already proposed that in openshift-metal3/dev-scripts#1108 and based on discussion there it probably makes sense to set NETWORK_TYPE explicitly for each CI job vs relying on any dev-scripts defaults, that way we can be sure that the configuration used matches the revised naming.

@hardys
Copy link
Contributor

hardys commented Oct 5, 2020

Additionally, I would make sure your job shows must gather logs extracted in the web browser, rather than making the user download the entire must gather. That would be a good way to enable people to easily check your logs. Here's an example:

There are three parts to this - the error pulling the must-gather which you noticed in the failures is a dev-scripts but, that should be fixed via openshift-metal3/dev-scripts#1125

Then there is the problem (as mentioned in that PR) that when the serviceNetwork doesn't come up, must-gather doesn't actually work (arguably a design fault with must-gather, but we could potentially work around it by using scp instead?)

Finally there is the issue that we don't unpack the tarball, which would be a nice improvement, I'll add it to our Jira backlog.

@danwinship
Copy link
Contributor

/lgtm
(though we're also renaming the jobs to have "ipv6" in the name in openshift/ovn-kubernetes so if you want to do that here too while you're here that would be good)

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2020
@openshift-merge-robot openshift-merge-robot merged commit 532ccd4 into openshift:master Oct 5, 2020
@openshift-ci-robot
Copy link
Contributor

@stbenjam: Updated the following 2 configmaps:

  • job-config-master configmap in namespace ci at cluster api.ci using the following files:
    • key openshift-cluster-network-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/cluster-network-operator/openshift-cluster-network-operator-master-presubmits.yaml
    • key openshift-ovn-kubernetes-master-presubmits.yaml using file ci-operator/jobs/openshift/ovn-kubernetes/openshift-ovn-kubernetes-master-presubmits.yaml
  • job-config-master configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-network-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/cluster-network-operator/openshift-cluster-network-operator-master-presubmits.yaml
    • key openshift-ovn-kubernetes-master-presubmits.yaml using file ci-operator/jobs/openshift/ovn-kubernetes/openshift-ovn-kubernetes-master-presubmits.yaml
Details

In response to this:

Multiple times now, pull requests have merged with e2e-metal-ipi failing
with explicitly clear messages in the CI summary that the failure was
CNO and/or OVN related. We are working on renaming these jobs starting
with 4.7 to indicate they are ovn-ipv6, but it's been proven now that
optional jobs are meaningless.

This sets e2e-metal-ipi to be required to pass on both CNO and OVN
master. These jobs are now as stable as any other platform, so there's
no reason to leave them as optional. It's also the only IPv6 coverage
that exists in OpenShift today.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stbenjam stbenjam deleted the mandate-ipv6-cno-ovn branch October 5, 2020 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants