Skip to content

Comments

Move gather chain after devscripts-gather step#18473

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
jluhrsen:must-gather-after-devscripts-gather
May 11, 2021
Merged

Move gather chain after devscripts-gather step#18473
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
jluhrsen:must-gather-after-devscripts-gather

Conversation

@jluhrsen
Copy link
Contributor

gather-must-gather is not working in 4.7 baremetalds because
it cannot pull the must gather image. the devscripts-gather
step brings the must-gather image locally and sets the
MUST_GATHER_IMAGE variable to that so the gather-must-gather
step will use that instead.

Signed-off-by: Jamo Luhrsen jluhrsen@gmail.com

@openshift-ci openshift-ci bot requested review from ardaguclu and stbenjam May 11, 2021 03:52
@jluhrsen
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 11, 2021
@jluhrsen jluhrsen force-pushed the must-gather-after-devscripts-gather branch 2 times, most recently from b638e25 to a1e8354 Compare May 11, 2021 06:23
@jluhrsen
Copy link
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 11, 2021
@jluhrsen
Copy link
Contributor Author

the metal-ipi-ovn jobs do not rehearse properly. apparently the config is too large and things choke and the rehearsal
doesn't work.
However, if you remove a few steps from the job you can get the rehearsal to run with whatever steps
you think are important. I did that with an earlier version of this PR by removing packet-teardown and e2e-test steps.
so the job in the end doesn't have pretty results, but you can see that the devscripts-gather ran first, followed by the
gather-must-gather step. in the gather-must-gather step you can see the image being used as:

Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest

Currently our 4.7 version of this job is perma-failing and blocking a release.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 11, 2021

@jluhrsen: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/rehearse/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade b638e25ec04c10f24bebc63ac802ef495cacf3bc link /test pj-rehearse
ci/rehearse/openshift/cluster-network-operator/release-4.9/e2e-metal-ipi-ovn-ipv6-ipsec b638e25ec04c10f24bebc63ac802ef495cacf3bc link /test pj-rehearse
ci/rehearse/openshift/router/release-4.9/e2e-metal-ipi-ovn-ipv6 b638e25ec04c10f24bebc63ac802ef495cacf3bc link /test pj-rehearse
ci/rehearse/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-serial-ipv4 a1e8354b2a82789aecf7634866f47287daa659b4 link /test pj-rehearse
ci/prow/pj-rehearse a1e8354b2a82789aecf7634866f47287daa659b4 link /test pj-rehearse

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@stbenjam
Copy link
Member

Looks fine to me if you fix the ordering so gather chain happens before packet teardown, but after devscripts-gather. In my head when I wrote #18414 that was already the case 🤦 Thanks for the fix!

the metal-ipi-ovn jobs do not rehearse properly. apparently the config is too large and things choke and the rehearsal
doesn't work.

We should figure out why you saw this -- I haven't seen it myself, I saw on the Slack thread something was 130kB but it's not clear from the logs or the thread what was too large (maybe I just didn't have enough coffee yet).

gather-must-gather is not working in 4.7 baremetalds because
it cannot pull the must gather image. the devscripts-gather
step brings the must-gather image locally and sets the
MUST_GATHER_IMAGE variable to that so the gather-must-gather
step will use that instead.

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@jluhrsen
Copy link
Contributor Author

the metal-ipi-ovn jobs do not rehearse properly. apparently the config is too large and things choke and the rehearsal
doesn't work.

We should figure out why you saw this -- I haven't seen it myself, I saw on the Slack thread something was 130kB but it's not clear from the logs or the thread what was too large (maybe I just didn't have enough coffee yet).

I'm not sure how to get this fixed, but when I was trying to figure out what was wrong and assuming it was a problem in my
code, I ended up checking other PRs in this area and saw that the rehearsals were failing in those PRs the same way. I guess
the full bit of config that gets generated from the larger steps in the metal-ipi jobs is hitting some OS limit.

@jluhrsen jluhrsen force-pushed the must-gather-after-devscripts-gather branch from a1e8354 to 26d58f0 Compare May 11, 2021 15:45
@knobunc
Copy link
Contributor

knobunc commented May 11, 2021

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 11, 2021
Copy link
Member

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@stbenjam
Copy link
Member

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jluhrsen, knobunc, stbenjam, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2021
@openshift-merge-robot openshift-merge-robot merged commit 1cfca80 into openshift:master May 11, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 11, 2021

@jluhrsen: Updated the step-registry configmap in namespace ci at cluster app.ci using the following files:

  • key baremetalds-ipi-post-chain.yaml using file ci-operator/step-registry/baremetalds/ipi/post/baremetalds-ipi-post-chain.yaml
Details

In response to this:

gather-must-gather is not working in 4.7 baremetalds because
it cannot pull the must gather image. the devscripts-gather
step brings the must-gather image locally and sets the
MUST_GATHER_IMAGE variable to that so the gather-must-gather
step will use that instead.

Signed-off-by: Jamo Luhrsen jluhrsen@gmail.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stbenjam
Copy link
Member

the metal-ipi-ovn jobs do not rehearse properly. apparently the config is too large and things choke and the rehearsal
doesn't work.

We should figure out why you saw this -- I haven't seen it myself, I saw on the Slack thread something was 130kB but it's not clear from the logs or the thread what was too large (maybe I just didn't have enough coffee yet).

I'm not sure how to get this fixed, but when I was trying to figure out what was wrong and assuming it was a problem in my
code, I ended up checking other PRs in this area and saw that the rehearsals were failing in those PRs the same way. I guess
the full bit of config that gets generated from the larger steps in the metal-ipi jobs is hitting some OS limit.

Just to come full circle, the rehearsal ends up embedding all the shell scripts and things into the prow job, so it is quite large. We're about to cut our TEST_SKIPS list by more than half, and we can have a look at some other optimizations like more terse comments. That should help reduce the overall size.

@jluhrsen jluhrsen deleted the must-gather-after-devscripts-gather branch July 22, 2021 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants