
Conversation

Contributor

@rmohr rmohr commented Jul 3, 2019

After OpenShift is installed on AWS, install the HCO operator from the
kubevirt project and wait on the CR of the kubevirt operator for the
ready condition to occur.

First version, which will not yet run many tests. It only verifies that HCO can be successfully installed. It uses a promoted image from the presubmits plus the src image to deploy HCO and run basic sanity checks.
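For illustration, a minimal sketch of the kind of check the test performs, assuming oc is available in the test container; the namespace, manifest path, and CR name below are placeholders, not the exact resources the job uses:

#!/usr/bin/env bash
# Hedged sketch: deploy the HCO meta-operator and wait for the kubevirt
# operator's CR to report the ready condition. Namespace, manifest path,
# and CR name are illustrative assumptions.
set -euo pipefail

NAMESPACE="kubevirt-hyperconverged"   # assumed namespace
oc get namespace "${NAMESPACE}" >/dev/null 2>&1 || oc create namespace "${NAMESPACE}"

# Deploy HCO (manifest location is a placeholder).
oc apply -n "${NAMESPACE}" -f deploy/hco.yaml

# Block until the kubevirt operator adds the ready condition on its CR.
oc wait -n "${NAMESPACE}" kubevirt/kubevirt --for=condition=Ready --timeout=30m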

@openshift-ci-robot
Contributor

Hi @rmohr. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 3, 2019
Contributor Author

rmohr commented Jul 3, 2019

/cc @stevekuznetsov @trown @eparis @danielBelenky @gbenhaim

Here is a very naive approach to how the CNV/KubeVirt deployment can be tested. It just installs our meta-operator and checks whether the kubevirt operator adds a ready condition on its CR. This would already have caught at least one issue which we had.

I guess kubectl is not available and oc needs to be used? Would you recommend hosting a test script somewhere else and using curl in the TEST_COMMAND to fetch it?
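For context, this is roughly the shape of TEST_COMMAND I have in mind if the script is hosted elsewhere; the URL is a placeholder, not a real location:

# Hedged sketch of a TEST_COMMAND that fetches a hosted test script.
curl -sSfL https://example.com/hco-e2e.sh -o /tmp/hco-e2e.sh   # placeholder URL
chmod +x /tmp/hco-e2e.sh
/tmp/hco-e2e.sh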

@openshift-ci-robot
Contributor

@rmohr: GitHub didn't allow me to request PR reviews from the following users: danielBelenky, gbenhaim.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @stevekuznetsov @trown @eparis @danielBelenky @gbenhaim

Here is a very naive approach to how the CNV/KubeVirt deployment can be tested. It just installs our meta-operator and checks whether the kubevirt operator adds a ready condition on its CR. This would already have caught at least one issue which we had.

I guess kubectl is not available and oc needs to be used? Would you recommend hosting a test script somewhere else and using curl in the TEST_COMMAND to fetch it?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abhinavdahiya
Contributor

/hold

This is adding a test to the installer repo. Why?

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 3, 2019
Contributor Author

rmohr commented Jul 3, 2019

@abhinavdahiya good question whether I put it in the right repo. What we need is a test which verifies that the latest OKD code that is about to be released (e.g. a release candidate, ...) is compatible with CNV. It should give us a heads-up when something changes in OKD that would break CNV. Is there a better place where this should be added?

@abhinavdahiya
Contributor

Why don't you create a periodic job...?

Contributor Author

rmohr commented Jul 4, 2019

Why don't you create a periodic job...?

Hm, maybe. I have now moved it to the openshift/origin presubmits. Would that make sense? What we want to see is whether API changes or internal changes in OKD break something in KubeVirt (like it happened with the strategic merge patch removal on SCCs). So I guess that would make more sense, since we are not interested in the install process as such. A periodic would work, but by then the chance to discuss issues on the PRs which introduce the changes has already passed.

Contributor Author

rmohr commented Jul 4, 2019

A more technical question: I would need oc in the container. I could try to run dnf, but that would require privileges. I could also try to fetch a static oc build. Another option would be a static Go binary, hosted somewhere by us, which does the deployment.
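A rough sketch of the static-binary option, assuming the client tarball can be fetched from the public OpenShift mirror; the exact URL and version are assumptions:

# Hedged sketch: fetch a static oc client instead of installing packages via dnf.
curl -sSfL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux.tar.gz \
  | tar -xz -C /tmp oc
export PATH="/tmp:${PATH}"
oc version --client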

Contributor Author

rmohr commented Jul 4, 2019

/cc @rthallisey FYI

@openshift-ci-robot
Contributor

@rmohr: GitHub didn't allow me to request PR reviews from the following users: FYI.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @rthallisey FYI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor Author

rmohr commented Jul 9, 2019

Should we provide our own TEST_IMAGE which contains everything we need? It should then have the cluster config set via an environment variable, right?
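To make the question concrete, this is roughly what I picture as the image's entrypoint, assuming the CI job injects the admin kubeconfig via the KUBECONFIG environment variable (the variable name and the bundled script are assumptions):

#!/usr/bin/env bash
# Hedged sketch of a TEST_IMAGE entrypoint; assumes KUBECONFIG points at the
# cluster's admin kubeconfig and that the test script is bundled in the image.
set -euo pipefail
: "${KUBECONFIG:?KUBECONFIG must point at the cluster's admin kubeconfig}"
oc get nodes                      # quick sanity check that the cluster is reachable
exec /usr/local/bin/hco-e2e.sh    # hypothetical bundled test script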

@stevekuznetsov @trown @eparis any recommendations?

Contributor Author

rmohr commented Jul 9, 2019

Ok, there is now a TEST_IMAGE set which contains everything. Looking forward to getting some feedback. :)


fabiand commented Jul 15, 2019

Ping?

@smarterclayton
Contributor

/hold

Generally we don't add presubmits to origin until the code is proven elsewhere, since there's lots of code that gets put into openshift that you can have elsewhere. I assume you just want to verify that "openshift continues to work with the latest cnv"?

If so, you want to create a release periodic (see #3707) that runs periodically. The build cop would then notify you if the job fails. We probably need to associate this with an email so the build cop knows who to yell at.

Contributor

bparees commented Jul 15, 2019

Generally we don't add presubmits to origin until the code is proven elsewhere, since there's lots of code that gets put into openshift that you can have elsewhere.

@smarterclayton it's optional:true and alwaysrun: false, so do your concerns still apply?

Contributor

was something added elsewhere that is building this image?

Contributor

oh i see, this is being hosted on dockerhub. i think we should do this the "right" way and introduce logic that will build the hco-tests image from a repo on each job run, the way our mainline e2e tests work.

Otherwise you're going to have a hard time synchronizing the delivery of test and code changes, or knowing why a job failed when you can't confirm which version of the tests it ran against.

Contributor Author

I don't want to stick with the :latest tag; I need to improve that on the HCO side. The issue in my opinion is that there are no releases in the HCO repo right now. I am not sure we can properly cover the scenario with one job. I think we would need two: one which tests latest master (maybe like you suggest) and one which tests the latest release. I would not focus on this image too much for the initial version, if that is fine with you.

@eparis
Member

eparis commented Jul 16, 2019

I think the request is to make it a periodic. I think Jessica laid out a reasonable process for layered product testing:
http://post-office.corp.redhat.com/archives/aos-devel/2019-July/msg00299.html

1) Create a job testing your component on top of latest OCP that runs as a periodic job - this does not block the release but you can run it often enough to get a decent signal
2) Show the job is stable / does not contain a high flake rate
3) Add the job to our nightly / promotion jobs as optional - it will run but not block the release if it fails
4) Make sure the job is reporting on the build cop dashboards so build cops are alerted if it starts failing
5) If the job is running successfully on OCP nightlies for a week - ask the OpenShift architects for approval to mark it as required
6) If you have jobs with the ability to block OCP nightlies, your team must take part in build cop rotations

Contributor

bparees commented Jul 16, 2019

I think the request is to make it a periodic.

i'd like to understand why we wouldn't allow teams to choose to run these tests against their PRs.

@rthallisey

I think there are two ways for layered products to integrate:

  1. e2e testing per PR (no upgrade testing)
  2. Upgrade testing as part of openshift's upgrade CI, which can block releases

I understood Jessica's answer to be regarding 2) and this PR is for 1).

@jwforres
Member

My email was actually about both 1 and 2. You can have e2e testing that runs in a periodic job, or that blocks the releases, no different from upgrades. We'd like to understand the requirement for per-PR testing. Ben will be reaching out.

Contributor Author

rmohr commented Jul 29, 2019

@bparees, as suggested in #4563 (review), RELEASE_IMAGE_TAG is now blank again.

Contributor

bparees commented Jul 29, 2019

@rmohr i think #4563 is going to be what you want. it'll work in rehearse (I think; we can see what happens on that PR), and RELEASE_IMAGE_LATEST gets overwritten during actual CI jobs to use the appropriate release payload image.
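For what it's worth, a sketch of how the release payload could reach the installer; RELEASE_IMAGE_LATEST being injected by CI is stated above, but the exact override mechanism shown here is an assumption:

# Hedged sketch: consuming the release payload image in a job.
# RELEASE_IMAGE_LATEST is set/overwritten by the CI system for real jobs.
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="${RELEASE_IMAGE_LATEST}"
openshift-install create cluster --dir /tmp/artifacts/installer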

Contributor Author

rmohr commented Jul 29, 2019

@rmohr i think #4563 is going to be what you want. it'll work in rehearse (I think; we can see what happens on that PR), and RELEASE_IMAGE_LATEST gets overwritten during actual CI jobs to use the appropriate release payload image.

Ah, ok I thought I should remove it. Will do.

Contributor

bparees commented Jul 29, 2019

Ah, ok I thought I should remove it. Will do.

yes that was your mistake for listening to me earlier. sorry.

Contributor Author

rmohr commented Jul 29, 2019

Looks good so far. The rehearse job got past the point where it failed before.

Contributor Author

rmohr commented Jul 30, 2019

@bparees I tried both the 4.2 and the 4.1 image. In both cases the rehearsal job timed out after four hours.

Contributor Author

rmohr commented Jul 30, 2019

Here is the relevant section from https://storage.googleapis.com/origin-ci-test/logs/rehearse-4269-canary-release-openshift-origin-installer-e2e-aws-4.2-cnv/8/build-log.txt:

2019/07/29 21:16:49 Running pod e2e-aws-cnv
{"component":"entrypoint","file":"prow/entrypoint/run.go:159","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","time":"2019-07-30T01:15:10Z"}
2019/07/30 01:15:10 error: Process interrupted with signal interrupt, exiting in 10s ...
2019/07/30 01:15:10 cleanup: Deleting release pod release-images-latest
2019/07/30 01:15:10 cleanup: Deleting template e2e-aws
2019/07/30 01:15:10 error: unable to gather container logs: [error: Unable to retrieve logs from pod container cli: container "cli" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container artifacts: container "artifacts" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container setup: container "setup" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container teardown: container "teardown" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container test: container "test" in pod "e2e-aws-cnv" is terminated]
2019/07/30 01:15:10 error: unable to signal to artifacts container to terminate in pod e2e-aws-cnv, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
2019/07/30 01:15:10 error: unable to retrieve artifacts from pod e2e-aws-cnv: could not read gzipped artifacts: unable to upgrade connection: container not found ("artifacts")
{"component":"entrypoint","file":"prow/entrypoint/run.go:237","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Process gracefully exited before 15s grace period","time":"2019-07-30T01:15:20Z"}

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2019
Contributor Author

rmohr commented Jul 31, 2019

@bparees thanks a lot for your help! Everything is passing now.

Contributor Author

rmohr commented Jul 31, 2019

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 31, 2019
Contributor

bparees commented Jul 31, 2019

@rmohr can you squash the commits and i'll lgtm it?

Install the CNV hyperconverged-cluster-operator and run some tests.
Contributor

bparees commented Jul 31, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2019
Contributor

bparees commented Jul 31, 2019

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 31, 2019
@openshift-merge-robot openshift-merge-robot merged commit 594c92d into openshift:master Jul 31, 2019
@openshift-ci-robot
Contributor

@rmohr: Updated the job-config-misc configmap in namespace ci using the following files:

  • key openshift-release-release-4.2-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-release-4.2-periodics.yaml

In response to this:

After OpenShift is installed on AWS, install the HCO operator from the
kubevirt project and wait on the CR of the kubevirt operator for the
ready condition to occur.

First version, which will not yet run many tests. It only verifies that HCO can be successfully installed. It uses a promoted image from the presubmits plus the src image to deploy HCO and run basic sanity checks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

@rmohr: The following test failed, say /retest to rerun them all:

Test name: ci/prow/pj-rehearse | Commit: 2bca718 | Rerun command: /test pj-rehearse

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm Indicates that a PR is ready to be merged.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


10 participants