
Conversation

Contributor

@rmohr rmohr commented Jul 3, 2019

After OpenShift is installed on AWS, install the HCO operator from the
kubevirt project and wait on the CR of the kubevirt operator for the
ready condition to occur.

First version, which will not yet run many tests. It only verifies that HCO can be successfully installed. It uses a promoted image from the presubmits plus the src image to deploy HCO and run basic sanity checks.
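For illustration, a minimal sketch of the kind of check the test performs, assuming oc is available in the test container; the namespace, manifest path, and CR name below are placeholders, not the exact resources the job uses:

#!/usr/bin/env bash
# Hedged sketch: deploy the HCO meta-operator and wait for the kubevirt
# operator's CR to report the ready condition. Namespace, manifest path,
# and CR name are illustrative assumptions.
set -euo pipefail

NAMESPACE="kubevirt-hyperconverged"   # assumed namespace
oc get namespace "${NAMESPACE}" >/dev/null 2>&1 || oc create namespace "${NAMESPACE}"

# Deploy HCO (manifest location is a placeholder).
oc apply -n "${NAMESPACE}" -f deploy/hco.yaml

# Block until the kubevirt operator adds the ready condition on its CR.
oc wait -n "${NAMESPACE}" kubevirt/kubevirt --for=condition=Ready --timeout=30m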

@openshift-ci-robot
Contributor

Hi @rmohr. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 3, 2019
Contributor Author

rmohr commented Jul 3, 2019

/cc @stevekuznetsov @trown @eparis @danielBelenky @gbenhaim

Here is a very naive approach to how the CNV/KubeVirt deployment can be tested. It just installs our meta-operator and checks whether the kubevirt operator adds a ready condition on its CR. This would already have caught at least one issue which we had.

I guess kubectl is not available and oc needs to be used? Would you recommend hosting a test script somewhere else and using curl in the TEST_COMMAND to fetch it?
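For context, this is roughly the shape of TEST_COMMAND I have in mind if the script is hosted elsewhere; the URL is a placeholder, not a real location:

# Hedged sketch of a TEST_COMMAND that fetches a hosted test script.
curl -sSfL https://example.com/hco-e2e.sh -o /tmp/hco-e2e.sh   # placeholder URL
chmod +x /tmp/hco-e2e.sh
/tmp/hco-e2e.sh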

@openshift-ci-robot
Contributor

@rmohr: GitHub didn't allow me to request PR reviews from the following users: danielBelenky, gbenhaim.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @stevekuznetsov @trown @eparis @danielBelenky @gbenhaim

Here is a very naive approach to how the CNV/KubeVirt deployment can be tested. It just installs our meta-operator and checks whether the kubevirt operator adds a ready condition on its CR. This would already have caught at least one issue which we had.

I guess kubectl is not available and oc needs to be used? Would you recommend hosting a test script somewhere else and using curl in the TEST_COMMAND to fetch it?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abhinavdahiya
Contributor

/hold

This is adding a test to the installer repo. Why?

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 3, 2019
Contributor Author

rmohr commented Jul 3, 2019

@abhinavdahiya good question whether I put it in the right repo. What we need is a test which verifies that the latest OKD code that is about to be released (e.g. a release candidate, ...) is compatible with CNV. It should give us a heads-up when something changes in OKD that would break CNV. Is there a better place where this should be added?

@abhinavdahiya
Contributor

Why don't you create a periodic job...?

Contributor Author

rmohr commented Jul 4, 2019

Why don't you create a periodic job...?

Hm, maybe. I have now moved it to the openshift/origin presubmits. Would that make sense? What we want to see is whether API changes or internal changes in OKD break something in KubeVirt (like it happened with the strategic merge patch removal on SCCs). So I guess that would make more sense, since we are not interested in the install process as such. A periodic would work, but by then the chance to discuss issues on the PRs which introduce the changes has already passed.

Contributor Author

rmohr commented Jul 4, 2019

A more technical question: I would need oc in the container. I could try to run dnf, but that would require privileges. I could also try to fetch a static oc build. Another option would be a static Go binary, hosted somewhere by us, which does the deployment.
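A rough sketch of the static-binary option, assuming the client tarball can be fetched from the public OpenShift mirror; the exact URL and version are assumptions:

# Hedged sketch: fetch a static oc client instead of installing packages via dnf.
curl -sSfL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux.tar.gz \
  | tar -xz -C /tmp oc
export PATH="/tmp:${PATH}"
oc version --client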

Contributor Author

rmohr commented Jul 4, 2019

/cc @rthallisey FYI

@openshift-ci-robot
Contributor

@rmohr: GitHub didn't allow me to request PR reviews from the following users: FYI.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @rthallisey FYI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor Author

rmohr commented Jul 9, 2019

Should we provide our own TEST_IMAGE which contains everything we need? It should then have the cluster config set via an environment variable, right?
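To make the question concrete, this is roughly what I picture as the image's entrypoint, assuming the CI job injects the admin kubeconfig via the KUBECONFIG environment variable (the variable name and the bundled script are assumptions):

#!/usr/bin/env bash
# Hedged sketch of a TEST_IMAGE entrypoint; assumes KUBECONFIG points at the
# cluster's admin kubeconfig and that the test script is bundled in the image.
set -euo pipefail
: "${KUBECONFIG:?KUBECONFIG must point at the cluster's admin kubeconfig}"
oc get nodes                      # quick sanity check that the cluster is reachable
exec /usr/local/bin/hco-e2e.sh    # hypothetical bundled test script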

@stevekuznetsov @trown @eparis any recommendations?

Contributor Author

rmohr commented Jul 9, 2019

Ok, there is now a TEST_IMAGE set which contains everything. Looking forward to getting some feedback. :)


fabiand commented Jul 15, 2019

Ping?

@smarterclayton
Contributor

/hold

Generally we don't add presubmits to origin until the code is proven elsewhere, since there's lots of code that gets put into openshift that you can have elsewhere. I assume you just want to verify that "openshift continues to work with the latest cnv"?

If so, you want to create a release periodic (see #3707) that runs periodically. The build cop would then notify you if the job fails. We probably need to associate this with an email so the build cop knows who to yell at.

Contributor

bparees commented Jul 15, 2019

Generally we don't add presubmits to origin until the code is proven elsewhere, since there's lots of code that gets put into openshift that you can have elsewhere.

@smarterclayton it's optional:true and alwaysrun: false, so do your concerns still apply?

Contributor

was something added elsewhere that is building this image?

Contributor

oh i see, this is being hosted on dockerhub. i think we should do this the "right" way and introduce logic that will build the hco-tests image from a repo on each job run, the way our mainline e2e tests work.

Otherwise you're going to have a hard time synchronizing the delivery of test and code changes, or knowing why a job failed when you can't confirm which version of the tests it ran against.

Contributor Author

I don't want to stick with the :latest tag; I need to improve that on the HCO side. The issue in my opinion is that there are no releases in the HCO repo right now. I am not sure we can properly cover the scenario with one job. I think we would need two: one which tests latest master (maybe like you suggest) and one which tests the latest release. I would not focus on this image too much for the initial version, if that is fine with you.

@eparis
Member

eparis commented Jul 16, 2019

I think the request is to make it a periodic. I think Jessica laid out a reasonable process for layered product testing:
http://post-office.corp.redhat.com/archives/aos-devel/2019-July/msg00299.html

1) Create a job testing your component on top of latest OCP that runs as a periodic job - this does not block the release but you can run it often enough to get a decent signal
2) Show the job is stable / does not contain a high flake rate
3) Add the job to our nightly / promotion jobs as optional - it will run but not block the release if it fails
4) Make sure the job is reporting on the build cop dashboards so build cops are alerted if it starts failing
5) If the job is running successfully on OCP nightlies for a week - ask the OpenShift architects for approval to mark it as required
6) If you have jobs with the ability to block OCP nightlies, your team must take part in build cop rotations

Contributor

bparees commented Jul 16, 2019

I think the request is to make it a periodic.

i'd like to understand why we wouldn't allow teams to choose to run these tests against their PRs.

@rthallisey

I think there are two ways for layered products to integrate:

  1. e2e testing per PR (no upgrade testing)
  2. Upgrade testing as part of openshift's upgrade CI, which can block releases

I understood Jessica's answer to be regarding 2) and this PR is for 1).

@jwforres
Member

My email was actually about both 1 and 2. You can have e2e testing that runs in a periodic job, or that blocks the releases, no different from upgrades. We'd like to understand the requirement for per-PR testing. Ben will be reaching out.

Contributor Author

rmohr commented Jul 29, 2019

@bparees, as suggested in #4563 (review), RELEASE_IMAGE_TAG is now blank again.

Contributor

bparees commented Jul 29, 2019

@rmohr i think #4563 is going to be what you want. it'll work in rehearse (I think; we can see what happens on that PR), and RELEASE_IMAGE_LATEST gets overwritten during actual CI jobs to use the appropriate release payload image.
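For what it's worth, a sketch of how the release payload could reach the installer; RELEASE_IMAGE_LATEST being injected by CI is stated above, but the exact override mechanism shown here is an assumption:

# Hedged sketch: consuming the release payload image in a job.
# RELEASE_IMAGE_LATEST is set/overwritten by the CI system for real jobs.
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="${RELEASE_IMAGE_LATEST}"
openshift-install create cluster --dir /tmp/artifacts/installer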

Contributor Author

rmohr commented Jul 29, 2019

@rmohr i think #4563 is going to be what you want. it'll work in rehearse (I think; we can see what happens on that PR), and RELEASE_IMAGE_LATEST gets overwritten during actual CI jobs to use the appropriate release payload image.

Ah, ok I thought I should remove it. Will do.

Contributor

bparees commented Jul 29, 2019

Ah, ok I thought I should remove it. Will do.

yes that was your mistake for listening to me earlier. sorry.

Contributor Author

rmohr commented Jul 29, 2019

Looks good so far. The rehearse job got past the point where it failed before.

Contributor Author

rmohr commented Jul 30, 2019

@bparees I tried both the 4.2 and the 4.1 image. In both cases the rehearsal job timed out after four hours.

Contributor Author

rmohr commented Jul 30, 2019

Here is the relevant section from https://storage.googleapis.com/origin-ci-test/logs/rehearse-4269-canary-release-openshift-origin-installer-e2e-aws-4.2-cnv/8/build-log.txt:

2019/07/29 21:16:49 Running pod e2e-aws-cnv
{"component":"entrypoint","file":"prow/entrypoint/run.go:159","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","time":"2019-07-30T01:15:10Z"}
2019/07/30 01:15:10 error: Process interrupted with signal interrupt, exiting in 10s ...
2019/07/30 01:15:10 cleanup: Deleting release pod release-images-latest
2019/07/30 01:15:10 cleanup: Deleting template e2e-aws
2019/07/30 01:15:10 error: unable to gather container logs: [error: Unable to retrieve logs from pod container cli: container "cli" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container artifacts: container "artifacts" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container setup: container "setup" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container teardown: container "teardown" in pod "e2e-aws-cnv" is terminated, error: Unable to retrieve logs from pod container test: container "test" in pod "e2e-aws-cnv" is terminated]
2019/07/30 01:15:10 error: unable to signal to artifacts container to terminate in pod e2e-aws-cnv, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
2019/07/30 01:15:10 error: unable to retrieve artifacts from pod e2e-aws-cnv: could not read gzipped artifacts: unable to upgrade connection: container not found ("artifacts")
{"component":"entrypoint","file":"prow/entrypoint/run.go:237","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Process gracefully exited before 15s grace period","time":"2019-07-30T01:15:20Z"}

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2019
Contributor Author

rmohr commented Jul 31, 2019

@bparees thanks a lot for your help! Everything is passing now.

Contributor Author

rmohr commented Jul 31, 2019

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 31, 2019
Contributor

bparees commented Jul 31, 2019

@rmohr can you squash the commits and i'll lgtm it?

Install the CNV hyperconverged-cluster-operator and run some tests.
Contributor

bparees commented Jul 31, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2019
Contributor

bparees commented Jul 31, 2019

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 31, 2019
@openshift-merge-robot openshift-merge-robot merged commit 594c92d into openshift:master Jul 31, 2019
@openshift-ci-robot
Contributor

@rmohr: Updated the job-config-misc configmap in namespace ci using the following files:

  • key openshift-release-release-4.2-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-release-4.2-periodics.yaml

In response to this:

After OpenShift is installed on AWS, install the HCO operator from the
kubevirt project and wait on the CR of the kubevirt operator for the
ready condition to occur.

First version, which will not yet run many tests. It only verifies that HCO can be successfully installed. It uses a promoted image from the presubmits plus the src image to deploy HCO and run basic sanity checks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

@rmohr: The following test failed, say /retest to rerun them all:

Test name: ci/prow/pj-rehearse | Commit: 2bca718 | Rerun command: /test pj-rehearse

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm Indicates that a PR is ready to be merged.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


10 participants