Conversation

abhinavdahiya (Contributor) commented Oct 4, 2018

This allows the installer to block installation of components in the
release manifests that are causing conflicts with the old
tectonic-operators.

Requires openshift/cluster-version-operator#30

@wking this should allow us to stop installing conflicting new operators
/cc @wking
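
For context, here is a sketch of the kind of override entry involved
(the field names come from the review snippets further down; the file
name and the append-style invocation are illustrative assumptions):

  # Hypothetical illustration: add an override entry telling the
  # cluster-version operator to leave this Deployment unmanaged.
  $ cat >> cvo-overrides.yaml <<'EOF'
  - kind: Deployment
    namespace: openshift-cluster-network-operator
    name: cluster-network-operator
    unmanaged: true
  EOF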

openshift-ci-robot added the size/M and approved labels Oct 4, 2018
wking (Member) commented Oct 4, 2018

I don't understand well enough to review your list of operators to ignore, but +1 on the approach to unstick us.

abhinavdahiya (Contributor, Author) commented:
/retest

openshift/cluster-version-operator#30 merged.

crawford (Contributor) commented Oct 4, 2018

/lgtm

openshift-ci-robot added the lgtm label Oct 4, 2018
openshift-ci-robot (Contributor) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Details: Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

abhinavdahiya (Contributor, Author) commented:
openshift/release#1814 fixes this error from e2e-aws:

  which: no extended.test in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
  /bin/bash: line 93: ginkgo: command not found

/retest

openshift-merge-robot merged commit 93e9292 into openshift:master Oct 4, 2018
wking added a commit to wking/openshift-release that referenced this pull request Oct 4, 2018
…lures

These are currently generating a lot of error messages.  From [1]
(testing openshift/installer#415):

  Gathering artifacts ...
  Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  Error from server (NotFound): the server could not find the requested resource
  ...
  Error from server (BadRequest): previous terminated container "registry" in pod "registry-b6df966cf-fkhpl" not found
  Error from server (BadRequest): previous terminated container "kube-apiserver" in pod "kube-apiserver-2hf2w" not found
  Error from server (BadRequest): previous terminated container "kube-apiserver" in pod "kube-apiserver-7pgl9" not found
  ...

Looking at the extracted logs, lots of them are zero-length (an empty
log compresses to 20 bytes):

  $ POD_LOGS="$(w3m -dump https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/artifacts/e2e-aws/pods/)"
  $ echo "${POD_LOGS}" | grep '^ *20$' | wc -l
  86
  $ echo "${POD_LOGS}" | grep '\[file\]' | wc -l
  172
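
For reference, an empty input gzips to a fixed 20 bytes of header and
trailer, which is why 20-byte artifacts mean empty logs:

  $ printf '' | gzip | wc -c
  20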

And, possibly because of the errors, the commands are slow, with one
of the above lines coming out every second or so.  The teardown
container obviously does some other things as well, but it's taking a
significant chunk of our e2e-aws time [2]:

  2018/10/04 17:59:00 Running pod e2e-aws
  2018/10/04 18:03:25 Container setup in pod e2e-aws completed successfully
  2018/10/04 18:16:37 Container test in pod e2e-aws completed successfully
  2018/10/04 18:33:31 Container teardown in pod e2e-aws completed successfully
  2018/10/04 18:33:31 Pod e2e-aws succeeded after 34m31s

So 4.5 minutes for setup, 13 minutes for test, and 17 minutes for
teardown.
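
Checking those numbers from the timestamps above (a quick sketch,
assuming GNU date):

  $ for phase in 'setup 17:59:00 18:03:25' 'test 18:03:25 18:16:37' 'teardown 18:16:37 18:33:31'
  > do
  >   set -- ${phase}
  >   echo "${1}: $(($(date -d "${3}" +%s) - $(date -d "${2}" +%s)))s"
  > done
  setup: 265s
  test: 792s
  teardown: 1014s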

When the tests pass, we probably aren't going to be poking around in
the logs, so drop log acquisition in those cases to speed up our CI.

[1]: https://api.ci.openshift.org/console/project/ci-op-w11cl72x/browse/pods/e2e-aws?tab=logs
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/build-log.txt
wking added a commit to wking/openshift-release that referenced this pull request Oct 4, 2018
With 10 pulls going at once.

These are currently generating a lot of error messages.  From recent
openshift/installer#415 tests [1]:

  $ oc project ci-op-w11cl72x
  $ oc logs e2e-aws -c teardown --timestamps
  2018-10-04T18:17:06.557740109Z Gathering artifacts ...
  2018-10-04T18:17:24.875374828Z Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  ...
  2018-10-04T18:17:29.331684772Z Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  2018-10-04T18:17:29.351919855Z Error from server (NotFound): the server could not find the requested resource
  2018-10-04T18:17:39.592948165Z Error from server (BadRequest): previous terminated container "registry" in pod "registry-b6df966cf-fkhpl" not found
  ...
  2018-10-04T18:29:24.457841097Z Error from server (BadRequest): previous terminated container "kube-addon-operator" in pod "kube-addon-operator-775d4c8f8d-289zm" not found
  2018-10-04T18:29:24.466213055Z Waiting for node logs to finish ...
  2018-10-04T18:29:24.466289887Z Deprovisioning cluster ...
  2018-10-04T18:29:24.483065903Z level=debug msg="Deleting security groups"
  ...
  2018-10-04T18:33:29.857465158Z level=debug msg="goroutine deleteVPCs complete"

So 12 minutes to pull the logs, followed by four minutes for
destroy-cluster.

Looking at the extracted logs, lots of them are zero-length (an empty
log compresses to 20 bytes):

  $ POD_LOGS="$(w3m -dump https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/artifacts/e2e-aws/pods/)"
  $ echo "${POD_LOGS}" | grep '^ *20$' | wc -l
  86
  $ echo "${POD_LOGS}" | grep '\[file\]' | wc -l
  172

So it's possible that the delay is due to the errors, or to a few
large logs blocking the old, serial pod/container pulls.

With this commit, I've added a new 'queue' command.  This command
checks to see how many background jobs we have using 'jobs' [2], and
idles until we get below 10.  Then it launches its particular command
in the background.  By using 'queue', we'll keep up to 10 log-fetches
running in parallel, and the final 'wait' will block for any which
still happen to be running by that point.

The previous gzip invocations used -c, which dates back to 82d333e
(Set up artifact reporting for ci-operator jobs, 2018-05-17, openshift#867).
But with these gzip filters running on stdin anyway, the -c was
superfluous.  I've dropped it in this commit.

Moving the redirect target to a positional argument is a bit kludgy.
I'd rather have a more familiar way of phrasing that redirect, but
passing it in as ${1} was the best I've come up with.
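
For the record, a minimal sketch of such a 'queue' helper (the 10-job
cap, the jobs(1)-based counting, and the ${1} redirect target are all
from the description above; the real script may differ in detail):

  # Wait until fewer than 10 background jobs are live, then run the
  # given command in the background with stdout redirected to "${1}".
  function queue() {
    local TARGET="${1}"
    shift
    while [ "$(jobs | wc -l)" -ge 10 ]; do
      sleep 1
    done
    "${@}" >"${TARGET}" &
  }

  # Hypothetical usage, with the gzip filter running on stdin:
  queue artifacts/pods/example.log.gz bash -c 'oc logs example | gzip'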

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/build-log.txt
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/jobs.html
    namespace: openshift-cluster-network-operator
    name: cluster-network-operator
    unmanaged: true
  - kind: Deployment # this conflicts with tectonic-ingress-controller-operator
ironcladlou (Contributor) commented:

Not that it matters now, but this doesn't actually conflict.

abhinavdahiya (Contributor, Author) replied:

@ironcladlou if that's the case, feel free to open a PR to drop this override.

    namespace: openshift-cluster-dns-operator
    name: cluster-dns-operator
    unmanaged: true
  - kind: Deployment # this conflicts with kube-core-operator
ironcladlou (Contributor) commented:
This doesn't conflict.

abhinavdahiya (Contributor, Author) replied:

Again, @ironcladlou, if that's the case, feel free to open a PR to drop this override.

abhinavdahiya deleted the cvo_overrides branch October 5, 2018 13:13
wking added a commit to wking/openshift-release that referenced this pull request Oct 5, 2018
Currently, for things like [1,2] that try to unstick us vs. some
external change, we need to /hold the other approved PRs to get them
out of the merge queue while the fix goes in.  With the bot removed
from our repository, those PRs would remove themselves as they failed
naturally, and we'd just /retest them after the fix lands.  We can
turn the bot back on once we're back to one external workaround a
week or so, vs. our current several per day ;).

Docs for the repo: syntax are in [3].
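
For example, something like the following (a hypothetical query; not
necessarily the bot's actual configuration):

  is:pr is:open label:lgtm label:approved repo:openshift/installer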

[1]: openshift/installer#415
[2]: openshift/installer#425
[3]: https://help.github.com/articles/searching-issues-and-pull-requests/#search-within-a-users-or-organizations-repositories