
Conversation

@wking
Member

@wking wking commented Oct 27, 2021

We've been using --enable-default-cluster-version to avoid a cluster-bootstrap vs. cluster-version operator race since d7760ce (#238). But openshift/enhancements#922 is considering passing component configuration to the CVO via ClusterVersion in ways that would affect the manifests being applied. So this commit pivots to a new option that blocks the whole sync cycle on the existence of a ClusterVersion resource. That way, we know we have up-to-date data before reconciling any manifests (although it might slow down the bootstrapping process a bit).
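For illustration only, here is a minimal Go sketch of the kind of gate described above: poll until the "version" ClusterVersion object exists before letting the sync cycle reconcile manifests. The function name, polling interval, and error handling are assumptions for the sketch, not the actual --wait-for-cluster-version implementation in this PR.

package start

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

// waitForClusterVersion (hypothetical) blocks until the "version"
// ClusterVersion object exists, so manifest reconciliation sees up-to-date
// configuration such as the component settings discussed in
// openshift/enhancements#922.
func waitForClusterVersion(ctx context.Context, client configclient.Interface) error {
	return wait.PollImmediateUntil(10*time.Second, func() (bool, error) {
		_, err := client.ConfigV1().ClusterVersions().Get(ctx, "version", metav1.GetOptions{})
		if err == nil {
			return true, nil // ClusterVersion exists; safe to start the sync cycle
		}
		if apierrors.IsNotFound(err) {
			return false, nil // keep waiting; the installer has not created it yet
		}
		return false, nil // treat transient API errors during bootstrap as "not yet"
	}, ctx.Done())
}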

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2021
…to --wait-for-cluster-version

We've been using --enable-default-cluster-version to avoid a
cluster-bootstrap vs. cluster-version operator race since d7760ce
(pkg/cvo: Drop ClusterVersion defaulting during bootstrap, 2019-08-16, openshift#238).
But [1] is considering passing component configuration to the CVO via
ClusterVersion that would affect the manifests getting applied.  So
this commit pivots to a new option that blocks the whole sync cycle on
the existence of a ClusterVersion resource.  That way, we know we have
up-to-date data before reconciling any manifests (although it might
slow down the bootstrapping process a bit).

[1]: openshift/enhancements#922
@wking wking force-pushed the wait-for-cluster-version branch from 4a958ca to 672a8a0 on October 27, 2021 21:23
@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2021

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-agnostic-upgrade
Commit: 672a8a0
Required: true
Rerun command: /test e2e-agnostic-upgrade

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking
Member Author

wking commented Nov 3, 2021

I haven't run this multiple times to know how reliable it is, but the presubmit that exercises the new CVO code during install took 1h12m to install this cluster, while the presubmit that used already-landed CVO code during install took 39 minutes. The difference is almost entirely in the wait-for-bootstrap-complete leg:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/682/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1453472422291312640/artifacts/e2e-agnostic-upgrade/ipi-install-install-stableinitial/artifacts/.openshift_install.log | tail | grep bootstrap
time="2021-10-27T22:08:22Z" level=debug msg="         bootstrap: 5m22s"
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/682/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic/1453472422211620864/artifacts/e2e-agnostic/ipi-install-install/artifacts/.openshift_install.log | tail | grep bootstrap
time="2021-10-27T22:42:00Z" level=debug msg="         bootstrap: 27m35s"

so that's at least consistent with this PR being the reason for the slowdown.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 1, 2022
@wking
Copy link
Member Author

wking commented Feb 11, 2022

Thinking this over, the existing "No ClusterVersion object and defaulting not enabled, waiting for one" guard will keep us idle until we get an install-time ClusterVersion, so there's no need for this PR.
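For context, a rough sketch of the guard behavior referenced here (names and structure are illustrative, not the CVO's actual code): when the ClusterVersion object is missing and install-time defaulting is disabled, the operator logs that it is waiting and skips the sync cycle.

package start

import (
	"context"
	"log"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

// shouldSync (hypothetical) decides whether a sync cycle may proceed.
func shouldSync(ctx context.Context, client configclient.Interface, enableDefaulting bool) (bool, error) {
	_, err := client.ConfigV1().ClusterVersions().Get(ctx, "version", metav1.GetOptions{})
	switch {
	case err == nil:
		return true, nil // ClusterVersion exists; reconcile as usual
	case apierrors.IsNotFound(err) && enableDefaulting:
		// With defaulting enabled the operator would create a default
		// ClusterVersion here; that path is omitted from this sketch.
		return true, nil
	case apierrors.IsNotFound(err):
		log.Print("No ClusterVersion object and defaulting not enabled, waiting for one")
		return false, nil // stay idle until the installer creates ClusterVersion
	default:
		return false, err
	}
}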

@wking wking closed this Feb 11, 2022
@wking wking deleted the wait-for-cluster-version branch February 11, 2022 20:05