-
Notifications
You must be signed in to change notification settings - Fork 213
Bug 2053359: Feature gate initialization #740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 2053359: Feature gate initialization #740
Conversation
…ingTechPreviewState
As described in [1], some 4.10 jobs that set TechPreviewNoUpgrade very
early during install are running into trouble like:
1. Early in bootstrap, something sets TechPreviewNoUpgrade.
2. Cluster-version operator comes up, and attempts to figure out the
current featureSet. But because the Kubernetes API is also still
coming up, that fails on an error like [2]:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-76cd65b7bb-4p945_cluster-version-operator.log | grep 'Error getting featuregate value\|tech'
W0204 10:10:40.126809 1 start.go:142] Error getting featuregate value: Get "https://127.0.0.1:6443/apis/config.openshift.io/v1/featuregates/cluster": dial tcp 127.0.0.1:6443: connect: connection refused
I0204 10:19:53.097129 1 techpreviewchangestopper.go:97] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade false.
3. The TechPreviewChangeStopper waits for any FeatureGate changes, but
we don't get any.
4. CVO happily spends hours without synchronizing any of the requested
TechPreviewNoUpgrade manifests.
Step 2 was originally fatal, but I'd softened it in 90b1454
(pkg/start: Log and continue when we fail to retrieve the feature
gate, 2021-12-06, openshift#706). Here are the relevant cases, and how they'd
behave with the different approaches:
a. No Kube-API hiccup on the initial FeatureGate fetch. All
implementations handle this well.
b. Kube-API hiccup on the initial FeatureGate fetch.
i. And the actual FeatureGate value was not TechPreviewNoUpgrade.
Before 90b1454, this would have caused a useless CVO
container restart. Since 90b1454, and unchanged in this
commit, the CVO container's default:
includeTechPreview := false
is correct, and we correctly ignore the hiccup.
ii. The actual FeatureGate value was TechPreviewNoUpgrade. Before
90b1454, this would cause a useful CVO container restart.
From 90b1454 until this commit, we'd hit the bug case where
we'd go an unbounded amount of time failing to reconcile the
TechPreviewNoUpgrade manifests the user was asking for. With
this commit, we notice the divergence right after the informer
caches sync, and restart the CVO container.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2050946#c0
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264
Not Found errors are a clear configuration for "no feature set", so syncHandler should treat them as successful results. This avoids continually requeuing for a new GET call, now that 79742d7 (pkg/featurechangestopper: Seed queue to guard against incorrect startingTechPreviewState, 2022-02-04, openshift#736) is seeding the queue, even in clusters where there is no FeatureSet clusters. And it also allows us to detect set -> removed transitions if those were allowed, although the API-server is supposed to make it impossible to remove TechPreviewNoUpgrade once it has been set [1]. [1]: https://docs.openshift.com/container-platform/4.9/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling
cvInformer does not make sense for FeatureGate due to it filters out resources with name other than `version` by default.
|
@wking: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
vrutkovs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vrutkovs, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@wking: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
|
@openshift-bot: This pull request references Bugzilla bug 2053359, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@wking: This pull request references Bugzilla bug 2053359, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/label backport-risk-assessed |
|
/label cherry-pick-approved |
|
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 2053359 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Bringing #736 and #739 back to 4.10.