pkg/controller/template: avoid resyncing templates if the cc didn't change #1177

runcom · 2019-10-16T10:15:58Z

There are two main reasons for this patch. The first one is obvious: it
avoids spending time retemplating when there's no need, since the
ControllerConfig's spec hasn't changed.
The second reason is kind of tricky; what can happen is the
following:

0) between 4.1 and 4.2 the MCO changed the etcd-member image name
placeholder from `setupEtcdEnv` to `setupEtcdEnvKey` (along with other
changes to image names)
1) when an upgrade from 4.1 starts the following happens:
a) the new MCO is rolled out
b) the new MCC is rolled out (note the template controller is also
rolled out here and can start before the new CC with the new image name
placeholder is rolled out!)
c) RACE! the template controller starts running before the new
ControllerConfig with the new image name placeholder is created
d) the MCC generates templates w/o the image name for the etcd member
e) cluster dies
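The following minimal Go sketch shows why step d) produces an empty image rather than a hard failure. It assumes the templates are rendered with the standard `text/template` package. The template fragment, the `renderConfig` type and the image references are hypothetical. Only the two placeholder names come from the description above. With `index`, a key that is absent from a `map[string]string` evaluates to the empty string instead of raising an error. A ControllerConfig that still uses the old key therefore silently renders an empty image.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Hypothetical etcd-member template fragment; the real MCO templates differ.
// Only the placeholder name is taken from the PR description.
const etcdMemberTmpl = `image: '{{index .Images "setupEtcdEnvKey"}}'`

// renderConfig is a hypothetical stand-in for the data handed to templates.
type renderConfig struct {
	Images map[string]string
}

// render executes the fragment against the given image map.
func render(images map[string]string) (string, error) {
	t, err := template.New("etcd-member").Parse(etcdMemberTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, renderConfig{Images: images}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Stale 4.1-style ControllerConfig: image keyed under the old name.
	stale := map[string]string{"setupEtcdEnv": "example/setup-etcd-env:4.1"}
	// 4.2-style ControllerConfig: image keyed under the new name.
	fresh := map[string]string{"setupEtcdEnvKey": "example/setup-etcd-env:4.2"}

	s, _ := render(stale)
	f, _ := render(fresh)
	fmt.Println(s) // image: ''
	fmt.Println(f) // image: 'example/setup-etcd-env:4.2'
}
```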

The patch does a simple thing but ensures that the above scenario isn't
hit since we're not going to retemplate anymore if the CC hasn't changed
(yet).
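The following minimal Go sketch illustrates the idea, not the code from this patch. The update handler compares the old and new ControllerConfig specs and only queues a re-render when they differ. The trimmed-down types, the `controller`/`enqueue` helpers and the use of `reflect.DeepEqual` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical, trimmed-down stand-ins for the MCO API types; the real
// ControllerConfig spec and status carry many more fields.
type ControllerConfigSpec struct {
	Images map[string]string
}

type ControllerConfigStatus struct {
	ObservedGeneration int64
}

type ControllerConfig struct {
	Name            string
	ResourceVersion string
	Spec            ControllerConfigSpec
	Status          ControllerConfigStatus
}

// controller is a hypothetical template controller that only records which
// ControllerConfigs it would re-render.
type controller struct {
	queue []string
}

func (c *controller) enqueue(cc *ControllerConfig) {
	c.queue = append(c.queue, cc.Name)
}

// updateControllerConfig has the shape of an informer UpdateFunc. Periodic
// resyncs and status-only writes hand it objects with identical specs, so
// nothing is queued and the templates are not regenerated.
func (c *controller) updateControllerConfig(old, cur interface{}) {
	oldCC := old.(*ControllerConfig)
	curCC := cur.(*ControllerConfig)
	if reflect.DeepEqual(oldCC.Spec, curCC.Spec) {
		return
	}
	c.enqueue(curCC)
}

func main() {
	c := &controller{}
	stale := &ControllerConfig{
		Name:            "example",
		ResourceVersion: "1",
		Spec:            ControllerConfigSpec{Images: map[string]string{"setupEtcdEnv": "example/setup-etcd-env:4.1"}},
	}

	// Resync: the same object is delivered as both old and cur.
	c.updateControllerConfig(stale, stale)

	// Status-only update: new resourceVersion, unchanged spec.
	statusOnly := *stale
	statusOnly.ResourceVersion = "2"
	statusOnly.Status.ObservedGeneration = 1
	c.updateControllerConfig(stale, &statusOnly)
	fmt.Println(len(c.queue)) // 0

	// Spec change: the operator writes the renamed image placeholder.
	updated := &ControllerConfig{
		Name:            stale.Name,
		ResourceVersion: "3",
		Spec:            ControllerConfigSpec{Images: map[string]string{"setupEtcdEnvKey": "example/setup-etcd-env:4.2"}},
	}
	c.updateControllerConfig(&statusOnly, updated)
	fmt.Println(len(c.queue)) // 1
}
```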

Signed-off-by: Antonio Murdaca [email protected]

runcom · 2019-10-16T11:10:04Z

/skip

runcom · 2019-10-16T12:11:27Z

/retest

runcom · 2019-10-16T13:13:00Z

lemme check if something is wrong on upgrade now that this fails consistently

runcom · 2019-10-16T13:13:48Z

uhm:

level=info msg="Cluster operator {} {} is {} with {}: {}%!(EXTRA string=insights, v1.ClusterStatusConditionType=Disabled, v1.ConditionStatus=False, string=, string=)"
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console"

/retest

alaypatel07 · 2019-10-16T15:39:24Z

@runcom I don't fully understand how the MCO works, but from what I observed in the repro, even the node that is supposed to be drained and restarted had image: ''. Is that something that will be covered by this PR?

Should the workflow be: drain and reboot all the master nodes, and then roll out the new manifests based on the new ControllerConfig?

michaelgugino

Need some logic to abort if an image key is not found or empty.

kikisdeliveryservice · 2019-10-16T17:18:12Z

Approach makes sense to me, doing some more runs to confirm, but so far seems like it avoided the bug.
Though we're now hitting "failed to sync secret cache" issues.

runcom · 2019-10-16T17:25:18Z

> Need some logic to abort if an image key is not found or empty.

that would be a nice addition indeed :)
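The suggested guard could look something like the following hypothetical Go sketch. Before rendering, it checks that every placeholder the templates need is present and non-empty, and returns an error so the caller can abort. The function name and the list of required keys are assumptions. In practice the list would have to be kept in sync with the templates.

```go
package main

import (
	"fmt"
	"sort"
)

// validateImages is a hypothetical pre-render check. It returns an error
// naming every required image placeholder that is missing from the map or
// mapped to an empty string, so rendering can be aborted instead of writing
// an empty image into a manifest.
func validateImages(images map[string]string, required []string) error {
	var bad []string
	for _, key := range required {
		if v, ok := images[key]; !ok || v == "" {
			bad = append(bad, key)
		}
	}
	if len(bad) > 0 {
		sort.Strings(bad)
		return fmt.Errorf("missing or empty image keys: %v", bad)
	}
	return nil
}

func main() {
	// Hypothetical required list; only the key name comes from this PR.
	required := []string{"setupEtcdEnvKey"}
	stale := map[string]string{"setupEtcdEnv": "example/setup-etcd-env:4.1"}
	fresh := map[string]string{"setupEtcdEnvKey": "example/setup-etcd-env:4.2"}

	fmt.Println(validateImages(stale, required)) // missing or empty image keys: [setupEtcdEnvKey]
	fmt.Println(validateImages(fresh, required)) // <nil>
}
```

An explicit check is used here rather than `text/template`'s `missingkey=error` option. That option cannot catch a key that is present but set to an empty string.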

rphillips · 2019-10-16T18:35:38Z

/lgtm

openshift-ci-robot · 2019-10-16T18:35:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rphillips, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rphillips · 2019-10-16T18:36:37Z

/cherrypick release-4.2

openshift-cherrypick-robot · 2019-10-16T18:36:38Z

@rphillips: once the present PR merges, I will cherry-pick it on top of release-4.2 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

smarterclayton · 2019-10-16T18:51:53Z

/test e2e-gcp-upgrade

openshift-ci-robot · 2019-10-16T20:20:50Z

@runcom: The following test failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/e2e-aws-scaleup-rhel7	`6e091e8`	link	`/test e2e-aws-scaleup-rhel7`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

mrunalp · 2019-10-16T20:46:15Z

/test e2e-gcp-upgrade

smarterclayton · 2019-10-16T21:02:00Z

/override ci/prow/e2e-gcp-upgrade

openshift-ci-robot · 2019-10-16T21:02:06Z

@smarterclayton: Overrode contexts on behalf of smarterclayton: ci/prow/e2e-gcp-upgrade

In response to this:

/override ci/prow/e2e-gcp-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kikisdeliveryservice · 2019-10-16T21:03:31Z

/cherrypick release-4.2

openshift-cherrypick-robot · 2019-10-16T21:03:32Z

@kikisdeliveryservice: once the present PR merges, I will cherry-pick it on top of release-4.2 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-cherrypick-robot · 2019-10-16T23:01:38Z

@rphillips: new pull request created: #1182

In response to this:

/cherrypick release-4.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the size/XS (a PR that changes 0-9 lines, ignoring generated files) and approved (approved by an approver from all required OWNERS files) labels Oct 16, 2019

openshift-ci-robot requested review from LorbusChris and ericavonb October 16, 2019 10:16

hexfusion mentioned this pull request Oct 16, 2019

WIP CI only: avoid resyncing templates #1178

Closed

kikisdeliveryservice self-requested a review October 16, 2019 16:13

michaelgugino suggested changes Oct 16, 2019

rphillips mentioned this pull request Oct 16, 2019

add setupEtcdEnvKey to images.json #1180

Closed

openshift-ci-robot assigned rphillips Oct 16, 2019

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Oct 16, 2019

openshift-merge-robot merged commit 18c9e83 into openshift:master Oct 16, 2019

openshift-cherrypick-robot mentioned this pull request Oct 16, 2019

Bug 1762565: pkg/controller/template: avoid resyncing templates if the cc didn't change #1182

Merged

runcom deleted the notemplatingcc branch October 16, 2019 23:23

runcom mentioned this pull request Oct 18, 2019

Bug 1762868: revert #1177 and fix common templates in MCs #1189

Merged

cgwalters mentioned this pull request Oct 20, 2019

Bug 1763695: [release-4.2] pkg/daemon: drain before applying changes #1194

Merged

openshift-merge-robot added a commit that referenced this pull request Oct 21, 2019

Merge pull request #1189 from runcom/cc-versioninig

ffc5b54

Bug 1762868: revert #1177 and fix common templates in MCs

openshift-ci-robot mentioned this pull request Oct 21, 2019

[release-4.2] Bug 1763205: revert #1177 and fix common templates in MCs #1202

Merged

openshift-merge-robot added a commit that referenced this pull request Oct 22, 2019

Merge pull request #1202 from openshift-cherrypick-robot/cherry-pick-…

d73d5c6

…1189-to-release-4.2 [release-4.2] Bug 1763205: revert #1177 and fix common templates in MCs
