pkg/controller/template: avoid resyncing templates if the cc didn't change #1177
There are two main reasons for this patch.

The first one is obvious: it avoids spending time retemplating when there's no need, since the ControllerConfig's spec hasn't changed.

The second reason is kind of tricky. What can happen is the following:

0) between 4.1 and 4.2 the MCO changed the etcd-member image name placeholder from `setupEtcdEnv` to `setupEtcdEnvKey` (along with other changes to image names)
1) when an upgrade from 4.1 starts, the following happens:
   a) the new MCO is rolled out
   b) the new MCC is rolled out (note the template controller is also rolled out here and can start before the new CC is rolled out with the new image name placeholder!)
   c) RACE! the template controller starts running before the new ControllerConfig with the new image name placeholder is created
   d) the MCC generates templates w/o the image name for the etcd member
   e) cluster dies

The patch does a simple thing, but it ensures that the above scenario isn't hit, since we're not going to retemplate anymore if the CC hasn't changed (yet).

Signed-off-by: Antonio Murdaca <[email protected]>
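As a rough illustration of the idea in the description (skip retemplating when the ControllerConfig spec is unchanged), here is a minimal Go sketch. It is not the code from this PR: `ControllerConfig`, `ControllerConfigSpec`, `templateController`, `specChanged`, and `onUpdate` are hypothetical stand-ins, the image references are made up, and the comparison via `reflect.DeepEqual` is an assumption about how such a check could be written.

```go
package main

import (
	"fmt"
	"reflect"
)

// ControllerConfigSpec is a hypothetical, trimmed-down stand-in for the real
// ControllerConfig spec; only the image placeholder map matters here.
type ControllerConfigSpec struct {
	Images map[string]string
}

// ControllerConfig is a hypothetical stand-in for the real API object.
type ControllerConfig struct {
	Name string
	Spec ControllerConfigSpec
}

// specChanged reports whether the spec differs between the old and new object.
func specChanged(oldCC, newCC *ControllerConfig) bool {
	return !reflect.DeepEqual(oldCC.Spec, newCC.Spec)
}

// templateController is a hypothetical controller that records which
// ControllerConfigs were queued for retemplating.
type templateController struct {
	queue []string
}

// onUpdate mimics an informer update handler: periodic resyncs deliver the
// same object again, so an unchanged spec is skipped instead of retemplated.
func (c *templateController) onUpdate(oldCC, newCC *ControllerConfig) {
	if !specChanged(oldCC, newCC) {
		return
	}
	c.queue = append(c.queue, newCC.Name)
}

func main() {
	c := &templateController{}
	oldCC := &ControllerConfig{
		Name: "machine-config-controller",
		Spec: ControllerConfigSpec{Images: map[string]string{"setupEtcdEnv": "example.invalid/setup-etcd:old"}},
	}
	// A resync hands the handler an object with an identical spec.
	resynced := &ControllerConfig{
		Name: oldCC.Name,
		Spec: ControllerConfigSpec{Images: map[string]string{"setupEtcdEnv": "example.invalid/setup-etcd:old"}},
	}
	c.onUpdate(oldCC, resynced)

	// The upgraded ControllerConfig carries the renamed placeholder key.
	upgraded := &ControllerConfig{
		Name: oldCC.Name,
		Spec: ControllerConfigSpec{Images: map[string]string{"setupEtcdEnvKey": "example.invalid/setup-etcd:new"}},
	}
	c.onUpdate(oldCC, upgraded)

	fmt.Println("queued resyncs:", len(c.queue))
}
```

In this sketch only the second update is queued, which matches the intent described above: the controller waits for a ControllerConfig that actually changed before it retemplates.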
/skip
/retest
lemme check if something is wrong on upgrade now that this fails consistently
uhm: /retest
@runcom I don't fully understand how the MCO works, but from what I observed in the repro, even the node that is supposed to be drained and restarted had [...] Should the workflow be: drain and reboot all the master nodes, and then roll out the new manifests based on the new ControllerConfig?
michaelgugino left a comment:
Need some logic to abort if an image key is not found or empty.
Approach makes sense to me, doing some more runs to confirm, but so far seems like it avoided the bug.
that would be a nice addition indeed :)
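The check suggested in the review comment above (abort if an image key is not found or empty) could look roughly like the following Go sketch. This is only an illustration under assumptions: `requireImages` is a hypothetical helper, the image reference is made up, and where such a check would sit in the real template controller is not something this thread specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// requireImages is a hypothetical helper: it returns an error as soon as one
// of the required keys is missing from images or maps to a blank value.
func requireImages(images map[string]string, keys ...string) error {
	for _, k := range keys {
		// A missing key yields the zero value "", so one test covers both cases.
		if strings.TrimSpace(images[k]) == "" {
			return fmt.Errorf("image key %q is missing or empty", k)
		}
	}
	return nil
}

func main() {
	// An old-style ControllerConfig image map that still uses the 4.1 key.
	images := map[string]string{"setupEtcdEnv": "example.invalid/setup-etcd:old"}

	// New templates would look up the renamed key, so rendering is refused
	// instead of producing a template without the etcd member image.
	if err := requireImages(images, "setupEtcdEnvKey"); err != nil {
		fmt.Println("refusing to render templates:", err)
	}
}
```

Failing early like this would turn the race described in the PR into a visible error rather than a silently broken template.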
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rphillips, runcom
/cherrypick release-4.2
@rphillips: once the present PR merges, I will cherry-pick it on top of release-4.2 in a new PR and assign it to you.
/test e2e-gcp-upgrade
@runcom: The following test failed, say
/test e2e-gcp-upgrade
/override ci/prow/e2e-gcp-upgrade
@smarterclayton: Overrode contexts on behalf of smarterclayton: ci/prow/e2e-gcp-upgrade
/cherrypick release-4.2
@kikisdeliveryservice: once the present PR merges, I will cherry-pick it on top of release-4.2 in a new PR and assign it to you.
@rphillips: new pull request created: #1182
Bug 1762868: revert #1177 and fix common templates in MCs
…1189-to-release-4.2 [release-4.2] Bug 1763205: revert #1177 and fix common templates in MCs