Control which nodes restart dockerd on MCR/Engine Upgrades via Launchpad #530
Comments
I really like the idea of "don't restart any" and allowing users to restart as they see fit.
There are two issues with preventative restarts:
We have a couple of options:
@abrainerd and I had a good chat about the requirements for this. I am going to submit a "definition of problem" here, and then see if I can propose a UI for the solution.
Here is my proposal for node control during upgrades:
Strategic goal: a system engineer who runs an upgrade can maintain the stability of a workload during the upgrade.
Notes:
Options:
Considerations:
Because of this, option 2 seems like a bad approach, but option 1 should work. I will use a second comment to go over what the first option will look like.
Batching node MCR upgrade proposal:
Tactic: a system engineer can control node upgrades by running launchpad repeatedly, using a change in the launchpad YAML host declarations to specify which nodes should be ignored in each batch.
Risks:
Details:
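The batching tactic could be planned on the operator's side before each launchpad run. A minimal sketch, assuming a flat list of hostnames and a fixed batch size: for each run it computes which hosts to mark as ignored, so every host is upgraded in exactly one run. `plan_batches` is a hypothetical helper, not part of launchpad, and it deliberately does not emit any launchpad YAML, because the actual host-declaration field used to skip a host comes from the PR and is not reproduced here.

```python
from typing import List, Set


def plan_batches(hosts: List[str], batch_size: int) -> List[Set[str]]:
    """Hypothetical helper: for each launchpad run, return the set of hosts to ignore.

    Hosts are split into consecutive batches of `batch_size`. In run N, only
    batch N is upgraded; every host outside that batch is marked as ignored.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]
    all_hosts = set(hosts)
    # Ignored set for each run = every host minus the batch being upgraded.
    return [all_hosts - set(batch) for batch in batches]


if __name__ == "__main__":
    workers = ["worker-1", "worker-2", "worker-3", "worker-4", "worker-5"]
    for run, ignored in enumerate(plan_batches(workers, 2), start=1):
        upgraded = [h for h in workers if h not in ignored]
        # Between runs, the operator can drain, check workloads, or restart as needed.
        print(f"run {run}: upgrade {upgraded}, ignore {sorted(ignored)}")
```

Keeping the ignore list as the complement of the batch means each run's configuration only has to say what to leave alone, which matches the "specify nodes that should be ignored" framing of the proposal.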
@ebourgeois, at your convenience, I'm looking for your feedback on this plan, as discussed. Thank you!
Hi @james-nesbitt and @abrainerd, I am good with this solution; it gives us the option of running anything we need in between restarts. @james-nesbitt, your details and risks are spot on and I am good with this approach.
Effort is proceeding. I have the internal documents set up to line this up for a release, but I will try to get this moved to GitHub (project). My goal is to get a technical preview ready for review within 2 weeks, with a full release schedule lined up a bit later (full integration testing capacity is limited at the moment).
#558 is a first attempt. It is a small change, but it required a lot of evaluation of the behaviour of the installed components. The change allows skipping certain hosts when upgrading MCR. The change is still a draft because I need to evaluate what workload risks there are when the MKE version upgrade updates the Kubernetes components: if the MKE upgrade destabilizes workloads en masse, then this change may not be enough to allow host upgrade batching.
I will do the PR fixups (linting, tests, etc.) and then probably put out a PR-specific release.
The PR has a tagged release for testing: https://github.com/Mirantis/launchpad/releases/tag/v1.5.11-530-tp2 Quick note: this is our first release since the repo restructure, so it is our first chance to properly test our release process. |
A useful enhancement for launchpad would be functionality that lets users specify on which nodes a restart of dockerd occurs during an MCR/Engine upgrade. In some use cases, large clusters have their upgrades performed in batches of workers, so that pods can be shifted around to avoid impact. The idea is that a user could specify the host(s) on which dockerd restarts occur, instead of launchpad restarting dockerd on all hosts one by one in a linear fashion. This results in less impact and fewer unnecessary disruptions during an upgrade.
As a suggestion, perhaps a “don’t restart” flag could be added to launchpad, which would tell launchpad not to do anything during the “Restart MCR” phase. Thank you!
@abrainerd : this is migrated from the other repo