Machine pool nodes are not rolled during update #2217
Comments
I think it's this bug: kubernetes-sigs/cluster-api-provider-aws#4071
I've encountered a similar situation when other machine pool settings were updated. I ended up manually rolling the nodes from the AWS console.
FYI, CAPZ has a very similar issue with MachinePool rolling updates (https://github.com/giantswarm/giantswarm/issues/25188), which is why we reverted to MachineDeployments.
Here's why this happens: upstream comment. I'll try to figure out implementation ideas for the fix.
In kubernetes-sigs/cluster-api-provider-aws#4071 (comment), I figured that cluster-api changes are also needed to shorten the time until nodes get refreshed. Therefore I started getting our fork https://github.com/giantswarm/cluster-api ready to work with. First change to be upstreamed: kubernetes-sigs/cluster-api#8586, so we can build only the production controller images of CAPI and skip components we don't need (clusterctl, test stuff). Once merged, our fork should not differ from upstream apart from our CircleCI config.
First part of the fix in CAPA: kubernetes-sigs/cluster-api-provider-aws#4245
@AndiDog do you know if any decision was made?
Next step is to discuss the issue in the CAPI office hours, since in the CAPA meeting we noted that it's generic across providers and that, ideally, CAPI itself needs to be adapted.
Continuing the discussion with CAPI maintainers in the new issue kubernetes-sigs/cluster-api#8858 and Slack thread.
I'm writing down a proposed solution which requires some contract from CAPI to infra providers (e.g. CAPA) so they can get a checksum of the user data without the bootstrap token – or in other terms, a checksum of In the meantime, I want to try a workaround: CAPA always rolls out the new launch template if
The proposed workaround, i.e. adding a hash of the machine pool's
Proposed solution in the CAPI issue: kubernetes-sigs/cluster-api#8858 (comment)
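To illustrate the idea mentioned above of a checksum of the user data that ignores the bootstrap token, here is a minimal sketch in Go. It is not the code from the upstream proposal or from any CAPI/CAPA PR: the helper name `checksumWithoutToken`, the sample user data, and the approach of masking the token with a regular expression are all assumptions made for illustration. The only fact relied on is the kubeadm bootstrap token format `[a-z0-9]{6}.[a-z0-9]{16}`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// bootstrapTokenPattern matches strings in the kubeadm bootstrap token
// format: 6 characters, a dot, then 16 characters, all from [a-z0-9].
var bootstrapTokenPattern = regexp.MustCompile(`\b[a-z0-9]{6}\.[a-z0-9]{16}\b`)

// checksumWithoutToken is a hypothetical helper (not part of CAPI or CAPA).
// It masks anything that looks like a bootstrap token and returns the
// hex-encoded SHA-256 of the result, so that a token rotation alone does
// not change the checksum, while any other change to the user data does.
func checksumWithoutToken(userData string) string {
	masked := bootstrapTokenPattern.ReplaceAllString(userData, "<token>")
	sum := sha256.Sum256([]byte(masked))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Made-up user data snippets, for illustration only.
	original := "kubeadm join --token abcdef.0123456789abcdef\nproxy: none\n"
	tokenRotated := "kubeadm join --token uvwxyz.fedcba9876543210\nproxy: none\n"
	proxyChanged := "kubeadm join --token uvwxyz.fedcba9876543210\nproxy: http://proxy.example:3128\n"

	// Only the token differs: checksums match, so no roll would be needed.
	fmt.Println(checksumWithoutToken(original) == checksumWithoutToken(tokenRotated)) // true

	// The proxy setting differs: checksums differ, so nodes should be rolled.
	fmt.Println(checksumWithoutToken(original) == checksumWithoutToken(proxyChanged)) // false
}
```

A regex mask is the simplest way to show the comparison, but it could also mask unrelated strings that happen to look like a token. A contract in which the bootstrap provider exposes the token-free data (or its checksum) directly, as discussed in the thread, would avoid that guesswork.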
My first proposal would lead to quite some contract changes between CAPI and bootstrap providers, so not sure if it's feasible or desired. Therefore, to get a quick turnaround, I tried the "swap the
From the WG CAPA migration sync: critical for the adidas PoC (as we need to do manual work to upgrade the cluster).
I have a meeting with upstream this Friday to see how we can go on. None of the workarounds have worked so far because of CAPI/CAPA bugs or shortcomings.
Working and tested solution in kubernetes-sigs/cluster-api-provider-aws#4619 (combined with a newer CAPI version), so moving this to blocked until we make progress on the upstream PRs.
We're almost done here. CAPI/CAPA forks and cluster-aws have the feature. Waiting for the upstream PR (targeting CAPA v2.4.0 because it's a new feature) kubernetes-sigs/cluster-api-provider-aws#4619 to be merged before closing this issue. There's also still a small docs PR open: giantswarm/docs#2028.
Upstream PR is merged now
Issue
The machine pool nodes are not rolled if the KubeadmConfig is changed, and the nodes need to be rolled manually.
Details
I noticed this during the thunder update. There were many changes to both cluster-aws and other stuff, and it looked like the nodes were rolled, but the proxy changes were not rolled out with them. Once I had deleted the nodes manually, they came up with the proper config.
We should first confirm whether the scenario is reproducible. If it is, we need to find a solution or at least be aware of the issue.
How to try to reproduce: