-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Trigger machine pool instance refresh (node rollout) if bootstrap config reference changes #4619
✨ Trigger machine pool instance refresh (node rollout) if bootstrap config reference changes #4619
Conversation
3480765
to
1653865
Compare
@@ -225,6 +234,7 @@ func (r *AWSMachinePoolReconciler) reconcileNormal(ctx context.Context, machineP | |||
|
|||
ec2Svc := r.getEC2Service(ec2Scope) | |||
asgsvc := r.getASGService(clusterScope) | |||
reconSvc := r.getReconcileService(ec2Scope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the code base, so this may be a basic question but, what's the difference between ec2Svc
and reconSvc
? They both seem to be calling ec2.NewService()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because I split interfaces such that I can call the actual reconciliation code in tests while still mocking EC2 API calls. I don't know why reconciliation was part of a mockable interface in the first place, but that's where I started from with my change.
if len(tags) > 0 { | ||
// tag instances | ||
// tag instances | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use blocks like { }
on other parts of the code? I'm wondering if we want to use it, or if it improves things. If we don't use it anywhere else, I wouldn't use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this, the purpose of the if len(tags) > 0
test seems to be to avoid creating and appending a spec
element to the tagSpecifications
with zero tags in spec.Tags
list.
Does the AWS sdk reject a LaunchTemplateTagSpecificationRequest
with a list of zero tags? If AWS can support that, we can remove this block and the if len(tags) > 0
test from below, it just makes this code harder to read. Otherwise it seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are zero tags per category (here: ec2.ResourceTypeInstance
and ec2.ResourceTypeVolume
), we just shouldn't add the tag specification. That's why the guard is there. And I think having a block, whether it's just { ... }
or if ... { ... }
around each category, together with a comment above each block, separates the logic somewhat cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the linter errors failing the tests. Overall this looks solid.
if len(tags) > 0 { | ||
// tag instances | ||
// tag instances | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this, the purpose of the if len(tags) > 0
test seems to be to avoid creating and appending a spec
element to the tagSpecifications
with zero tags in spec.Tags
list.
Does the AWS sdk reject a LaunchTemplateTagSpecificationRequest
with a list of zero tags? If AWS can support that, we can remove this block and the if len(tags) > 0
test from below, it just makes this code harder to read. Otherwise it seems fine.
// tag EBS volumes | ||
spec = &ec2.LaunchTemplateTagSpecificationRequest{ResourceType: aws.String(ec2.ResourceTypeVolume)} | ||
// tag EBS volumes | ||
if len(tags) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the above answer solve your question? Here, tags
could be empty and we want an if
, while for instance tags, we now have at least one (the newly added tag used for the feature) and need no if
.
pkg/cloud/services/interfaces.go
Outdated
// from EC2Interface so that we can test the behavior of our non-mock implementations. For example, by not mocking | ||
// the ReconcileLaunchTemplate function, but mocking EC2Interface, we can test which EC2 API operations would have | ||
// been called. | ||
type ReconcileInterface interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better interface name we can choose? I find that I am confused when I would want to add method here or to the EC2Interface, given that both interfaces contain methods names with "Reconcile" . It may be confusing for future contributors when deciding where an additional method should go and how to decide.
I'm not sure what a better name would be, but something closer to the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EC2Interface
mocks the AWS EC2 API, so ReconcileLaunchTemplate
(which I moved out) and ReconcileTags
(out of scope for my PR) should not be in there. If I move the ReconcileTags
function as well into the new interface, it should become clearer for someone working on the code, particularly if we state that it's for the functionality of the controller as opposed to raw AWS requests. Given that change, what about MachinePoolReconcileInterface
since those functions are exclusively used in awsmachinepool_controller.go
and awsmanagedmachinepool_controller.go
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would clear up my confusion, sgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also fixed lint errors.
/lgtm |
1653865
to
2f77269
Compare
launchTemplateUserDataSecretKey = &apimachinerytypes.NamespacedName{ | ||
Namespace: parts[0], | ||
Name: parts[1], | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
return i, userdata.ComputeHash(decodedUserData), nil | ||
return i, userdata.ComputeHash(decodedUserData), launchTemplateUserDataSecretKey, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it okay for launchTemplateUserDataSecretKey
to be potentially nil here, if the above loop fails? Or should we return an error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must be backward-compatible and support cases where the tag is missing by mistake/bug/intervention. So nil
is fine. I changed this line to explicit nil
here, and added the return
for the found case into the loop as you commented above.
2f77269
to
77bb30d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
77bb30d
to
6f6df3a
Compare
Rebased and only one (obvious to me) test line was conflicted. @cnmcavoy @fiunchinho could you take another look? @richardcase we'd need approval here as well. I'm not sure if this should go into v2.3.x? |
6f6df3a
to
169894f
Compare
Rebased because that fixed E2E tests in another PR. /test pull-cluster-api-provider-aws-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
169894f
to
f0033b1
Compare
/retest |
f0033b1
to
2cddfed
Compare
/test pull-cluster-api-provider-aws-e2e |
/milestone v2.4.0 |
/retest |
@AndiDog could you rebase the PR and retry E2E tests? |
…fig reference changes
2cddfed
to
7d2df5d
Compare
/retest |
/test pull-cluster-api-provider-aws-e2e |
Still one E2E failure related to LBs. The test function |
/test pull-cluster-api-provider-aws-e2e |
/approve cc @fiunchinho @cnmcavoy @AndiDog for final review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ankitasw, richardcase The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
What type of PR is this?
/kind feature
What this PR does / why we need it:
We discussed this and alternative solutions at length in kubernetes-sigs/cluster-api#8858 and CAPI/CAPA office hours. Eventually, we turned to a solution that is not surprising for users: if the bootstrap config reference (e.g.
KubeadmConfig
) changes within aMachinePool
/AWSMachinePool
combo, i.e. its name changes, new nodes should be rolled out. With this, it is not necessary to parse the bootstrap data (= user data) and detect changes within those scripts (cloud-init, ignition, maybe other formats in the future). Instead, we only need to store the previously-used bootstrap data secret key (namespace+name) in the launch template, and compare on every reconciliation whether that reference changed. For example, if an operating user wants to roll out a change toKubeadmConfig.spec.files
, i.e. add new files on the nodes, it's now sufficient to create a newly-namedKubeadmConfig
. For my specific usage, this means naming the objectKubeadmConfig/my-config-<hash of its spec>
and the Helm chart upgrade will take care to deploy this new object and delete the old one (which I tested and works perfectly fine).Testing of this change showed that CAPI+CAPA correctly, and in the correct order without possible race conditions (🤞), render the bootstrap data secret, mark the bootstrap config ready, create a new launch template version and trigger instance refresh (which creates new EC2 instances). The only minor glitch I found is #4618, but that can be solved separately.
Since reconciliation of launch templates wasn't covered in tests at all, I changed the mock interfaces a little so it could be tested, and added some test cases.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Towards kubernetes-sigs/cluster-api#8858 and #4071
For the feature to work, users require a CAPI version that includes kubernetes-sigs/cluster-api#8667. Without that, CAPA won't see an updated
MachinePool.Spec.Template.Spec.Bootstrap.DataSecretName
to reconcile.Checklist:
Release note: