Add Linux VM extension bootstrap script and conditions#1232
Conversation
6828aea to
1ddcd95
Compare
1ddcd95 to
0366cd4
Compare
|
Just realized it's a wip pr, please feel free to ignore the review comments that are not relevant. |
0366cd4 to
240c5b5
Compare
| return err | ||
| } | ||
| _, err = future.Result(ac.vmextensions) | ||
| _, err := ac.vmextensions.CreateOrUpdate(ctx, resourceGroupName, vmName, name, parameters) |
There was a problem hiding this comment.
@devigned thoughts on this approach of doing "async" reconciliation? Is it significantly less efficient than storing the future or since we only care about eventually getting the extension status do you think it's appropriate in this situation?
There was a problem hiding this comment.
I'm thinking in the future we can move this over to VM CreateOrUpdate once that's async. For now the extension could be a long-running operation (timeout is 20 minutes) so we don't want that to be part of a blocking operation
There was a problem hiding this comment.
I think this should be fine. Though, it would be easy to call this multiple times if one is not tracking an ongoing operation via a secret. With the AMP future, we check to see if the operation is done prior to proceeding, which the Azure SDK for Go can help with. In this case, you'd be responsible for managing that state.
436e638 to
e700487
Compare
c861661 to
e7e6060
Compare
| switch infrav1.ProvisioningState(provisioningState) { | ||
| case infrav1.Succeeded: | ||
| m.V(4).Info("extension provisioning state is succeeded", "vm extension", extensionName, "scale set", m.Name()) | ||
| conditions.MarkTrue(m.AzureMachinePool, infrav1.BootstrapSucceededCondition) |
There was a problem hiding this comment.
it'd be nice to set conditions per VMSS VM, hopefully that gets easier with #819
There was a problem hiding this comment.
Right now it's at the AzureMachinePool.Status level:
status:
conditions:
- lastTransitionTime: "2021-03-16T23:53:39Z"
status: "True"
type: BoostrapSucceeded
- lastTransitionTime: "2021-03-16T23:49:37Z"
status: "True"
type: ScaleSetRunning
|
Thanks for the initial review @shysank! I've addressed most of your comments. I still have docs and unit tests as TODOs. |
e7e6060 to
ad702de
Compare
dea6219 to
669b1c0
Compare
669b1c0 to
7d5b441
Compare
7d5b441 to
638c14f
Compare
|
@CecileRobertMichon: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
This is ready for review now |
|
when I try running it, I don't see the then goes directly to not sure why yet, will dig into it a bit more update: I see this error in my logs |
devigned
left a comment
There was a problem hiding this comment.
Looks great. Only had a minor comment about async APIs.
| return err | ||
| } | ||
| _, err = future.Result(ac.vmextensions) | ||
| _, err := ac.vmextensions.CreateOrUpdate(ctx, resourceGroupName, vmName, name, parameters) |
There was a problem hiding this comment.
I think this should be fine. Though, it would be easy to call this multiple times if one is not tracking an ongoing operation via a secret. With the AMP future, we check to see if the operation is done prior to proceeding, which the Azure SDK for Go can help with. In this case, you'd be responsible for managing that state.
@nader-ziada I wonder if you're doing something differently, this is what I see when scaling up: That first step I'm hoping we can change to |
|
I tried it again and seems to work as expected |
|
/lgtm @devigned any comments? |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: devigned The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
What this PR does / why we need it: Implements bootstrap failure detection using conditions and VM extensions as proposed in #1076.
This is what a user will see when an AzureMachine has been successfully provisioned but has not yet finished running the bootstrap script:
After the bootstrap script has executed successfully, the AzureMachine status shows:
If for some reason the bootstrap script fails to execute, the status will show:
Also renames type "VMState" to "ProvisioningState" to be more generic and match Azure naming. This is a breaking change for anyone importing CAPZ types, but should have no impact on user templates and existing CRDs (conversion from v1alpha3 to v1alpha4 is auto-generated).
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #603
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: