Skip to content

add Azure machine pool rolling upgrades with MaxSurge, MaxUnavailable and DeletePolicy#1332

Merged
k8s-ci-robot merged 5 commits into
kubernetes-sigs:masterfrom
devigned:surge-take2
Jun 9, 2021
Merged

add Azure machine pool rolling upgrades with MaxSurge, MaxUnavailable and DeletePolicy#1332
k8s-ci-robot merged 5 commits into
kubernetes-sigs:masterfrom
devigned:surge-take2

Conversation

@devigned
Copy link
Copy Markdown
Contributor

@devigned devigned commented Apr 22, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR introduces AzureMachinePoolMachine resource to replace AzureMachinePool.Status.Instances in accordance to Azure Machine Pool Machines Proposal.

This also introduces the machine pool rolling upgrade strategy which provides, MaxSurge, MaxUnavailable, and DeletePolicy modeled after machine deployments.

Functional changes:

  • Ability to delete individual AzureMachinePoolMachines
  • MaxSurge to surge machine pools upon upgrade of the machine pool template
  • MaxUnavailable to ensure a minimum number of machines are available at any time
  • DeletePolicy to enable preference on which nodes to delete, Oldest, Newest, Random
  • Fully async reconciliation of AzureMachinePool resources

Non-goals of the PR (follow up PRs):

  • implement cordon and drain per Azure machine pool machine
  • implement instance update rather than delete / re-create

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
related #1105
fixes #1361

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

add Azure machine pool rolling upgrades with MaxSurge, MaxUnavailable and DeletePolicy. 

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 22, 2021
@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Apr 22, 2021
@devigned devigned changed the title add AzureMachinePoolMachine resource to migrate AzureMachinePool add Azure machine pool rolling upgrades with MaxSurge, MaxUnavailable and DeletePolicy Apr 22, 2021
@devigned
Copy link
Copy Markdown
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 27, 2021
Copy link
Copy Markdown
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing but I'm having a hard time following all the different changes... It would greatly help review if you were able to split out any unrelated improvements to other PRs and structure the changes in this PR in multiple commits.

Comment thread api/v1alpha4/conditions_consts.go Outdated
Comment thread azure/scope/machinepool.go Outdated
Comment thread azure/scope/machinepool.go Outdated
Comment thread azure/scope/machinepool.go Outdated
Comment thread azure/scope/machinepool.go Outdated
Comment thread exp/api/v1alpha4/azuremachinepool_types.go Outdated
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2021
@devigned devigned force-pushed the surge-take2 branch 5 times, most recently from 6409b27 to 1c493b9 Compare May 4, 2021 15:20
@devigned devigned mentioned this pull request May 4, 2021
3 tasks
@devigned devigned force-pushed the surge-take2 branch 2 times, most recently from 4cdbbd0 to 81f945b Compare May 4, 2021 19:02
Copy link
Copy Markdown
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still working through reviewing all the files... thanks for splitting up commits

Comment thread pkg/coalescing/mocks/doc.go Outdated
Comment thread test/e2e/helpers.go Outdated
Comment thread test/e2e/helpers.go Outdated
Comment thread api/v1alpha4/conditions_consts.go Outdated
Comment thread azure/scope/machinepool.go
Comment thread azure/types.go
}

// ProviderID returns the K8s provider ID for the VMSS instance
func (vm VMSSVM) ProviderID() string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this stuff really belong in here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to put them where ever. They definitely didn't belong in the ./exp/api where they were before.

I do think they belong here as they are basically the spec structs which have yet to be refactored for the exp controllers.

Comment thread azure/types.go Outdated
Comment thread api/v1alpha4/types.go Outdated
Comment thread api/v1alpha4/types.go Outdated
Comment thread azure/converters/vmss.go Outdated
Comment thread azure/converters/vmss.go
Comment thread azure/errors.go Outdated
Comment thread exp/controllers/azuremachinepoolmachine_controller.go Outdated
Comment thread main.go
if err = controllers.NewAzureMachineReconciler(
mgr.GetClient(),
ctrl.Log.WithName("controllers").WithName("AzureMachine"),
if err := mgr.AddReadyzCheck("ping", healthz.Ping); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it normal that readyz check takes the same args as healthz check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I was just keeping what is already in HEAD.

if err := mgr.AddReadyzCheck("ping", healthz.Ping); err != nil {
setupLog.Error(err, "unable to create ready check")
os.Exit(1)
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread pkg/coalescing/reconciler.go Outdated
Comment thread test/e2e/config/azure-dev.yaml Outdated
spec:
identity: SystemAssigned
location: ${AZURE_LOCATION}
strategy:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these required changes to the templates or are they illustrative / used as examples? I don't see any docs updates or additions in this PR, what parts of the PR do you think are user-facing changes that need documentation updates?

Copy link
Copy Markdown
Contributor Author

@devigned devigned Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not required. I only added them for illustrative purposes.

The item that needs documentation is the rollout and delete policies. I'll create an issue to address rather than adding to an already enormous PR.

#1427

@CecileRobertMichon
Copy link
Copy Markdown
Contributor

test failed because of quota issues in centralus: #1428

@CecileRobertMichon
Copy link
Copy Markdown
Contributor

this fixes #819 and #1361, correct?

@CecileRobertMichon
Copy link
Copy Markdown
Contributor

/retest

@CecileRobertMichon
Copy link
Copy Markdown
Contributor

/lgtm

and, given @nader-ziada's comment above

/approve

/pony

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2021
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@CecileRobertMichon: pony image

Details

In response to this:

/lgtm

and, given @nader-ziada's comment above

/approve

/pony

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2021
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

k8s-ci-robot commented Jun 9, 2021

@devigned: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-cluster-api-provider-azure-apidiff f96fba5 link /test pull-cluster-api-provider-azure-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@CecileRobertMichon CecileRobertMichon linked an issue Jun 9, 2021 that may be closed by this pull request
@devigned
Copy link
Copy Markdown
Contributor Author

devigned commented Jun 9, 2021

this fixes #819 and #1361, correct?

#1361, yes. But #819, no, not until cordon and drain are added to the AzureMachinePoolMachine delete.

@k8s-ci-robot k8s-ci-robot merged commit 2cbcf3b into kubernetes-sigs:master Jun 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.5.0 milestone Jun 9, 2021
willie-yao pushed a commit to willie-yao/cluster-api-provider-azure that referenced this pull request Jul 11, 2025
…ubernetes-sigs#1332)

Signed-off-by: Mariia Kotliarevskaia <mariia.kotliarevskaia@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MachinePool has more worker nodes than the desired value

4 participants