
Conversation

@mboersma
Contributor

@mboersma mboersma commented Nov 10, 2025

What type of PR is this?

/kind cleanup
/area provider/azure

What this PR does / why we need it:

Updates the Azure cluster-autoscaler backend to use Azure SDK v2.

Which issue(s) this PR fixes:

Fixes #8145

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Update Azure SDK to v2

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 10, 2025
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 10, 2025
@mboersma
Contributor Author

/cc @jackfrancis

@mboersma
Contributor Author

/cc @nojnhuh

@k8s-ci-robot k8s-ci-robot requested a review from nojnhuh November 10, 2025 16:37
Contributor

@jackfrancis jackfrancis left a comment

/lgtm

/hold for review from @tallaxes

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2025
@jackfrancis
Contributor

/release-note-edit

Update Azure SDK to v2

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Nov 10, 2025

klog "k8s.io/klog/v2"

"sigs.k8s.io/cloud-provider-azure/pkg/azureclients/deploymentclient"
Contributor Author

The Azure code was using these client interfaces exported by cloud-provider-azure, but they didn't support Azure SDK v2 and have been removed in current releases.

Contributor

@tallaxes tallaxes Nov 23, 2025

They have been replaced in cloud-provider-azure with Azure SDK v2 clients (and mocks) in the azclient package (https://github.com/kubernetes-sigs/cloud-provider-azure/tree/master/pkg/azclient). That's what cluster-autoscaler should be migrating to, I think, unless there is a really good reason to use alternative clients.

The two exceptions without existing v2 clients in cloud-provider-azure would be agentpool (autoscaler already has its own, though generating one in cloud-provider-azure and using it instead would be better) and resourcesku.

github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v7 v7.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v5 v5.1.0-beta.2
Contributor Author

@mboersma mboersma Nov 10, 2025

The armcontainerservice API is up to v8 now, but v6 introduced breaking changes, so I left this at v5 for now. I'll create an issue to update it so we don't fall too far behind the currently supported API.

Edit: see #8790

github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v7 v7.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v5 v5.1.0-beta.2
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v7 v7.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources/v2 v2.1.0
Contributor Author

@mboersma mboersma Nov 10, 2025

The latest major version of armresources is v3, but it has some breaking changes and seems to lack DeploymentExtended, so I stayed on the previous major version (v2).

@jackfrancis
Contributor

ping @tallaxes, if we want to get this included in 1.35

Contributor

@tallaxes tallaxes left a comment

Can we use v2 clients from cloud-provider-azure? Among other things, this would ensure high compatibility with client configuration such as auth methods, cloud endpoints, retry and backoff policies, rate limiting, polling frequency, etc.
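One piece of that compatibility is cloud endpoints: every v2 client has to target the same Azure Resource Manager endpoint the rest of the provider uses. A minimal sketch, assuming the config carries a cloud name such as `AzureChinaCloud`; the lookup table and `resolveARMEndpoint` are hypothetical. In Azure SDK v2 the equivalent settings come from the `cloud.Configuration` values in the `azcore/cloud` package (`cloud.AzurePublic`, `cloud.AzureChina`, `cloud.AzureGovernment`).

```go
package main

import (
	"fmt"
	"strings"
)

// armEndpoints maps cloud names, as commonly written in Azure cloud-provider
// config files, to ARM endpoints. Hand-written illustration only; it is not
// the SDK's own cloud configuration.
var armEndpoints = map[string]string{
	"azurepubliccloud":       "https://management.azure.com/",
	"azurechinacloud":        "https://management.chinacloudapi.cn/",
	"azureusgovernmentcloud": "https://management.usgovcloudapi.net/",
}

// resolveARMEndpoint returns the ARM endpoint for cloudName. An empty name
// falls back to the public cloud; an unknown name is an error rather than a
// silent default, so a misconfigured cluster fails fast.
func resolveARMEndpoint(cloudName string) (string, error) {
	if cloudName == "" {
		return armEndpoints["azurepubliccloud"], nil
	}
	ep, ok := armEndpoints[strings.ToLower(cloudName)]
	if !ok {
		return "", fmt.Errorf("unknown cloud %q", cloudName)
	}
	return ep, nil
}

func main() {
	ep, _ := resolveARMEndpoint("AzureChinaCloud")
	fmt.Println(ep) // https://management.chinacloudapi.cn/
}
```

If each client constructor resolved this independently, a sovereign-cloud cluster could end up with some clients pointed at the public cloud; resolving once and sharing the result avoids that drift.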


Comment on lines +268 to 270
// Get v2 credentials for all Azure SDK v2 clients
cred, err := getAgentpoolClientCredentials(cfg)
if err != nil {
Contributor

This replaces (the now unused) newAuthorizer; do we have confidence it supports the same options and configuration as newAuthorizer?

(Also, should getAgentpoolClientCredentials be renamed, now that it's used for all clients rather than just the agentpool client?)

azClientConfig := cfg.getAzureClientConfig(authorizer, env)
azClientConfig.UserAgent = getUserAgentExtension()
// Create common client options for all v2 clients
clientOptions := &policy.ClientOptions{
Contributor

Won't these ignore the configured client retry/backoff/rate-limiting settings?

resp, err := scaleSet.manager.azClient.virtualMachineScaleSetsClient.Get(ctx, scaleSet.manager.config.ResourceGroup, scaleSet.Name, nil)
if err != nil {
klog.Errorf("failed to get information for VMSS: %s, error: %v", scaleSet.Name, err)
return -1, newGetVMSSFailedError(err, false)
Contributor

Shouldn't this still check for a not-found error? (There are helpers for this in Azure/azure-sdk-for-go-extensions.)


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update Azure SDK

4 participants