Skip to content

Support passing custom headers to AKS Managed Cluster and Node Pool create/update requests#2020

Merged
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
michalno1:main
Feb 23, 2022
Merged

Support passing custom headers to AKS Managed Cluster and Node Pool create/update requests#2020
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
michalno1:main

Conversation

@michalno1
Copy link
Copy Markdown
Contributor

@michalno1 michalno1 commented Jan 28, 2022

What type of PR is this?
/kind feature

What this PR does / why we need it:
To use some AKS features one needs to pass custom headers when creating/updating a resource. One example is when you want to use a GPU node pool, you need to register GPUDedicatedVHDPreview feature and pass a custom header to node pool creation request:

az aks nodepool add \
   ...
   --aks-custom-headers UseGPUDedicatedVHD=true \
   ...

This PR adds a mechanism where you can pass custom headers to cluster and node pool create/update requests by using annotations prefixed with custom-headers.azure.dev/. E.g. to create mentioned GPU node pool one would set "custom-headers.azure.dev/UseGPUDedicatedVHD": true annotation on AzureManagedMachinePool.

Which issue(s) this PR fixes:
fixes #1981, fixes #1979

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Support passing custom headers to cluster and node pool create/update requests to enable additional features

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 28, 2022
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Jan 28, 2022

CLA Signed

The committers are authorized under a signed CLA.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 28, 2022
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Welcome @michalno1!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-azure has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 28, 2022
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @michalno1. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jan 28, 2022
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 28, 2022
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Jan 28, 2022

CLA Signed

The committers are authorized under a signed CLA.

@zmalik
Copy link
Copy Markdown
Member

zmalik commented Jan 28, 2022

/area managedclusters

@k8s-ci-robot k8s-ci-robot added the area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type label Jan 28, 2022
@michalno1 michalno1 changed the title Support passing custom headers to cluster and node pool create/update requests Support passing custom headers to AKS Managed Cluster and Node Pool create/update requests Jan 28, 2022
@CecileRobertMichon
Copy link
Copy Markdown
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 28, 2022
@zmalik
Copy link
Copy Markdown
Member

zmalik commented Jan 31, 2022

looks like the tests hit the quota issue

Operation could not be completed as it results in exceeding approved standardDSv3Family Cores quota. Additional details - Deployment Model: Resource Manager, Location: eastus, Current Limit: 200, Current Usage: 200, Additional Required: 6, (Minimum) New Limit Required: 206

/retest

@michalno1
Copy link
Copy Markdown
Contributor Author

/test pull-cluster-api-provider-azure-e2e-exp

Comment thread azure/defaults.go Outdated
// Whatever follows the prefix will be passed as a header to cluster/node pool creation/update requests.
// E.g. add `"custom-headers.azure.dev/UseGPUDedicatedVHD": "true"` annotation to AzureManagedMachinePool CR
// to enable creating GPU nodes by the node pool.
CustomHeaderPrefix = "custom-headers.azure.dev/"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Do annotation/labels follow a specific pattern? In case of annotations I have seen cluster.x-k8s.io in CAPI. May be worth discussing as his will help us being consistent while we may introduce more annotations in near future. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Do you suggest opening a thread in Disussions or should we do it here? Do you have any suggestions? Maybe we can use cluster-azure.x-k8s.io/ for all project annotations and cluster-azure.x-k8s.io/custom-header-{HEADER_NAME} specifically for this feature?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster-azure.x-k8s.io -- This one sounds good to me. Copying @CecileRobertMichon @shysank to provide more input.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about infrastructure.cluster.x-k8s.io? cluster-azure.x-k8s.io is the API Group for Cluster API. CAPZ resources use infrastructure.cluster.x-k8s.io as its Group.

GroupVersion = schema.GroupVersion{Group: "infrastructure.cluster.x-k8s.io", Version: "v1beta1"}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CecileRobertMichon makes sense, thanks. I've changed it.

func (s *ManagedControlPlaneScope) ManagedClusterAnnotations() map[string]string {
return s.ControlPlane.Annotations
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Do we need this function? Can't we simply use s.ControlPlane.Annotations at places wherever required?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why it is so. You basically implemented it as part of ManagedMachinePoolScope interface.

Copy link
Copy Markdown
Contributor Author

@michalno1 michalno1 Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular method implements ManagedClusterScope rather than ManagedMachinePoolScope. This is done in the same spirit as ManagedControlPlaneScope.ManagedClusterSpec().

func (s *ManagedControlPlaneScope) AgentPoolAnnotations() map[string]string {
return s.InfraMachinePool.Annotations
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Comment thread azure/services/agentpools/agentpools.go Outdated
var result = map[string]string{}
for key, value := range s.scope.AgentPoolAnnotations() {
if strings.HasPrefix(key, azure.CustomHeaderPrefix) {
result[strings.TrimPrefix(key, azure.CustomHeaderPrefix)] = value
Copy link
Copy Markdown
Contributor

@sonasingh46 sonasingh46 Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought:
What if the value is empty? We can have probably move this function to another that takes input as annotations and prefix and returns the filtered values. This could help reuse.

And here we can probably just sanitise the values if required.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored for reuse. What issue do you see with an empty value? We'll just pass a header with empty value if that is what someone intends.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. My only point was -- does the azure api tolerates the empty value. I do not know. Just thought to bring up, in case. If yes, I do not think we need to do anything here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked- API does accept empty values.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

}
}
return result
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 10, 2022
Copy link
Copy Markdown
Contributor

@shysank shysank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pr @michalno1! The changes in general looks good. Just a couple of clarifications regarding cluster/nodepool update. wdyt about adding some documentation for this feature? I'm fine with doing it in a followup pr if it's easier.

if diff != "" {
log.V(2).Info(fmt.Sprintf("Update required (+new -old):\n%s", diff))
err = s.Client.CreateOrUpdate(ctx, agentPoolSpec.ResourceGroup, agentPoolSpec.Cluster, agentPoolSpec.Name, profile)
err = s.Client.CreateOrUpdate(ctx, agentPoolSpec.ResourceGroup, agentPoolSpec.Cluster, agentPoolSpec.Name,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we enable/disable features after node pool creation?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it depends on the feature, the headers are usually used with experimental features in AKS so lets say in case of GPU nodepool, we cannot. We cannot convert an existing agentpool to support GPU, its just a creation operation.

I'm ok with providing a way for users to use these experimental features and at most add some warning documentation about untested behaviors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most sane way to move forward is to support "enable feature via custom header" on create only. There may be some aks nodepool update operations that are initiated via an "empty" PUT w/ HTTP header data, but that's kind of weird from a UX perspective and I would not expect to rely upon that.

As @zmalik suggests, we should consider feature configuration delivered via HTTP headers as an interface for AKS to rapidly deliver experimental features, and then eventually they may graduate to fully supported features (after which point capz would have to implement them in their final form).

Does that make sense? Is the PR as-is implemented to forbid updates of custom headers after cluster/nodepool creation?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR right now allows for updates, although not intentionally. If there is a non zero diff along with annotations changes, the new headers will be set. I'd rather we be more explicit by adding a validation webhook to make it immutable. wdyt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shysank added immutability check

if diff != "" {
klog.V(2).Infof("Update required (+new -old):\n%s", diff)
managedCluster, err = s.Client.CreateOrUpdate(ctx, managedClusterSpec.ResourceGroupName, managedClusterSpec.Name, managedCluster)
managedCluster, err = s.Client.CreateOrUpdate(ctx, managedClusterSpec.ResourceGroupName, managedClusterSpec.Name, managedCluster, customHeaders)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment as above regarding enabling/disabling features after cluster creation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shysank added validation webhook with immutability check to AzureManagedCluster as well.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 19, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2022
@michalno1
Copy link
Copy Markdown
Contributor Author

Thanks for the pr @michalno1! The changes in general looks good. Just a couple of clarifications regarding cluster/nodepool update. wdyt about adding some documentation for this feature? I'm fine with doing it in a followup pr if it's easier.

@shysank I've added some docs

Comment thread exp/api/v1beta1/azuremanagedcluster_webhook.go Outdated
Comment thread exp/api/v1beta1/azuremanagedcluster_webhook_test.go Outdated
Comment thread exp/api/v1beta1/azuremanagedcluster_webhook_test.go Outdated
Comment thread exp/api/v1beta1/azuremanagedmachinepool_webhook_test.go Outdated
Comment thread exp/api/v1beta1/azuremanagedmachinepool_webhook_test.go Outdated
Comment thread exp/api/v1beta1/azuremanagedmachinepool_webhook.go Outdated
@shysank
Copy link
Copy Markdown
Contributor

shysank commented Feb 23, 2022

/test pull-cluster-api-provider-azure-e2e

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

k8s-ci-robot commented Feb 23, 2022

@michalno1: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff c0416bc link false /test pull-cluster-api-provider-azure-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@michalno1
Copy link
Copy Markdown
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@shysank
Copy link
Copy Markdown
Contributor

shysank commented Feb 23, 2022

/lgtm
/approve
Great job @michalno1! Thanks for being patient with the reviews.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 23, 2022
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shysank

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2022
@k8s-ci-robot k8s-ci-robot merged commit b37fc68 into kubernetes-sigs:main Feb 23, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone Feb 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add support for custom-headers for AKS Add support for GPU nodes in AKS (managed cluster)

7 participants