
r/kubernetes_cluster: handling breaking changes in the AKS API #6095

Merged: 31 commits, Apr 6, 2020

Conversation

@wagnst wagnst (Contributor) commented Mar 12, 2020

BREAKING CHANGES:

  • Azure Kubernetes Service
    • Due to a breaking change in the AKS API, the azurerm_kubernetes_cluster resource features a significant behavioural change: creating Mixed-Mode Authentication clusters (e.g. using a Service Principal together with a Managed Identity) is no longer supported (an MSI-only configuration is sketched after this list).
    • The AKS Team have confirmed that existing clusters will be updated by the Azure API to use only MSI when a change is made to the Cluster (but not the Node Pool). Whilst Terraform could perform this automatically, some environments have restrictions on which tags can be added/removed; as such this operation will need to be performed out-of-band. Instead, upon detecting a Mixed-Mode Cluster which has not yet been updated - or a former Mixed-Mode Cluster where the Terraform Configuration still contains a service_principal block - Terraform will output instructions on how to proceed.
    • azurerm_kubernetes_cluster_node_pool - clusters with auto-scaling disabled must ensure that min_count and max_count are set to null (or omitted) rather than 0, since 0 isn't a valid value for these fields.
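As a minimal sketch of what an MSI-only cluster looks like under the new behaviour (the resource names and values below are placeholders, and the identity block replaces, rather than accompanies, a service_principal block):

```
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks"

  default_node_pool {
    name       = "default"
    node_count = 2
    vm_size    = "Standard_DS2_v2"
  }

  # A Managed Identity instead of (not alongside) a service_principal block
  identity {
    type = "SystemAssigned"
  }
}
```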

NOTES:

BUG FIXES:

Fixes #6094
Fixes #6235
Fixes #6257
Fixes #6178

@ghost ghost added the size/XS label Mar 12, 2020
@WodansSon WodansSon changed the title Potential fix for #6094 azurerm_kubernetes_cluster: Cannot set default_pool max_count and min_count to zero when enable_auto_scaling is false Mar 13, 2020
@WodansSon WodansSon (Collaborator) commented Mar 19, 2020

@wagnst, hey... this looks mostly good to me, but I would like to push some changes to your repo, as a lot of the logic in the expand function no longer makes sense now that the API version has changed, making node_count a required field. I would also like to add a note to the documentation and a couple of test cases, if you don't mind.

@ghost ghost added size/XL and removed size/XS labels Mar 20, 2020
@katbyte katbyte (Collaborator) left a comment

Aside from updating the docs with the new minimum, this LGTM 👍

@tombuildsstuff tombuildsstuff self-assigned this Mar 26, 2020
@tombuildsstuff tombuildsstuff added this to the v2.4.0 milestone Mar 31, 2020
Steffen Wagner and others added 14 commits April 1, 2020 12:04
If autoscaling is disabled on default_node_pool, those values are set to `0`
…ncipal or MSI blocks

Previously it was possible to use a Service Principal for an AKS Cluster and also give it a
Managed Identity. However, due to a breaking change in the AKS API, this is no longer possible.

Attempting to do this via the API instead creates an MSI-enabled cluster and ignores the Service
Principal that's specified - as such it's only possible to create a Service Principal or MSI cluster,
not a Service Principal cluster with a Managed Identity attached as before.
…efault node pool

Also fixing a bug where min/max count would be unmodified during updates
@aristosvo aristosvo (Collaborator) commented Apr 3, 2020

For the MSI it would make sense to also expose the identity profile with the kubelet identity, as exposed by the API:

"identityProfile": {
    "kubeletidentity": {
      "clientId": "00000000000000000000000000000000",
      "objectId": "0000000000000000000000000000000",
      "resourceId": "/subscriptions/00000000000000000000000/resourcegroups/MC_xxxxxxxx_westeurope/providers/Microsoft.ManagedIdentity/userAssignedIdentities/xxxxxx-agentpool"
    }
  },

Do you prefer a separate issue and pull request or could it be included here?
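
As a purely hypothetical sketch - assuming the attribute were exposed as an exported block named kubelet_identity, a name not yet settled at this point in the thread - consuming it might look like:

```
# Hypothetical attribute name and shape; the exact export was still under discussion
output "kubelet_client_id" {
  value = azurerm_kubernetes_cluster.example.kubelet_identity[0].client_id
}
```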

…an `0`

When coalescing between these values, they should be set to `null` rather
than `0` - this also fixes the error message which caused the initial confusion.
Whilst setting this field to `0` is convenient, it breaks validation when
auto-scaling is enabled - as such `0` shouldn't be an allowed value for
`min_count` or `max_count` (since node pools can't be scaled to `0` in
the API).

Instead, users can coalesce this via:

```
locals {
  enable_auto_scale = false
  fixed_count = 2
  min_count   = 1
  max_count   = 4
}

resource "azurerm_kubernetes_cluster_node_pool" "test" {
  # ...
  enable_auto_scaling = local.enable_auto_scale
  node_count          = local.enable_auto_scale ? null : local.fixed_count
  min_count           = local.enable_auto_scale ? local.min_count : null
  max_count           = local.enable_auto_scale ? local.max_count : null
}
```
@tombuildsstuff tombuildsstuff (Contributor) left a comment

hey @wagnst

Firstly - thanks for this PR.

The AKS Service has changed quite a bit in the last few weeks - including a number of breaking changes to the AKS API. Having spent some time digging into those API changes - we ended up having a chat with the AKS Team last week to clarify those changes.

With regards to the breaking changes, there are several things to fix here:

  1. Making the service_principal block optional (incorporating #6205: service_principal in aks no longer required)
  2. Making the windows_profile block Computed, since a set of Windows credentials are generated if not specified
  3. Handling mixed-mode clusters (e.g. a Service Principal with a Managed Identity) - which are no longer supported due to a breaking change in the API (making any change to the cluster will force-update them on Azure's side to only use MSI)
  4. Updating the acceptance tests/documentation to use MSI rather than a Service Principal - since this is the new recommended approach

@WodansSon has previously pushed a few commits to this PR to try and resolve those breaking API changes - I hope you don't mind, but I'm going to rebase this on top of master and then push a few more.


From our side, rather than setting the fields min_count and max_count to 0, it'd be better for these to be set to null (since 0 isn't a valid value for these fields). As such I believe the root cause of this confusion is a bug in the error message, which needs updating here.

When wanting to be able to conditionally create an auto-scaled/manually-scaled cluster using the same code, you can instead use the value null rather than 0 for the min_count and max_count fields; for example:

```
locals {
  enable_auto_scale = false
  fixed_count = 2
  min_count   = 1
  max_count   = 4
}

resource "azurerm_kubernetes_cluster" "test" {
  # ...

  default_node_pool {
    enable_auto_scaling = local.enable_auto_scale
    node_count          = local.enable_auto_scale ? null : local.fixed_count
    min_count           = local.enable_auto_scale ? local.min_count : null
    max_count           = local.enable_auto_scale ? local.max_count : null
  }
}
```

This allows the validation for the min_count and max_count fields to remain correct when auto-scaling is enabled. Whilst I appreciate this diverges from the original intention of this Pull Request, I believe it should fix #6020 whilst keeping the validation intact.

As such I hope you don't mind but I'm going to push the changes required to fix the breaking changes in the azurerm_kubernetes_cluster resource to this PR - which should mean that we can ship this fix, and the solution to the breaking API changes, in the v2.5 release of the Azure Provider.

Thanks!

@ghost ghost added dependencies size/XXL and removed size/XL labels Apr 6, 2020
@tombuildsstuff tombuildsstuff changed the title azurerm_kubernetes_cluster: Cannot set default_pool max_count and min_count to zero when enable_auto_scaling is false * r/kubernetes_cluster: handling breaking changes in the AKS API Apr 6, 2020
@tombuildsstuff tombuildsstuff changed the title * r/kubernetes_cluster: handling breaking changes in the AKS API r/kubernetes_cluster: handling breaking changes in the AKS API Apr 6, 2020
@tombuildsstuff tombuildsstuff (Contributor) commented:

Ignoring a few broken tests due to this bug in the AKS API (which was also an issue before this PR) - the acceptance tests look good to me:

[Screenshot: acceptance test results, 2020-04-06]

@tombuildsstuff tombuildsstuff (Contributor) commented:

@aristosvo just seen this - would you mind opening a new issue for that? A PR would be great too - however it's probably easiest to wait until after this PR is merged, given the changes required to the AKS resource to handle the breaking API changes here.

Thanks!

@ghost ghost commented Apr 9, 2020

This has been released in version 2.5.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.5.0"
}
# ... other configuration ...

@wagnst wagnst deleted the patch-1 branch April 11, 2020 14:24
@ghost ghost commented May 6, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators May 6, 2020