Add support for the ignore_size attribute on nodepools
robo-cap committed Jun 28, 2024
1 parent 02aeaeb commit baff8a3
Showing 7 changed files with 220 additions and 5 deletions.
53 changes: 52 additions & 1 deletion docs/src/guide/extensions_cluster_autoscaler.md
@@ -1,4 +1,4 @@
# Extensions: Cluster Autoscaler
# Extensions: Standalone Cluster Autoscaler

Deployed using the [cluster-autoscaler Helm chart](https://github.com/kubernetes/autoscaler/tree/master/charts/cluster-autoscaler) with configuration from the `worker_pools` variable.

@@ -13,6 +13,57 @@ The following parameters may be added on each pool definition to enable management
* `min_size`: Define the minimum scale of a pool managed by the cluster autoscaler. Defaults to `size` when not provided.
* `max_size`: Define the maximum scale of a pool managed by the cluster autoscaler. Defaults to `size` when not provided.

The cluster autoscaler manages the size of the nodepools that have the attribute `autoscale = true`. To avoid conflicts between the actual `size` of a nodepool and the `size` defined in the Terraform configuration files, you can add the `ignore_size = true` attribute to the nodepool definition in the `worker_pools` variable. This parameter allows Terraform to ignore the [drift](https://developer.hashicorp.com/terraform/tutorials/state/resource-drift) of the `size` parameter for that specific nodepool.

This setting is strongly recommended for nodepools configured with `autoscale = true`.

Example:

```
worker_pools = {
np-autoscaled = {
description = "Node pool managed by cluster autoscaler",
size = 2,
min_size = 1,
max_size = 3,
autoscale = true,
ignore_size = true # allow nodepool size drift
},
np-autoscaler = {
description = "Node pool with cluster autoscaler scheduling allowed",
size = 1,
allow_autoscaler = true,
},
}
```
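
Under the hood, pools with `ignore_size = true` are created by a separate `oci_containerengine_node_pool.autoscaled_workers` resource whose `lifecycle` block ignores the node count, so autoscaler-driven scaling is not reported as drift. A simplified sketch of that resource (not the complete definition in `modules/workers/nodepools.tf`):

```
resource "oci_containerengine_node_pool" "autoscaled_workers" {
  # Only pools that opted out of Terraform-managed sizing
  for_each = { for k, v in local.enabled_node_pools : k => v if lookup(v, "ignore_size", false) }

  # ... remaining arguments are identical to the Terraform-scaled node pool resource ...

  lifecycle {
    # Let the cluster autoscaler own the node count
    ignore_changes = [node_config_details[0].size]
  }
}
```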


For existing deployments, it is necessary to use the [terraform state mv](https://developer.hashicorp.com/terraform/cli/commands/state/mv) command to move the affected node pool resources to their new addresses.

Example:
```
$ terraform plan
...
Terraform will perform the following actions:
# module.oke.module.workers[0].oci_containerengine_node_pool.tfscaled_workers["np-autoscaled"] will be destroyed
...
# module.oke.module.workers[0].oci_containerengine_node_pool.autoscaled_workers["np-autoscaled"] will be created
$ terraform state mv module.oke.module.workers[0].oci_containerengine_node_pool.tfscaled_workers[\"np-autoscaled\"] module.oke.module.workers[0].oci_containerengine_node_pool.autoscaled_workers[\"np-autoscaled\"]
Successfully moved 1 object(s).
$ terraform plan
...
No changes. Your infrastructure matches the configuration.
```

### Notes

Don't set `allow_autoscaler` and `autoscale` to `true` on the same pool. Doing so makes the cluster autoscaler pod unschedulable, because the `oke.oraclecloud.com/cluster_autoscaler: managed` node label overrides the `oke.oraclecloud.com/cluster_autoscaler: allowed` node label expected by the cluster autoscaler's `nodeSelector`.
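
For illustration only, a hypothetical pool definition (`np-conflicting`) that combines both flags and should be avoided:

```
worker_pools = {
  np-conflicting = {
    description      = "Invalid: the 'managed' node label prevents the autoscaler pod from scheduling here",
    size             = 1,
    autoscale        = true,  # nodes are labeled oke.oraclecloud.com/cluster_autoscaler: managed
    allow_autoscaler = true,  # the autoscaler pod's nodeSelector expects oke.oraclecloud.com/cluster_autoscaler: allowed
  },
}
```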
2 changes: 1 addition & 1 deletion docs/src/resources.md
@@ -53,7 +53,7 @@
## Workers
<!-- BEGIN_TF_WORKERS -->

* [oci_containerengine_node_pool.workers](https://registry.terraform.io/providers/oracle/oci/latest/docs/resources/containerengine_node_pool)
* [oci_containerengine_node_pool.tfscaled_workers](https://registry.terraform.io/providers/oracle/oci/latest/docs/resources/containerengine_node_pool)
* [oci_containerengine_virtual_node_pool.workers](https://registry.terraform.io/providers/oracle/oci/latest/docs/resources/containerengine_virtual_node_pool)
* [oci_core_cluster_network.workers](https://registry.terraform.io/providers/oracle/oci/latest/docs/resources/core_cluster_network)
* [oci_core_instance.workers](https://registry.terraform.io/providers/oracle/oci/latest/docs/resources/core_instance)
2 changes: 2 additions & 0 deletions examples/workers/vars-workers-advanced.auto.tfvars
@@ -33,6 +33,7 @@ worker_pools = {
os = "Oracle Linux",
os_version = "7",
autoscale = true,
ignore_size = true
},
wg_np-vm-ol8 = {
description = "OKE-managed Node Pool with OKE Oracle Linux 8 image",
@@ -43,6 +44,7 @@ worker_pools = {
os = "Oracle Linux",
os_version = "8",
autoscale = true,
ignore_size = true
},
wg_np-vm-custom = {
description = "OKE-managed Node Pool with custom image",
1 change: 1 addition & 0 deletions examples/workers/vars-workers-autoscaling.auto.tfvars
@@ -10,6 +10,7 @@ worker_pools = {
min_size = 1,
max_size = 3,
autoscale = true,
ignore_size = true
},
np-autoscaler = {
description = "Node pool with cluster autoscaler scheduling allowed",
5 changes: 5 additions & 0 deletions migration.tf
@@ -49,3 +49,8 @@ moved {
from = module.oke.oci_containerengine_node_pool.nodepools
to = module.workers[0].oci_containerengine_node_pool.workers
}

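# State rename: the node pool resource formerly addressed as "workers" is now
# "tfscaled_workers", so existing deployments upgrade without a destroy/recreate.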
moved {
from = module.workers[0].oci_containerengine_node_pool.workers
to = module.workers[0].oci_containerengine_node_pool.tfscaled_workers
}
3 changes: 2 additions & 1 deletion modules/workers/locals.tf
@@ -36,6 +36,7 @@ locals {
eviction_grace_duration = 300
force_node_delete = true
extended_metadata = {} # empty pool-specific default
ignore_size = false # pool-specific default: size remains managed by Terraform
image_id = var.image_id
image_type = var.image_type
kubernetes_version = var.kubernetes_version
@@ -231,7 +232,7 @@
}

# Maps of worker pool OCI resources by pool name enriched with desired/custom parameters for various modes
worker_node_pools = { for k, v in oci_containerengine_node_pool.workers : k => merge(v, lookup(local.worker_pools_final, k, {})) }
worker_node_pools = { for k, v in merge(oci_containerengine_node_pool.tfscaled_workers, oci_containerengine_node_pool.autoscaled_workers) : k => merge(v, lookup(local.worker_pools_final, k, {})) }
worker_virtual_node_pools = { for k, v in oci_containerengine_virtual_node_pool.workers : k => merge(v, lookup(local.worker_pools_final, k, {})) }
worker_instance_pools = { for k, v in oci_core_instance_pool.workers : k => merge(v, lookup(local.worker_pools_final, k, {})) }
worker_cluster_networks = { for k, v in oci_core_cluster_network.workers : k => merge(v, lookup(local.worker_pools_final, k, {})) }
159 changes: 157 additions & 2 deletions modules/workers/nodepools.tf
@@ -2,9 +2,9 @@
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl

# Dynamic resource block for Node Pool groups defined in worker_pools
resource "oci_containerengine_node_pool" "workers" {
resource "oci_containerengine_node_pool" "tfscaled_workers" {
# Create an OKE node pool resource for each enabled entry of the worker_pools map with that mode.
for_each = local.enabled_node_pools
for_each = { for key, value in local.enabled_node_pools : key => value if lookup(value, "ignore_size", false) == false } # only pools whose size Terraform continues to manage
cluster_id = var.cluster_id
compartment_id = each.value.compartment_id
defined_tags = each.value.defined_tags
@@ -156,3 +156,158 @@ resource "oci_containerengine_node_pool" "workers" {
}
}
}

resource "oci_containerengine_node_pool" "autoscaled_workers" {
# Create an OKE node pool resource for each enabled entry of the worker_pools map with ignore_size = true; its size is left to the cluster autoscaler.
for_each = { for key, value in local.enabled_node_pools : key => value if lookup(value, "ignore_size", false) == true } # only pools whose size drift is ignored
cluster_id = var.cluster_id
compartment_id = each.value.compartment_id
defined_tags = each.value.defined_tags
freeform_tags = each.value.freeform_tags
kubernetes_version = each.value.kubernetes_version
name = each.key
node_shape = each.value.shape
ssh_public_key = var.ssh_public_key

node_config_details {
size = each.value.size
is_pv_encryption_in_transit_enabled = each.value.pv_transit_encryption
kms_key_id = each.value.volume_kms_key_id
nsg_ids = each.value.nsg_ids
defined_tags = each.value.defined_tags
freeform_tags = each.value.freeform_tags

dynamic "placement_configs" {
for_each = each.value.availability_domains
iterator = ad

content {
availability_domain = ad.value
capacity_reservation_id = each.value.capacity_reservation_id
subnet_id = each.value.subnet_id

# Value(s) specified on pool, or null to select automatically
fault_domains = try(each.value.placement_fds, null)

dynamic "preemptible_node_config" {
for_each = each.value.preemptible_config.enable ? [1] : []
content {
preemption_action {
type = "TERMINATE"
is_preserve_boot_volume = each.value.preemptible_config.is_preserve_boot_volume
}
}
}
}
}

dynamic "node_pool_pod_network_option_details" {
for_each = var.cni_type == "flannel" ? [1] : []
content { # Flannel requires cni type only
cni_type = "FLANNEL_OVERLAY"
}
}

dynamic "node_pool_pod_network_option_details" {
for_each = var.cni_type == "npn" ? [1] : []
content { # VCN-Native requires max pods/node, nsg ids, subnet ids
cni_type = "OCI_VCN_IP_NATIVE"
max_pods_per_node = each.value.max_pods_per_node
pod_nsg_ids = compact(tolist(each.value.pod_nsg_ids))
pod_subnet_ids = compact(tolist([each.value.pod_subnet_id]))
}
}
}

node_metadata = merge(
{
apiserver_host = var.apiserver_private_host
oke-kubeproxy-proxy-mode = var.kubeproxy_mode
user_data = lookup(lookup(data.cloudinit_config.workers, each.key, {}), "rendered", "")
},

# Only provide cluster DNS service address if set explicitly; determined automatically in practice.
coalesce(var.cluster_dns, "none") == "none" ? {} : { kubedns_svc_ip = var.cluster_dns },

# Extra user-defined fields merged last
var.node_metadata, # global
lookup(each.value, "node_metadata", {}), # pool-specific
)

node_eviction_node_pool_settings {
eviction_grace_duration = (floor(tonumber(each.value.eviction_grace_duration) / 60) > 0 ?
(each.value.eviction_grace_duration > 3600 ?
format("PT%dM", 60) :
(each.value.eviction_grace_duration % 60 == 0 ?
format("PT%dM", floor(each.value.eviction_grace_duration / 60)) :
format("PT%dM%dS", floor(each.value.eviction_grace_duration / 60), each.value.eviction_grace_duration % 60)
)
) :
format("PT%dS", each.value.eviction_grace_duration)
)
is_force_delete_after_grace_duration = tobool(each.value.force_node_delete)
}

dynamic "node_shape_config" {
for_each = length(regexall("Flex", each.value.shape)) > 0 ? [1] : []
content {
ocpus = each.value.ocpus
memory_in_gbs = ( # If > 64GB memory/core, correct input to exactly 64GB memory/core
(each.value.memory / each.value.ocpus) > 64 ? each.value.ocpus * 64 : each.value.memory
)
}
}

node_pool_cycling_details {
is_node_cycling_enabled = each.value.node_cycling_enabled
maximum_surge = each.value.node_cycling_max_surge
maximum_unavailable = each.value.node_cycling_max_unavailable
}

node_source_details {
boot_volume_size_in_gbs = each.value.boot_volume_size
image_id = each.value.image_id
source_type = "image"
}

lifecycle { # prevent resource changes for changed fields
ignore_changes = [
# kubernetes_version, # e.g. if changed as part of an upgrade
name, defined_tags, freeform_tags,
node_metadata["user_data"], # templated cloud-init
node_config_details[0].placement_configs, # dynamic placement configs
node_config_details[0].size # size managed by the cluster autoscaler
]

precondition {
condition = coalesce(each.value.image_id, "none") != "none"
error_message = <<-EOT
Missing image_id; check provided value if image_type is 'custom', or image_os/image_os_version if image_type is 'oke' or 'platform'.
pool: ${each.key}
image_type: ${coalesce(each.value.image_type, "none")}
image_id: ${coalesce(each.value.image_id, "none")}
EOT
}

precondition {
condition = anytrue([
contains(["instance-pool", "cluster-network"], each.value.mode), # supported modes
length(lookup(each.value, "secondary_vnics", {})) == 0, # unrestricted when empty/unset
])
error_message = "Unsupported option for mode=${each.value.mode}: secondary_vnics"
}

precondition {
condition = coalesce(each.value.capacity_reservation_id, "none") == "none" || length(each.value.availability_domains) == 1
error_message = "A single availability domain must be specified when using a capacity reservation with mode=${each.value.mode}"
}
}

dynamic "initial_node_labels" {
for_each = each.value.node_labels
content {
key = initial_node_labels.key
value = initial_node_labels.value
}
}
}
