Terraform module for a GKE Kubernetes Cluster in GCP
If you want to utilize this feature, make sure to declare a `helm` provider in your Terraform configuration as follows.
provider "helm" {
version = "2.1.2" # see https://github.com/terraform-providers/terraform-provider-helm/releases
kubernetes {
host = module.gke_cluster.cluster_endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = module.gke_cluster.cluster_ca_certificate
}
}Pay attention to the gke_cluster module output variables used here.
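The `token` above comes from a `google_client_config` data source, which must also be declared in your configuration (the same declaration appears again in the `kubernetes` provider example further below):

```terraform
# Obtains an OAuth2 access token for the identity Terraform runs as,
# so the helm provider can authenticate against the GKE cluster.
data "google_client_config" "google_client" {}
```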
If you are using the namespace variable, you may get an error like the following:

```
Error: Get "http://localhost/api/v1/namespaces/<namespace_name>": dial tcp 127.0.0.1:80: connect: connection refused
```

To fix this, declare a `kubernetes` provider in your Terraform configuration as follows.
provider "kubernetes" {
version = "1.13.3" # see https://github.com/terraform-providers/terraform-provider-kubernetes/releases
load_config_file = false
host = module.gke_cluster.cluster_endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = module.gke_cluster.cluster_ca_certificate
}
data "google_client_config" "google_client" {}Pay attention to the gke_cluster module output variables used here.
Drop the use of attributes such as `node_count_initial_per_zone` and/or `node_count_current_per_zone` (if any) from the list of objects in `var.node_pools`.
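For reference, a minimal sketch of what one entry in `var.node_pools` might look like after dropping those attributes. Only `node_pool_name` is taken from this guide; the other attribute names are illustrative, so confirm them against the module's `variables.tf` for your version:

```terraform
node_pools = [
  {
    node_pool_name          = "new-node-pool" # required
    # illustrative attributes only - check the module's variables.tf
    machine_type            = "e2-standard-4"
    disk_size_gb            = 100
    node_count_min_per_zone = 1
    node_count_max_per_zone = 3
    # node_count_initial_per_zone and node_count_current_per_zone are no longer accepted
  },
]
```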
This upgrade performs 2 changes:

- Move the declaration of kubernetes secrets into the declaration of kubernetes namespaces
  - see the Pull Request description at airasia#7
- Ability to create multiple ingress IPs for istio
  - read below

Detailed steps provided below:

- Upgrade `gke_cluster` module version to `2.7.1`
- Run `terraform plan` - DO NOT APPLY this plan
  - the plan may show that some `istio` resource(s) (if any are used) will be destroyed - we want to avoid any kind of destruction and/or recreation
  - P.S. to resolve any changes proposed for `kubernetes_secret` resource(s), please refer to this Pull Request description instead
- Set the `istio_ip_names` variable with at least one item as `["ip"]` - this is so that the istio IP resource name is backward-compatible
- Run `terraform plan` - DO NOT APPLY this plan
  - now, the plan may show that a `static_istio_ip` resource (if any is used) will be destroyed and recreated under a new named index - we want to avoid any kind of destruction and/or recreation
  - P.S. to resolve any changes proposed for `kubernetes_secret` resource(s), please refer to this Pull Request description instead
- Move the terraform states
  - notice that the plan says your existing static_istio_ip resource (let's say `istioIpX`) will be destroyed and a new static_istio_ip resource (let's say `istioIpY`) will be created
  - pay attention to the array indexes:
    - the `*X` resources (the ones to be destroyed) start with array index `[0]` - although it may not show `[0]` in the displayed plan
    - the `*Y` resources (the ones to be created) will show an array index with the new named index
  - use `terraform state mv` to manually move the state of `istioIpX` to `istioIpY` (see the sketch after this list)
    - refer to https://www.terraform.io/docs/commands/state/mv.html to learn more about how to move Terraform state positions
    - once a resource is moved, it will say `Successfully moved 1 object(s).`
- Run `terraform plan` again
  - the plan should now show that no changes are required
  - this confirms that you have successfully moved all your resources' states to their new positions as required by `v2.7.1`
- DONE
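A minimal sketch of the state move, assuming the istio IP is a `google_compute_address` resource named `static_istio_ip` inside a module named `gke_cluster`; copy the exact addresses from your own plan output before running anything:

```sh
# hypothetical addresses - take the "will be destroyed" / "will be created"
# addresses verbatim from your own terraform plan output
terraform state mv \
  'module.gke_cluster.google_compute_address.static_istio_ip[0]' \
  'module.gke_cluster.google_compute_address.static_istio_ip["ip"]'
```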
- Upgrade `gke_cluster` module version to `2.5.1`
- Run `terraform plan` - DO NOT APPLY this plan
  - the plan will show that several resources will be destroyed and recreated under new named indexes
  - we want to avoid any kind of destruction and/or recreation
- Move the terraform states
  - notice that the plan says your existing static_ingress_ip resource(s) (let's say `ingressIpX`) will be destroyed and new static_ingress_ip resource(s) (let's say `ingressIpY`) will be created
  - also notice that the plan says your existing kubernetes_namespace resource(s) (let's say `namespaceX`) will be destroyed and new kubernetes_namespace resource(s) (let's say `namespaceY`) will be created
  - P.S. if you happen to have multiple static_ingress_ip resource(s) and kubernetes_namespace resource(s), then the plan will show these destructions and recreations multiple times. You will need to move the states for EACH of the respective resources one-by-one.
  - pay attention to the array indexes:
    - the `*X` resources (the ones to be destroyed) start with array index `[0]` - although it may not show `[0]` in the displayed plan
    - the `*Y` resources (the ones to be created) will show array indexes with new named indexes
  - use `terraform state mv` to manually move the states of each `ingressIpX` to `ingressIpY`, and the states of each `namespaceX` to `namespaceY` (see the sketch after this list)
    - refer to https://www.terraform.io/docs/commands/state/mv.html to learn more about how to move Terraform state positions
    - once a resource is moved, it will say `Successfully moved 1 object(s).` - repeat until all relevant states are moved to their desired positions
- Run `terraform plan` again
  - the plan should now show that no changes are required
  - this confirms that you have successfully moved all your resources' states to their new positions as required by `v2.5.1`
- DONE
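A minimal sketch of these state moves, using placeholder addresses; the resource types, labels, and new named indexes must be copied from your own plan output:

```sh
# hypothetical addresses - replace the placeholders with the exact
# "will be destroyed" / "will be created" addresses from your terraform plan
terraform state mv \
  'module.gke_cluster.google_compute_address.static_ingress_ip[0]' \
  'module.gke_cluster.google_compute_address.static_ingress_ip["<NEW_NAMED_INDEX>"]'

terraform state mv \
  'module.gke_cluster.kubernetes_namespace.<RESOURCE_NAME>[0]' \
  'module.gke_cluster.kubernetes_namespace.<RESOURCE_NAME>["<NEW_NAMED_INDEX>"]'
```

Repeat one `terraform state mv` per resource until `terraform plan` reports no changes.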
This upgrade process will:
- drop the use of auxiliary node pools (if any)
- create a new node pool under terraform's array structure
- migrate existing deployments/workloads from the old node pool to the new node pool
- delete old standalone node pool as it's no longer required
Detailed steps provided below:
- While on `v2.2.2`, remove the variables `create_auxiliary_node_pool` and `auxiliary_node_pool_config`.
  - run `terraform plan` & `terraform apply`
  - this will remove any `auxiliary_node_pool` that may have been there
- Upgrade gke_cluster module to `v2.3.1` and set variable `node_pools` with its required params.
  - the value of `node_pool_name` for the new node pool must be different from the name of the old node pool
  - run `terraform plan` & `terraform apply`
  - this will create a new node pool as per the specs provided in `node_pools`
- Migrate existing deployments/workloads from the old node pool to the new node pool.
  - check status of nodes: `kubectl get nodes`
    - confirm that all nodes from all node pools are shown
    - confirm that all nodes have status `Ready`
  - check status of pods: `kubectl get pods -o=wide`
    - confirm that all pods have status `Running`
    - confirm that all pods are running on nodes from the old node pool
  - cordon the old node pool: `for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=<OLD_NODE_POOL_NAME> -o=name); do kubectl cordon "$node"; done`
    - replace <OLD_NODE_POOL_NAME> with the correct value
  - check status of nodes: `kubectl get nodes`
    - confirm that all nodes from the old node pool have status `Ready,SchedulingDisabled`
    - confirm that all nodes from the new node pool have status `Ready`
  - check status of pods: `kubectl get pods -o=wide`
    - confirm that all pods still have status `Running`
    - confirm that all pods are still running on nodes from the old node pool
  - initiate a rolling restart of all deployments: `kubectl rollout restart deployment <DEPLOYMENT_1_NAME> <DEPLOYMENT_2_NAME> <DEPLOYMENT_3_NAME>`
    - replace <DEPLOYMENT_*_NAME> with the correct names of existing deployments
  - check status of pods: `kubectl get pods -o=wide`
    - confirm that some pods have status `Running` while some new pods have status `ContainerCreating`
    - confirm that the new pods with status `ContainerCreating` are running on nodes from the new node pool
    - repeat status checks until all pods have status `Running` and all pods are running on nodes from the new node pool only
  - drain the old node pool: `for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=<OLD_NODE_POOL_NAME> -o=name); do kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node"; done`
    - replace <OLD_NODE_POOL_NAME> with the correct value
    - confirm that the response says `evicting pod` or `evicted` for all remaining pods in the old node pool
    - this step may take some time
  - Migration complete
- Upgrade gke_cluster module to `v2.4.2` and remove use of any obsolete variables (see the sketch after this list).
  - remove standalone variables such as `machine_type`, `disk_size_gb`, `node_count_initial_per_zone`, `node_count_min_per_zone`, `node_count_max_per_zone`, `node_count_current_per_zone` from the module, which are no longer used now that the standalone node pool is gone
  - run `terraform plan` & `terraform apply`
  - this will remove the old node pool completely
- DONE
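As a rough sketch of the `v2.4.2` cleanup, assuming your module block is named `gke_cluster` (keep your existing `source` and version pinning; the point is only which variables disappear):

```terraform
module "gke_cluster" {
  source = "<your existing module source, pinned to v2.4.2>"

  # removed in this step - these standalone node-pool variables are obsolete:
  #   machine_type, disk_size_gb, node_count_initial_per_zone,
  #   node_count_min_per_zone, node_count_max_per_zone, node_count_current_per_zone

  # the node pool(s) created in the earlier step remain defined here
  node_pools = [
    {
      node_pool_name = "new-node-pool" # must differ from the old node pool's name
      # ...remaining required params per the module's variables.tf
    },
  ]

  # ...rest of your existing configuration...
}
```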
- While at `v1.2.9`, set `create_auxiliary_node_pool` to `true` - this will create a new additional node pool according to the values of `var.auxiliary_node_pool_config` before proceeding with the breaking change.
  - Run `terraform apply`
- Migrate all workloads from the existing node pool to the newly created auxiliary node pool
  - Follow these instructions
- Upgrade `gke_cluster` module to `v1.3.0` - this will destroy and recreate the GKE node pool while the auxiliary node pool from step 1 continues to serve requests of the GKE cluster
  - Run `terraform apply`
- Migrate all workloads back from the auxiliary node pool to the newly created node pool
  - Follow these instructions
- While at `v1.3.0`, set `create_auxiliary_node_pool` to `false` - this will destroy the auxiliary node pool that was created in step 1, as it is no longer needed
  - Run `terraform apply`
- Done
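A minimal sketch of the toggle used in the first and last steps above; the attributes inside `auxiliary_node_pool_config` are illustrative, so take the exact set from the module's variables.tf:

```terraform
module "gke_cluster" {
  # ...your existing configuration...

  # first step (while at v1.2.9): set to true to create the auxiliary node pool
  # last step  (while at v1.3.0): set back to false to destroy it after migration
  create_auxiliary_node_pool = true

  auxiliary_node_pool_config = {
    # illustrative attribute only - check the module's variables.tf
    machine_type = "e2-standard-4"
  }
}
```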