Replace EKS test-infra with example #1192
Changes from 1 commit
````diff
@@ -1,56 +1,68 @@
-# Amazon EKS Clusters
+# EKS (Amazon Elastic Kubernetes Service)
 
-You will need the standard AWS environment variables to be set, e.g.
+This example shows how to use the Terraform Kubernetes Provider and Terraform Helm Provider to configure an EKS cluster. The example config builds the EKS cluster and applies the Kubernetes configurations in a single operation. This guide will also show you how to make changes to the underlying EKS cluster in such a way that Kubernetes/Helm resources are recreated after the underlying cluster is replaced.
+
+You will need the following environment variables to be set:
 
 - `AWS_ACCESS_KEY_ID`
 - `AWS_SECRET_ACCESS_KEY`
 
-See [AWS Provider docs](https://www.terraform.io/docs/providers/aws/index.html#configuration-reference) for more details about these variables
-and alternatives, like `AWS_PROFILE`.
+See [AWS Provider docs](https://www.terraform.io/docs/providers/aws/index.html#configuration-reference) for more details about these variables and alternatives, like `AWS_PROFILE`.
 
-## Versions
+Ensure that `KUBE_CONFIG_FILE` and `KUBE_CONFIG_FILES` environment variables are NOT set, as they will interfere with the cluster build.
 
-You can set the desired version of Kubernetes via the `kubernetes_version` TF variable.
+```
+unset KUBE_CONFIG_FILE
+unset KUBE_CONFIG_FILES
+```
 
-See https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html for currently available versions.
+To install the EKS cluster using default values, run `terraform init` and `apply` from the directory containing this README.
 
-You can set the desired version of Kubernetes via the `kubernetes_version` TF variable, like this:
 ```
-export TF_VAR_kubernetes_version="1.11"
 terraform init
 terraform apply
 ```
-Alternatively you can pass it to the `apply` command line, like below.
 
-## Worker node count and instance type
+## Kubeconfig for manual CLI access
 
-You can control the amount of worker nodes in the cluster as well as their machine type, using the following variables:
+This example generates a kubeconfig file in the current working directory. However, the token in this config expires in 15 minutes. The token can be refreshed by running `terraform apply` again. Export the KUBECONFIG to manually access the cluster:
 
-- `TF_VAR_workers_count`
-- `TF_VAR_workers_type`
+```
+terraform apply
+export KUBECONFIG=$(terraform output -raw kubeconfig_path)
+kubectl get pods -n test
+```
 
-Export values for them or pass them to the apply command line.
+## Optional variables
 
-## Build the cluster
+The Kubernetes version can be specified at apply time:
 
 ```
-terraform init
-terraform apply -var=kubernetes_version=1.11
+terraform apply -var=kubernetes_version=1.18
 ```
 
-## Exporting K8S variables
-To access the cluster you need to export the `KUBECONFIG` variable pointing to the `kubeconfig` file for the current cluster.
-```
-export KUBECONFIG="$(terraform output kubeconfig_path)"
-```
+See https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html for currently available versions.
 
-Now you can access the cluster via `kubectl` and you can run acceptance tests against it.
+### Worker node count and instance type
 
-To run acceptance tests, your the following command in the root of the repository.
-```
-TESTARGS="-run '^TestAcc'" make testacc
-```
+The number of worker nodes, and the instance type, can be specified at apply time:
 
-To run only a specific set of tests, you can replace `^TestAcc` with any regular expression to filter tests by name.
-For example, to run tests for Pod resources, you can do:
+```
+terraform apply -var=workers_count=4 -var=workers_type=m4.xlarge
+```
+
+## Additional configuration of EKS
+
+To view all available configuration options for the EKS module used in this example, see [terraform-aws-modules/eks docs](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest).
+
+## Replacing the EKS cluster and re-creating the Kubernetes / Helm resources
+
+When the cluster is initially created, the Kubernetes and Helm providers will not be initialized until authentication details are created for the cluster. However, for future operations that may involve replacing the underlying cluster (for example, changing the network where the EKS cluster resides), the EKS cluster will have to be targeted without the Kubernetes/Helm providers, as shown below. This is done by removing the `module.kubernetes-config` from Terraform State prior to replacing cluster credentials, to avoid passing outdated credentials into the providers.
````
||||||
| When the cluster is initially created, the Kubernetes and Helm providers will not be initialized until authentication details are created for the cluster. However, for future operations that may involve replacing the underlying cluster (for example, changing the network where the EKS cluster resides), the EKS cluster will have to be targeted without the Kubernetes/Helm providers, as shown below. This is done by removing the `module.kubernetes-config` from Terraform State prior to replacing cluster credentials, to avoid passing outdated credentials into the providers. | |
| When the cluster is initially created, the Kubernetes and Helm providers will not attempt to read from the Kubernetes API until authentication details are created for the cluster. However, for future operations that may involve replacing the underlying cluster (for example, changing the network where the EKS cluster resides), the EKS cluster will have to be targeted without the Kubernetes/Helm resources in state, as shown below. This is done by removing the `module.kubernetes-config` from Terraform State prior to replacing cluster credentials. Alternatively, keeping the EKS infrastructure and Kubernetes infrastructure in different state files will avoid this scenario. |
---
Stef, thanks for breaking things down in such detail.
I appreciate the overall solution that you are proposing here and I think removing the state of the Kubernetes resources deliberately via the state rm command before replacing the cluster is a clever and very valid solution. I think we should promote that and encourage users to adopt it. It follows good practice patterns, by making actions explicit rather than implicit and limiting blast radius of each action. I like it.
What I'm not comfortable with is the messaging in the first phrase only. It can be interpreted by users as stating for a fact that Terraform offers any kind of sequencing guarantees that "the Kubernetes and Helm providers will not attempt to read from the Kubernetes API until authentication details are created for the cluster".
This causality, I'm afraid, is not sustained by the way Terraform is currently designed. After multiple conversations with Terraform engineers, the conclusion has always been that this is not by-design behaviour. The fact that it works (as clearly you have experienced first-hand) is a side-effect and side-effects are dangerous to promote as reliable fact.
In fact, Terraform documentation clearly advises against relying on this assumption here: https://www.terraform.io/docs/language/providers/configuration.html#provider-configuration-1. About midway in that section, it reads:
"You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration)."
I think it's in our users' best interest (but also ours) for us to send a clear and cohesive message about how to properly use Terraform.
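To illustrate the pattern the quoted passage warns about, here is a minimal hypothetical configuration (resource and variable names are invented for illustration, not taken from this PR):

```hcl
# Hypothetical illustration: the provider configuration below references an
# attribute exported by a managed resource. That attribute is unknown until
# the resource is actually created, which is exactly the situation the docs
# caution against; input variables, by contrast, are known before apply.
resource "aws_eks_cluster" "example" {
  name     = "example"
  role_arn = var.cluster_role_arn
  vpc_config {
    subnet_ids = var.subnet_ids
  }
}

provider "kubernetes" {
  host = aws_eks_cluster.example.endpoint # exported attribute, not known at plan time
}
```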
---
Ok, I'll give this more thought. Maybe for now, we can just remove this README from the test-infra dir. It was just a copy/paste of our example README anyway. We can continue the discussion and keep tuning this file in the PR where I'm making changes to this README in a more user-facing way. This discussion is also something we can think about for our upcoming meeting regarding the Kubernetes provider authentication issues. (Phil scheduled it for us yesterday -- we'll meet in about 2 weeks.)
---
I removed the paragraph in question. It's ok to just wait until the meeting to discuss it further, since we might not do much work on the other PR until we have that talk.
---
I think that's a great way forward! I'm happy to do a deep dive in this topic online and exchange ideas.
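For concreteness, a minimal sketch of the state-removal workflow discussed in this thread, using the module address from this example (the exact replacement flow will vary with the change being made):

```
# Remove the Kubernetes/Helm resources from Terraform state so their providers
# are not configured with stale credentials while the cluster is replaced.
terraform state rm module.kubernetes-config

# Apply the change that replaces the EKS cluster; module.kubernetes-config is
# then re-created against the new cluster's credentials in the same run.
terraform apply
```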
New file (`@@ -0,0 +1,77 @@`):

```hcl
# Maps the worker node IAM role into the cluster so that nodes can join.
resource "kubernetes_config_map" "name" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = join(
      "\n",
      formatlist(local.mapped_role_format, var.k8s_node_role_arn),
    )
  }
}

# Optional: this kubeconfig file is only used for manual CLI access to the cluster.
resource "null_resource" "generate-kubeconfig" {
  provisioner "local-exec" {
    command = "aws eks update-kubeconfig --name ${var.cluster_name} --kubeconfig ${path.root}/kubeconfig"
  }
}

resource "kubernetes_namespace" "test" {
  metadata {
    name = "test"
  }
}

resource "kubernetes_deployment" "test" {
  metadata {
    name      = "test"
    namespace = kubernetes_namespace.test.metadata[0].name
  }
  spec {
    replicas = 2
    selector {
      match_labels = {
        app = "test"
      }
    }
    template {
      metadata {
        labels = {
          app = "test"
        }
      }
      spec {
        container {
          image = "nginx:1.19.4"
          name  = "nginx"

          resources {
            limits = {
              memory = "512M"
              cpu    = "1"
            }
            requests = {
              memory = "256M"
              cpu    = "50m"
            }
          }
        }
      }
    }
  }
}

resource "helm_release" "nginx_ingress" {
  name = "nginx-ingress-controller"

  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx-ingress-controller"

  set {
    name  = "service.type"
    value = "ClusterIP"
  }
}
```
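Once applied, these resources can be inspected manually (a sketch; assumes the `kubeconfig_path` output referenced in the README, and the Bitnami chart's usual labels):

```
export KUBECONFIG=$(terraform output -raw kubeconfig_path)

# The aws-auth ConfigMap mapping the node role into the cluster:
kubectl get configmap aws-auth -n kube-system -o yaml

# The example Deployment in the "test" namespace:
kubectl get deployments -n test

# Pods from the Helm-installed nginx ingress controller (label assumed):
kubectl get pods -l app.kubernetes.io/name=nginx-ingress-controller
```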
New file (`@@ -0,0 +1,18 @@`):

```hcl
variable "k8s_node_role_arn" {
  type = string
}

variable "cluster_name" {
  type = string
}

locals {
  # printf-style template; formatlist() substitutes the node role ARN for %s.
  mapped_role_format = <<MAPPEDROLE
- rolearn: %s
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes
MAPPEDROLE
}
```
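For reference, once `formatlist` substitutes a role ARN into this template and `join` assembles the result (see the config map above), the rendered `mapRoles` entry would look roughly like this (placeholder ARN):

```yaml
- rolearn: arn:aws:iam::111122223333:role/example-node-role # placeholder
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes
```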
```diff
@@ -1,8 +1,12 @@
 terraform {
   required_providers {
     kubernetes = {
-      source  = "hashicorp/kubernetes"
-      version = "1.13"
+      source  = "hashicorp/kubernetes"
+      version = ">= 2.0.2"
     }
+    helm = {
+      source  = "hashicorp/helm"
+      version = ">= 2.0.2"
+    }
     aws = {
       source = "hashicorp/aws"
@@ -11,6 +15,48 @@ terraform {
   }
 }
 
+data "aws_eks_cluster" "default" {
+  name = module.cluster.cluster_id
+}
+
+# This configuration relies on a plugin binary to fetch the token to the EKS cluster.
+# The main advantage is that the token will always be up-to-date, even when the `terraform apply` runs for
+# a longer time than the token TTL. The downside of this approach is that the binary must be present
+# on the system running terraform, either in $PATH as shown below, or in another location, which can be
+# specified in the `command`.
+# See the commented provider blocks below for alternative configuration options.
+provider "kubernetes" {
+  host                   = data.aws_eks_cluster.default.endpoint
+  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
+  exec {
+    api_version = "client.authentication.k8s.io/v1alpha1"
+    args        = ["eks", "get-token", "--cluster-name", module.vpc.cluster_name]
+    command     = "aws"
+  }
+}
+
+# This configuration is also valid, but the token may expire during long-running applies.
+# data "aws_eks_cluster_auth" "default" {
+#   name = module.cluster.cluster_id
+# }
+# provider "kubernetes" {
+#   host                   = data.aws_eks_cluster.default.endpoint
+#   cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
+#   token                  = data.aws_eks_cluster_auth.default.token
+# }
+
+provider "helm" {
+  kubernetes {
+    host                   = data.aws_eks_cluster.default.endpoint
+    cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
+    exec {
+      api_version = "client.authentication.k8s.io/v1alpha1"
+      args        = ["eks", "get-token", "--cluster-name", module.vpc.cluster_name]
+      command     = "aws"
+    }
+  }
+}
+
 provider "aws" {
   region = var.region
 }
@@ -21,22 +67,26 @@ module "vpc" {
 
 module "cluster" {
   source  = "terraform-aws-modules/eks/aws"
-  version = "v13.2.1"
+  version = "14.0.0"
 
   vpc_id  = module.vpc.vpc_id
   subnets = module.vpc.subnets
 
   cluster_name    = module.vpc.cluster_name
   cluster_version = var.kubernetes_version
-  manage_aws_auth = false
-  # This kubeconfig expires in 15 minutes, so we'll use another method.
+  manage_aws_auth = false # Managed in ./kubernetes-config/main.tf instead.
+  # This kubeconfig expires in 15 minutes, so we'll use an exec block instead.
+  # See ./kubernetes-config/main.tf provider block for details.
   write_kubeconfig = false
 
+  workers_group_defaults = {
+    root_volume_type = "gp2"
+  }
   worker_groups = [
     {
       instance_type        = var.workers_type
       asg_desired_capacity = var.workers_count
-      asg_max_size         = "10"
+      asg_max_size         = 4
     },
   ]
@@ -45,11 +95,8 @@ module "cluster" {
   }
 }
 
-module "node-config" {
-  source                  = "./node-config"
-  k8s_node_role_arn       = list(module.cluster.worker_iam_role_arn)
-  cluster_ca              = module.cluster.cluster_certificate_authority_data
-  cluster_name            = module.cluster.cluster_id # creates dependency on cluster creation
-  cluster_endpoint        = module.cluster.cluster_endpoint
-  cluster_oidc_issuer_url = module.cluster.cluster_oidc_issuer_url
+module "kubernetes-config" {
+  cluster_name      = module.cluster.cluster_id # creates dependency on cluster creation
+  source            = "./kubernetes-config"
+  k8s_node_role_arn = module.cluster.worker_iam_role_arn
 }
```

Review thread on `root_volume_type = "gp2"`:

---

GP2 volumes are generally more expensive.

---

gp2 was the default in the EKS module, but they recently updated it to gp3. I put it back to the old default because gp3 isn't available in all regions.

---

This is the commit where it was added. terraform-aws-modules/terraform-aws-eks@76537d1

---

Oh, if that's the case then disregard my comment. I must have based it on old information and likely GPx volumes are now the only choice. Sorry for the confusion :)
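The `# creates dependency on cluster creation` comment in the last hunk above relies on an implicit dependency through a reference to a `module.cluster` output. An alternative way to express the same ordering (a sketch, not part of this PR) is module-level `depends_on`, available since Terraform 0.13:

```hcl
module "kubernetes-config" {
  source            = "./kubernetes-config"
  cluster_name      = module.cluster.cluster_id
  k8s_node_role_arn = module.cluster.worker_iam_role_arn

  # Terraform 0.13+: explicit ordering instead of the implicit dependency
  # created by referencing module.cluster outputs.
  depends_on = [module.cluster]
}
```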