Support for EKS Managed Node Groups #602
Conversation
I tried this out, I'm not sure it's compatible with
This is due to the fact that Managed Node Groups will not work without a role that has certain policies attached; the required policies are listed under EKS Node IAM Role. I can add a change so that the role is created correctly.
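For reference, a minimal sketch of such a node role with the three managed policies AWS documents as required for EKS managed node groups (resource names here are illustrative, not the PR's actual code; resource-level for_each needs Terraform 0.12.6+):
resource "aws_iam_role" "node_groups" {
  name = "example-eks-node-group-role" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# One attachment per required managed policy.
resource "aws_iam_role_policy_attachment" "node_groups" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  role       = aws_iam_role.node_groups.name
  policy_arn = each.value
}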
max-rocket-internet
left a comment
Like it @wmorgan6796 💚
Since this is a very important change and many people will likely opt for MNGs in the future, especially more basic users, I think it would be good to create an example under examples/managed_node_groups.
This would be a good starting point for beginners and would also ensure that the syntax is covered by the CI checks.
Could you add that also? Make it quite basic, maybe copying roughly what is in the basic example?
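A rough idea of what a minimal examples/managed_node_groups could contain, assuming the node_groups variable name and attribute keys used later in this thread; the cluster name and VPC references are placeholders, not the PR's actual example:
module "eks" {
  source       = "../.."
  cluster_name = "example-managed-node-groups" # hypothetical
  subnets      = module.vpc.private_subnets    # assumes a VPC module as in the basic example
  vpc_id       = module.vpc.vpc_id

  node_groups = [
    {
      name                        = "example"
      instance_type               = "m5.large"
      node_group_desired_capacity = 1
      node_group_max_capacity     = 2
      node_group_min_capacity     = 1
    },
  ]
}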
@max-rocket-internet I have added the example and fixed a bug that @wbertelsen found as well.
We are also really looking forward to this PR, awesome work. I saw nothing in the changes regarding the aws-auth config map. I haven't played much with the product, but I was expecting some simplification in that regard, since the control plane already knows about the workers.
@pierresteiner The aws-auth configmap isn't for the workers; it's for delegating cluster access to IAM roles. Managed Node Groups don't simplify that.
Looks like there is an issue with the
One workaround is to delete the
I'm unsure if it's possible to completely remove a block if unused. Maybe it would be a good case to note this in the docs? This seems like more of a bug in either the AWS provider or the AWS API. Any suggested edits would be welcome.
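For reference, Terraform 0.12's dynamic blocks can drop a nested block entirely when the collection it iterates over is empty. A minimal sketch of that idea, not the approach this PR ultimately took; var.key_name and var.source_security_group_ids are hypothetical inputs:
resource "aws_eks_node_group" "example" {
  # ... required arguments (cluster_name, node_group_name, node_role_arn,
  # subnet_ids, scaling_config) omitted for brevity ...

  # remote_access is generated only when a key name is set; otherwise the
  # block is omitted from the configuration entirely.
  dynamic "remote_access" {
    for_each = var.key_name != "" ? [var.key_name] : []
    content {
      ec2_ssh_key               = remote_access.value
      source_security_group_ids = var.source_security_group_ids
    }
  }
}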
I agree this seems like a bug with the provider. I think one potential workaround for the module is to declare two versions of each worker group (one with the remote_access block and one that omits it entirely). You may then be able to conditionally choose one over the other (via count trickery) depending on whether a key name is set.
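A rough sketch of that two-resource pattern under stated assumptions: var.node_group (a single group as a map), var.cluster_name, var.node_role_arn, and var.subnets are hypothetical inputs, and the attribute keys follow the snippets later in this thread.
locals {
  has_ssh_key = lookup(var.node_group, "key_name", "") != ""
}

# Created only when an SSH key is configured; includes remote_access.
resource "aws_eks_node_group" "with_remote_access" {
  count           = local.has_ssh_key ? 1 : 0
  cluster_name    = var.cluster_name
  node_group_name = "${var.cluster_name}-${var.node_group["name"]}"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnets

  scaling_config {
    desired_size = var.node_group["node_group_desired_capacity"]
    max_size     = var.node_group["node_group_max_capacity"]
    min_size     = var.node_group["node_group_min_capacity"]
  }

  remote_access {
    ec2_ssh_key = var.node_group["key_name"]
  }
}

# Created when no SSH key is configured; omits remote_access entirely.
resource "aws_eks_node_group" "without_remote_access" {
  count           = local.has_ssh_key ? 0 : 1
  cluster_name    = var.cluster_name
  node_group_name = "${var.cluster_name}-${var.node_group["name"]}"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnets

  scaling_config {
    desired_size = var.node_group["node_group_desired_capacity"]
    max_size     = var.node_group["node_group_max_capacity"]
    min_size     = var.node_group["node_group_min_capacity"]
  }
}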
It is actually both: prior to managed node groups, it was also necessary to grant the workers' role permission so that they could join the cluster. I don't know if that is still needed (https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html), though.
Hmm, I wasn't aware of that. I'm hesitant to change too many things as I'm not familiar with the true inner workings of some parts of this module.
I think I mitigated this issue with my latest commit. I just made it so that if the SSH key isn't set, the
Shouldn't the worker group also have a
Edit:
workers_node_groups.tf
resource "random_pet" "managed_node_groups" {
  count = local.worker_group_managed_node_group_count
Every occurrence of count = local.worker_group_managed_node_group_count is a strong candidate for switching to for_each = toset(var.workers_additional_policies).
This will also let you replace lookup(var.worker_group_managed_node_groups[count.index], "name", count.index) with the more elegant each.key["name"].
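As a generic illustration of that for_each-over-a-set pattern, using the var.workers_additional_policies example from the comment; the role reference is hypothetical and this is not the PR's code:
variable "workers_additional_policies" {
  type    = list(string)
  default = []
}

# One attachment per policy ARN in the set.
resource "aws_iam_role_policy_attachment" "workers_additional" {
  for_each   = toset(var.workers_additional_policies)
  role       = aws_iam_role.workers.name # hypothetical role reference
  policy_arn = each.value
}
Note that when iterating over a set of strings, each.key and each.value are the same string.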
Can I replace that lookup though? I would still need to provide a default value.
Good point. Given that node_group_name is a required argument for the eks_node_group resource it seems reasonable to ask that module users specify it along with things like the max size.
The given "for_each" argument value is unsuitable: "for_each" supports maps
and sets of strings, but you have provided a set containing type object.
Can't use for_each on the random_pets or on anything that references that object.
@wmorgan6796 I forgot that for_each requires that the key it iterates over be a string. Fortunately node groups must have unique names, so we can just create a map that uses that name as the index:
locals {
node_groups = { for obj in var.worker_group_managed_node_groups : obj.name => obj }
}
Then when using for_each you should be able to do each.value["instance_type"].
@eytanhanig I can't seem to fix this error when using for_each:
Error: Invalid function argument
on ../../node_groups.tf line 88, in resource "aws_eks_node_group" "workers":
|----------------
| count.index is 0
| random_pet.node_groups is object with 1 attribute "example"
| var.node_groups is tuple with 1 element
Invalid value for "list" parameter: element 2: string required.
Can you take a look at it and see what you can find?
Because you're switching from count to for_each, var.node_groups[count.index] becomes each.value.
Here's an example of how to implement this:
resource "random_pet" "node_groups" {
for_each = local.node_groups
separator = "-"
length = 2
keepers = {
instance_type = lookup(each.value, "instance_type", local.workers_group_defaults["instance_type"])
ec2_ssh_key = lookup(each.value, "key_name", local.workers_group_defaults["key_name"])
source_security_group_ids = join("-", compact(
lookup(each.value, "source_security_group_ids", local.workers_group_defaults["source_security_group_id"])
))
node_group_name = join("-", [var.cluster_name, each.value["name"]])
}
}
resource "aws_eks_node_group" "workers" {
for_each = local.node_groups
node_group_name = join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id])
cluster_name = var.cluster_name
node_role_arn = lookup(each.value, "iam_role_arn", aws_iam_role.node_groups[0].arn)
subnet_ids = lookup(each.value, "subnets", local.workers_group_defaults["subnets"])
scaling_config {
desired_size = lookup(each.value, "node_group_desired_capacity", local.workers_group_defaults["asg_desired_capacity"])
max_size = lookup(each.value, "node_group_max_capacity", local.workers_group_defaults["asg_max_size"])
min_size = lookup(each.value, "node_group_min_capacity", local.workers_group_defaults["asg_min_size"])
}
ami_type = lookup(each.value, "ami_type", null)
disk_size = lookup(each.value, "root_volume_size", null)
instance_types = [lookup(each.value, "instance_type", null)]
labels = lookup(each.value, "node_group_k8s_labels", null)
release_version = lookup(each.value, "ami_release_version", null)
# This sometimes breaks idempotency as described in https://github.com/terraform-providers/terraform-provider-aws/issues/11063
remote_access {
ec2_ssh_key = lookup(each.value, "key_name", "") != "" ? each.value["key_name"] : null
source_security_group_ids = lookup(each.value, "key_name", "") != "" ? lookup(each.value, "source_security_group_ids", []) : null
}
version = aws_eks_cluster.this.version
tags = lookup(each.value, "node_group_additional_tags", null)
lifecycle {
create_before_destroy = true
}
}
@eytanhanig @max-rocket-internet I updated the minimum version of Terraform so that we can make use of for_each blocks for resources. But the linting job for the GitHub CI only has Terraform 0.12.2. Should that be updated, or should I revert that change?
It appears that you'll need to update the image used in lint.yml.
Yes, please also update that: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/.github/workflows/lint.yml#L59
I believe that I've found a bug in the AWS provider resource
It does become idempotent when used with a valid SSH key. I've not tested it with permutations of the
To account for this, I believe we are limited to the following options:
I've opened a ticket with the AWS provider here: hashicorp/terraform-provider-aws#11063
node_groups.tf
  aws_eks_cluster.this.name,
  lookup(var.node_groups[count.index], "name", count.index),
- random_pet.node_groups[count.index].id
+ random_pet.node_groups[lookup(var.node_groups[count.index], "name", count.index)]
Hello, I'm getting this error:
Error: Invalid function argument
on .terraform/modules/eks/node_groups.tf line 88, in resource "aws_eks_node_group" "workers":
|----------------
| aws_eks_cluster.this.name is "toro-dev2"
| count.index is 0
| random_pet.node_groups is object with 1 attribute "standard"
| var.node_groups is tuple with 1 element
Invalid value for "list" parameter: element 2: string required.
Looks like .id is missing here:
random_pet.node_groups[lookup(var.node_groups[count.index], "name", count.index)].id
Thanks!
Try this:
resource "random_pet" "node_groups" {
for_each = local.node_groups
separator = "-"
length = 2
keepers = {
instance_type = lookup(each.value, "instance_type", local.workers_group_defaults["instance_type"])
ec2_ssh_key = lookup(each.value, "key_name", local.workers_group_defaults["key_name"])
source_security_group_ids = join("-", compact(
lookup(each.value, "source_security_group_ids", local.workers_group_defaults["source_security_group_id"])
))
node_group_name = join("-", [var.cluster_name, each.value["name"]])
}
}
resource "aws_eks_node_group" "workers" {
for_each = local.node_groups
node_group_name = join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id])
cluster_name = var.cluster_name
node_role_arn = lookup(each.value, "iam_role_arn", aws_iam_role.node_groups[0].arn)
subnet_ids = lookup(each.value, "subnets", local.workers_group_defaults["subnets"])
scaling_config {
desired_size = lookup(each.value, "node_group_desired_capacity", local.workers_group_defaults["asg_desired_capacity"])
max_size = lookup(each.value, "node_group_max_capacity", local.workers_group_defaults["asg_max_size"])
min_size = lookup(each.value, "node_group_min_capacity", local.workers_group_defaults["asg_min_size"])
}
ami_type = lookup(each.value, "ami_type", null)
disk_size = lookup(each.value, "root_volume_size", null)
instance_types = [lookup(each.value, "instance_type", null)]
labels = lookup(each.value, "node_group_k8s_labels", null)
release_version = lookup(each.value, "ami_release_version", null)
# This sometimes breaks idempotency as described in https://github.com/terraform-providers/terraform-provider-aws/issues/11063
remote_access {
ec2_ssh_key = lookup(each.value, "key_name", "") != "" ? each.value["key_name"] : null
source_security_group_ids = lookup(each.value, "key_name", "") != "" ? lookup(each.value, "source_security_group_ids", []) : null
}
version = aws_eks_cluster.this.version
tags = lookup(each.value, "node_group_additional_tags", null)
lifecycle {
create_before_destroy = true
}
}
@eytanhanig I added the changes you suggested and the terraform plan seems to work.
What's the status of this?
max-rocket-internet
left a comment
NICE thanks @wmorgan6796!
I fixed a conflict in the changelog and a small fmt issue.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
PR o'clock
Description
This adds initial support for the newly announced AWS EKS Managed Node Groups, though the allowed configuration options are currently fairly limited.
Checklist