Conversation

@wmorgan6796
Contributor

PR o'clock

Description

This adds support for the newly announced AWS EKS Managed Node Groups. The allowed configuration options are still fairly limited, but this provides initial support.


@wmorgan6796 wmorgan6796 mentioned this pull request Nov 20, 2019
@wbertelsen
Contributor

wbertelsen commented Nov 21, 2019

I tried this out, and I'm not sure it's compatible with manage_worker_iam_resources: true. I'm getting:

Error: error creating EKS Node Group (redacted:redacted): InvalidParameterException: The provided nodeRole is invalid.
status code: 400, request id:xx

@wmorgan6796
Contributor Author

manage_worker_iam_resources

This is because Managed Node Groups will not work without a role that has certain policies attached; the required policies are listed here: EKS Node IAM Role. I can add a change so that the role is created correctly.
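For reference, the role it expects looks roughly like this (a minimal sketch based on the EKS documentation; the resource names here are illustrative, not the module's):

resource "aws_iam_role" "managed_workers" {
  name = "eks-managed-node-group" # illustrative name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# The three managed policies required by the EKS node IAM role documentation.
resource "aws_iam_role_policy_attachment" "workers_eks_worker_node" {
  role       = aws_iam_role.managed_workers.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "workers_cni" {
  role       = aws_iam_role.managed_workers.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "workers_ecr_read_only" {
  role       = aws_iam_role.managed_workers.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}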

Contributor

@max-rocket-internet max-rocket-internet left a comment


Like it @wmorgan6796 💚

Since this is a very important change and many people, especially more basic users, will likely opt for MNGs in the future, I think it would be good to create an example under examples/managed_node_groups.

This would be a good starting point for beginners and would also ensure that the syntax is covered by the CI checks.

Could you add that also? Make it quite basic, maybe copying roughly what is in the basic example?
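For illustration, something roughly like this (the module source path and node group argument names here are assumptions, borrowing the names discussed later in this thread):

module "eks" {
  source = "../.."

  cluster_name = "managed-node-groups-example" # illustrative
  vpc_id       = module.vpc.vpc_id
  subnets      = module.vpc.private_subnets

  worker_group_managed_node_groups = [
    {
      name                        = "example"
      instance_type               = "m5.large"
      node_group_desired_capacity = 1
      node_group_min_capacity     = 1
      node_group_max_capacity     = 2
    },
  ]
}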

@wmorgan6796
Contributor Author

@max-rocket-internet I have added the example and fixed a bug that @wbertelsen found as well.

@pierresteiner
Contributor

We are also really looking forward to this PR, awesome work.

I don't see anything in the changes regarding the aws-auth config map. I haven't played much with the product, but I was expecting some simplification there, since the control plane already knows about the workers.

@wmorgan6796
Contributor Author

@pierresteiner the aws-auth configmap isn't for the workers; it's for delegating cluster access to IAM roles. Managed Node Groups don't simplify that.

@lcycon

lcycon commented Nov 23, 2019

Looks like there is an issue with the remote_access attribute. By my read of things, it appears that a remote_access block with empty key_name and security group ID list is rewritten to a null remote access block by the AWS API. Because of this, Terraform believes it must recreate the workers on every apply.

One workaround is to omit the remote_access block entirely when it is unused.

@wmorgan6796
Contributor Author

I'm unsure whether it's possible to completely omit a block when it's unused. Maybe it would be a good idea to note this in the docs? This seems more like a bug in either the AWS provider or the AWS API. Any suggested edits would be welcome.

@wmorgan6796 wmorgan6796 reopened this Nov 23, 2019
@lcycon

lcycon commented Nov 23, 2019

I agree this seems like a bug with the provider.

I think one potential workaround for the module is to declare two versions of each worker group (one with the remote access and one that omits it entirely). You may then be able to conditionally choose one over the other (via count trickery) depending on whether a key name is set.
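Another option might be a Terraform 0.12 dynamic block that omits remote_access entirely when no key name is set; a rough sketch using the lookup pattern seen elsewhere in this thread (not necessarily what this PR should do):

# Inside the aws_eks_node_group resource: only render remote_access
# when a key name is actually provided.
dynamic "remote_access" {
  for_each = lookup(each.value, "key_name", "") != "" ? [each.value["key_name"]] : []

  content {
    ec2_ssh_key               = remote_access.value
    source_security_group_ids = lookup(each.value, "source_security_group_ids", [])
  }
}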

@pierresteiner
Contributor

@pierresteiner the aws-auth configmap isn't for the workers; it's for delegating cluster access to IAM roles. Managed Node Groups don't simplify that.

It is actually both: prior to managed node groups, it was also necessary to grant permissions to the worker role so that workers could join the cluster. I don't know if that is still needed (https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html), though.
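For context, the kind of mapRoles entry that lets a worker role join the cluster looks roughly like this (illustrative only; the module's actual aws-auth handling may differ):

resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # Map the worker node role so kubelets can register with the cluster.
    mapRoles = yamlencode([
      {
        rolearn  = aws_iam_role.managed_workers.arn # assumed role reference
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      },
    ])
  }
}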

@wmorgan6796
Contributor Author

It is actually both: prior to managed node groups, it was also necessary to grant permissions to the worker role so that workers could join the cluster. I don't know if that is still needed (https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html), though.

Hmm, I wasn't aware of that. I'm hesitant to change too many things as I'm not familiar with the true inner workings of some parts of this module.

@wmorgan6796
Contributor Author

I agree this seems like a bug with the provider.

I think one potential workaround for the module is to declare two versions of each worker group (one with the remote access and one that omits it entirely). You may then be able to conditionally choose one over the other (via count trickery) depending on whether a key name is set.

I think I mitigated this issue in my latest commit: if the SSH key isn't set, source_security_group_ids is automatically set to an empty list, regardless of what the user puts there.
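Roughly the shape of that mitigation (a sketch, not the PR's exact expression):

# If no SSH key is set, force source_security_group_ids to an empty list
# so the remote_access block stays stable between plans.
source_security_group_ids = lookup(each.value, "key_name", "") != "" ? lookup(each.value, "source_security_group_ids", []) : []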

@splieth
Contributor

splieth commented Nov 25, 2019

Shouldn't the worker group also have a create_before_destroy = true on it? I guess right now all nodes would be destroyed before new ones would be created.

Edit:
This would also allow rolling updates on worker groups, which would be awesome!
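In other words, something like this on the node group resource (a sketch with placeholder values, not this PR's code):

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "example"
  node_role_arn   = aws_iam_role.managed_workers.arn # assumed role reference
  subnet_ids      = var.subnets

  scaling_config {
    desired_size = 1
    max_size     = 2
    min_size     = 1
  }

  lifecycle {
    # Create the replacement node group before destroying the old one,
    # which also enables rolling replacement of worker groups.
    create_before_destroy = true
  }
}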

}

resource "random_pet" "managed_node_groups" {
  count = local.worker_group_managed_node_group_count
Contributor

@eytanhanig eytanhanig Nov 28, 2019


Every occurrence of count = local.worker_group_managed_node_group_count is a strong candidate for switching to for_each = toset(var.workers_additional_policies).

This will also let you replace lookup(var.worker_group_managed_node_groups[count.index], "name", count.index) with the more elegant each.key["name"].

Contributor Author


Can I replace that lookup though? I would still need to provide a default value.

Contributor


Good point. Given that node_group_name is a required argument for the eks_node_group resource, it seems reasonable to ask that module users specify it along with things like the max size.

Contributor Author

@wmorgan6796 wmorgan6796 Nov 28, 2019


The given "for_each" argument value is unsuitable: "for_each" supports maps
and sets of strings, but you have provided a set containing type object.

Can't use for_each on the random_pets or anything that references that object.

Contributor

@eytanhanig eytanhanig Nov 28, 2019


@wmorgan6796 I forgot that for_each requires that the key it iterates over be a string. Fortunately node groups must have unique names, so we can just create a map that uses that name as the index:

locals {
  node_groups = { for obj in var.worker_group_managed_node_groups : obj.name => obj }
}

Then when using for_each you should be able to do each.value["instance_type"].

Contributor Author


@eytanhanig I can't seem to fix this error when using for_each:

Error: Invalid function argument

  on ../../node_groups.tf line 88, in resource "aws_eks_node_group" "workers":
  88:       [
    |----------------
    | count.index is 0
    | random_pet.node_groups is object with 1 attribute "example"
    | var.node_groups is tuple with 1 element

Invalid value for "list" parameter: element 2: string required.

Can you take a look at it and see what you can find?

Contributor

@eytanhanig eytanhanig Nov 28, 2019


Because you're switching from count to for_each, var.node_groups[count.index] becomes each.value.

Here's an example of how to implement this:

resource "random_pet" "node_groups" {
  for_each = local.node_groups

  separator = "-"
  length    = 2

  keepers = {
    instance_type = lookup(each.value, "instance_type", local.workers_group_defaults["instance_type"])

    ec2_ssh_key = lookup(each.value, "key_name", local.workers_group_defaults["key_name"])

    source_security_group_ids = join("-", compact(
      lookup(each.value, "source_security_group_ids", local.workers_group_defaults["source_security_group_id"])
    ))

    node_group_name = join("-", [var.cluster_name, each.value["name"]])
  }
}

resource "aws_eks_node_group" "workers" {
  for_each = local.node_groups

  node_group_name = join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id])

  cluster_name    = var.cluster_name
  node_role_arn   = lookup(each.value, "iam_role_arn", aws_iam_role.node_groups[0].arn)
  subnet_ids      = lookup(each.value, "subnets", local.workers_group_defaults["subnets"])

  scaling_config {
    desired_size = lookup(each.value, "node_group_desired_capacity", local.workers_group_defaults["asg_desired_capacity"])
    max_size     = lookup(each.value, "node_group_max_capacity", local.workers_group_defaults["asg_max_size"])
    min_size     = lookup(each.value, "node_group_min_capacity", local.workers_group_defaults["asg_min_size"])
  }

  ami_type        = lookup(each.value, "ami_type", null)
  disk_size       = lookup(each.value, "root_volume_size", null)
  instance_types  = [lookup(each.value, "instance_type", null)]
  labels          = lookup(each.value, "node_group_k8s_labels", null)
  release_version = lookup(each.value, "ami_release_version", null)

  # This sometimes breaks idempotency as described in https://github.com/terraform-providers/terraform-provider-aws/issues/11063
  remote_access {
    ec2_ssh_key               = lookup(each.value, "key_name", "") != "" ? each.value["key_name"] : null
    source_security_group_ids = lookup(each.value, "key_name", "") != "" ? lookup(each.value, "source_security_group_ids", []) : null
  }

  version = aws_eks_cluster.this.version

  tags = lookup(each.value, "node_group_additional_tags", null)

  lifecycle {
    create_before_destroy = true
  }
}

@wmorgan6796
Contributor Author

@eytanhanig @max-rocket-internet I updated the minimum versions of Terraform so that we can make use of for_each blocks for resources. But the linting job in the GitHub CI only has Terraform 0.12.2. Should that be updated, or should I revert that change?

@eytanhanig
Contributor

@eytanhanig @max-rocket-internet I updated the minimum versions of Terraform so that we can make use of for_each blocks for resources. But the linting job in the GitHub CI only has Terraform 0.12.2. Should that be updated, or should I revert that change?

It appears that you'll need to update the image used in lint.yml.

@max-rocket-internet
Contributor

I updated the minimum versions of Terraform so that we can make use of for_each blocks for resources. But the linting job in the GitHub CI only has Terraform 0.12.2. Should that be updated, or should I revert that change?

Yes, please also update that: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/.github/workflows/lint.yml#L59

@eytanhanig
Contributor

eytanhanig commented Nov 29, 2019

I believe I've found a bug in the AWS provider resource aws_eks_node_group: idempotency is broken (subsequent applies attempt to create and destroy the node group) when remote_access is used in any of the following ways:

  • remote_access {}
  • remote_access {} with ec2_ssh_key = ""
  • remote_access {} with ec2_ssh_key = null

It does become idempotent when used with a valid SSH key. I've not tested it with permutations of the source_security_group_ids argument.

To account for this, I believe we are limited to the following options:

  • Completely remove the remote_access configuration block.
  • Require the inclusion of a valid SSH key.
  • Wait for a fix to become available before merging this PR.

I've opened a ticket with the AWS provider here: hashicorp/terraform-provider-aws#11063
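For reference, a minimal repro of the non-idempotent case described above (resource names and values are illustrative):

resource "aws_eks_node_group" "repro" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "repro"
  node_role_arn   = aws_iam_role.managed_workers.arn
  subnet_ids      = var.subnets

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  # With no SSH key, the API returns remote_access as null, so every
  # subsequent plan wants to recreate the node group.
  remote_access {}
}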

node_groups.tf Outdated
aws_eks_cluster.this.name,
lookup(var.node_groups[count.index], "name", count.index),
random_pet.node_groups[count.index].id
random_pet.node_groups[lookup(var.node_groups[count.index], "name", count.index)]


Hello, I'm getting this error:

Error: Invalid function argument

  on .terraform/modules/eks/node_groups.tf line 88, in resource "aws_eks_node_group" "workers":
  88:       [
    |----------------
    | aws_eks_cluster.this.name is "toro-dev2"
    | count.index is 0
    | random_pet.node_groups is object with 1 attribute "standard"
    | var.node_groups is tuple with 1 element

Invalid value for "list" parameter: element 2: string required.

Looks like .id is missing here:

random_pet.node_groups[lookup(var.node_groups[count.index], "name", count.index)].id

Thanks!

Contributor

@eytanhanig eytanhanig Nov 29, 2019


Try this:

resource "random_pet" "node_groups" {
  for_each = local.node_groups

  separator = "-"
  length    = 2

  keepers = {
    instance_type = lookup(each.value, "instance_type", local.workers_group_defaults["instance_type"])

    ec2_ssh_key = lookup(each.value, "key_name", local.workers_group_defaults["key_name"])

    source_security_group_ids = join("-", compact(
      lookup(each.value, "source_security_group_ids", local.workers_group_defaults["source_security_group_id"])
    ))

    node_group_name = join("-", [var.cluster_name, each.value["name"]])
  }
}

resource "aws_eks_node_group" "workers" {
  for_each = local.node_groups

  node_group_name = join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id])

  cluster_name    = var.cluster_name
  node_role_arn   = lookup(each.value, "iam_role_arn", aws_iam_role.node_groups[0].arn)
  subnet_ids      = lookup(each.value, "subnets", local.workers_group_defaults["subnets"])

  scaling_config {
    desired_size = lookup(each.value, "node_group_desired_capacity", local.workers_group_defaults["asg_desired_capacity"])
    max_size     = lookup(each.value, "node_group_max_capacity", local.workers_group_defaults["asg_max_size"])
    min_size     = lookup(each.value, "node_group_min_capacity", local.workers_group_defaults["asg_min_size"])
  }

  ami_type        = lookup(each.value, "ami_type", null)
  disk_size       = lookup(each.value, "root_volume_size", null)
  instance_types  = [lookup(each.value, "instance_type", null)]
  labels          = lookup(each.value, "node_group_k8s_labels", null)
  release_version = lookup(each.value, "ami_release_version", null)

  # This sometimes breaks idempotency as described in https://github.com/terraform-providers/terraform-provider-aws/issues/11063
  remote_access {
    ec2_ssh_key               = lookup(each.value, "key_name", "") != "" ? each.value["key_name"] : null
    source_security_group_ids = lookup(each.value, "key_name", "") != "" ? lookup(each.value, "source_security_group_ids", []) : null
  }

  version = aws_eks_cluster.this.version

  tags = lookup(each.value, "node_group_additional_tags", null)

  lifecycle {
    create_before_destroy = true
  }
}

Contributor Author


@eytanhanig I added the changes you suggested and the terraform plan seems to work.

@wmorgan6796
Contributor Author

What's the status of this?

Contributor

@max-rocket-internet max-rocket-internet left a comment


NICE thanks @wmorgan6796!

I fixed a conflict in the changelog and a small fmt issue.

@github-actions

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2022