
Karpenter iam_policy_attachment fails when providing role name #2461

Closed · jukie opened this issue Feb 9, 2023 · 14 comments

Description

When a custom IAM role name is provided for the created role and the name prefix is disabled, the iam_role_policy_attachment resource fails to plan with an "Invalid for_each argument" error.

Versions

  • Module version [Required]:
    v19.7.0 (also present in v18.31.2, where a related issue was closed)
  • Terraform version:
    v1.3.7
  • Provider version(s):
    provider registry.terraform.io/hashicorp/aws v4.53.0

Reproduction Code [Required]

This is CDKTF JSON output that has been trimmed/redacted, but it can still be used directly with the Terraform CLI.

{
  "//": {
    "metadata": {
      "stackName": "example-cluster-eks-jukie",
      "version": "0.15.4"
    },
    "outputs": {
    }
  },
  "module": {
    "eks": {
      "//": {
        "metadata": {
          "path": "example-cluster-eks-jukie/eks",
          "uniqueId": "eks"
        }
      },
      "cluster_version": "1.24",
      "cluster_name": "eks-jukie",
      "vpc_id": "vpc-12345678901234567",
      "subnet_ids": ["subnet-12345678"],
      "manage_aws_auth_configmap": false,
      "create_cluster_security_group": false,
      "create_node_security_group": false,
      "source": "terraform-aws-modules/eks/aws",
      "version": "19.7.0"
    },
    "karpenter": {
      "//": {
        "metadata": {
          "path": "example-cluster-eks-jukie/karpenter",
          "uniqueId": "karpenter"
        }
      },
      "cluster_name": "eks-jukie",
      "create_iam_role": true,
      "create_instance_profile": true,
      "create_irsa": true,
      "depends_on": [
        "module.eks"
      ],
      "iam_role_name": "Karpenter-eks-jukie",
      "iam_role_use_name_prefix": false,
      "irsa_name": "Karpenter-IRSA-eks-jukie",
      "irsa_oidc_provider_arn": "${module.eks.oidc_provider_arn}",
      "irsa_use_name_prefix": false,
      "queue_name": "Karpenter-eks-jukie",
      "source": "terraform-aws-modules/eks/aws//modules/karpenter",
      "version": "19.7.0"
    }
  },
  "provider": {
    "aws": [
      {
        "region": "us-east-1"
      }
    ]
  },
  "terraform": {
    "required_providers": {
      "aws": {
        "source": "hashicorp/aws",
        "version": "4.53.0"
      }
    }
  }
}

Steps to reproduce the behavior:
Attempt a plan on the above block

Expected behavior

Successful plan

Actual behavior

The plan fails with an "Invalid for_each argument" error on the for_each of the aws_iam_role_policy_attachment resource.

Terminal Output Screenshot(s)

╷
│ Error: Invalid for_each argument
│ 
│   on .terraform/modules/karpenter/modules/karpenter/main.tf line 327, in resource "aws_iam_role_policy_attachment" "this":
│  327:   for_each = { for k, v in toset(compact([
│  328:     "${local.iam_role_policy_prefix}/AmazonEKSWorkerNodePolicy",
│  329:     "${local.iam_role_policy_prefix}/AmazonEC2ContainerRegistryReadOnly",
│  330:     var.iam_role_attach_cni_policy ? local.cni_policy : "",
│  331:   ])) : k => v if local.create_iam_role }
│     ├────────────────
│     │ local.cni_policy is a string, known only after apply
│     │ local.create_iam_role is true
│     │ local.iam_role_policy_prefix is a string, known only after apply
│     │ var.iam_role_attach_cni_policy is true
│ 
│ The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of
│ this resource.
│ 
│ When working with unknown values in for_each, it's better to define the map keys statically in your configuration and place apply-time results only in the map values.
│ 
│ Alternatively, you could use the -target planning option to first apply only the resources that the for_each value depends on, and then apply a second time to fully converge.

Additional context

Other resources handle this differently as mentioned in #2306 (comment)

@jukie (Author) commented Feb 9, 2023

Here is a working example to replace the current iam_role_policy_attachment, modeled after the resource in the EKS module:

resource "aws_iam_role_policy_attachment" "this" {
  for_each = { for k, v in {
    AmazonEKSWorkerNodePolicy          = "${local.iam_role_policy_prefix}/AmazonEKSWorkerNodePolicy",
    AmazonEC2ContainerRegistryReadOnly = "${local.iam_role_policy_prefix}/AmazonEC2ContainerRegistryReadOnly",
    AmazonEKS_CNI_Policy               = var.iam_role_attach_cni_policy ? local.cni_policy : "",
  } : k => v if local.create_iam_role }

  policy_arn = each.value
  role       = aws_iam_role.this[0].name
}
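
For context on why this rewrite plans cleanly: the original resource builds its for_each from toset(compact([...])), so the apply-time policy ARNs become the map keys themselves, and Terraform cannot enumerate unknown keys during plan. The version above keeps the ARNs in the values and uses static policy names as keys, which is the pattern the error message recommends. A minimal, standalone illustration of the difference (hypothetical resources, not module code):

# Stand-in for an apply-time value such as local.iam_role_policy_prefix
resource "random_pet" "prefix" {}

# This would fail to plan, because toset() promotes the unknown values to keys:
#   for_each = toset(["${random_pet.prefix.id}/AmazonEKSWorkerNodePolicy"])

resource "null_resource" "attach" {
  # This plans, because the keys are static and only the values are unknown:
  for_each = {
    AmazonEKSWorkerNodePolicy = "${random_pet.prefix.id}/AmazonEKSWorkerNodePolicy"
  }

  triggers = {
    policy_arn = each.value
  }
}

One caveat with the rewrite above: when var.iam_role_attach_cni_policy is false, the AmazonEKS_CNI_Policy key maps to an empty string, so that entry would presumably still need to be filtered out (or handled in a separate resource) before the value is used as a policy_arn.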

@bryantbiggs (Member) commented

We will need a reproduction that we can deploy and that displays the error you are seeing.

See issue template:

and a reproduction of your configuration (see the examples/* directory for references that you can copy+paste and tailor to match your configs if you are unable to copy your exact configuration). The reproduction MUST be executable by running terraform init && terraform apply without any further changes.

@jukie (Author) commented Feb 9, 2023

Updated, @bryantbiggs.
We use CDKTF for rendering, but the JSON above can still be used as normal via terraform init and terraform plan to reproduce the issue.

jukie changed the title from "Karpenter iam_policy_attachment fails" to "Karpenter iam_policy_attachment fails when providing role name" on Feb 9, 2023
@jukie (Author) commented Feb 13, 2023

Is that sufficient?

@bryantbiggs (Member) commented

unfortunately, no - we work off of vanilla Terraform to troubleshoot issues

@jukie (Author) commented Feb 16, 2023

JSON is directly supported by the Terraform CLI and can be used with the regular commands. It is functionally no different from HCL and presents the same issue. Direct HCL can also be found in the closed issue I pointed to.
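
For readers who prefer HCL, the karpenter module block in the JSON above translates by hand to roughly the following (a transliteration for illustration, values copied from the JSON; the eks block converts the same way):

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "19.7.0"

  cluster_name             = "eks-jukie"
  create_iam_role          = true
  create_instance_profile  = true
  create_irsa              = true
  iam_role_name            = "Karpenter-eks-jukie"
  iam_role_use_name_prefix = false
  irsa_name                = "Karpenter-IRSA-eks-jukie"
  irsa_use_name_prefix     = false
  irsa_oidc_provider_arn   = module.eks.oidc_provider_arn
  queue_name               = "Karpenter-eks-jukie"

  depends_on = [module.eks]
}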

@jukie (Author) commented Feb 16, 2023

Could you please try reproducing with the above code block @bryantbiggs via terraform init and terraform plan?

@mballoni commented

I'm having the same issue.
Context: my cluster is created in another Terraform project with a separate state.
Terraform version: 1.3.9

Cluster:

module "eks" {
  source = "github.com/aws-ia/terraform-aws-eks-blueprints?ref=v4.24.0"

  cluster_name              = local.tools_cluster_name
  cluster_version           = "1.25"
  vpc_id                    = var.vpc_id
  private_subnet_ids        = var.subnet_ids
  cluster_service_ipv4_cidr = var.cluster_cidr_block

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  create_cluster_security_group = false
  cluster_security_group_id     = module.eks_security_group.sg_id
  enable_irsa                   = true

  node_security_group_additional_rules = {
    ingress_nodes_karpenter_port = {
      description                   = "Cluster API to Node group for Karpenter webhook"
      protocol                      = "tcp"
      from_port                     = 8443
      to_port                       = 8443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
  node_security_group_tags = {
    "kubernetes.io/cluster/${local.cluster_name}/nodes" = "owned"
  }

  #https://aws-ia.github.io/terraform-aws-eks-blueprints/node-groups/
  managed_node_groups = {
    karpenter_nodes = {
      node_group_name          = "${local.cluster_name}-kpt"
      enable_node_group_prefix = false
      instance_types           = ["t4g.large"]
      ami_type                 = "BOTTLEROCKET_ARM_64"
      launch_template_os       = "bottlerocket"
      create_security_group    = false
      subnet_ids               = var.subnet_ids
      max_size                 = 1
      min_size                 = 1
      desired_size             = 1
      disk_size                = 50
      k8s_taints               = [{ key = "CriticalAddonsOnly", value = "true", effect = "NO_SCHEDULE" }]
      k8s_labels               = {
        "critical-addons/exclude-balancer" = "true"
      }
    }
  }
}

versions.tf

terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.55.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.18.1"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.9.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14"
    }
  }
}

The other Terraform project, with its separate state:

main.tf:

module "karpenter" {
  source = "./modules/karpenter"

  cluster_name = var.cluster_name

  iam_role_arn          = data.aws_eks_node_group.karpenter_node_group.arn
  eks_oidc_provider_arn = data.aws_iam_openid_connect_provider.eks_oidc_provider.arn
  eks_cluster_endpoint  = data.aws_eks_cluster.eks_cluster.endpoint

  depends_on = [module.cluster_addons]
}

data "aws_eks_cluster" "eks_cluster" {
  name = var.cluster_name
}
data "aws_eks_node_group" "karpenter_node_group" {
  cluster_name    = var.cluster_name
  node_group_name = "${var.cluster_name}-kpt"
}
data "aws_iam_openid_connect_provider" eks_oidc_provider {
  url = data.aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
}


modules/karpenter/main.tf

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "19.10.0"

  cluster_name = var.cluster_name

  irsa_oidc_provider_arn          = var.eks_oidc_provider_arn
  irsa_namespace_service_accounts = ["karpenter:karpenter"]

  # Since Karpenter is running on an EKS Managed Node group,
  # we can re-use the role that was created for the node group
  create_iam_role = false
  iam_role_arn    = var.iam_role_arn
}

versions.tf:

terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.55.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.18.1"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.9.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14"
    }
  }
}

Last week I had the same issue for a day, and it disappeared after updating the AWS provider version. I'm not sure that was actually related, since the issue has since returned.

I hope this example helps reproduce the issue.

@mballoni commented

By the way, I've found this closed issue: #2388

When applying those changes locally, everything works perfectly!
Any chance it can be re-opened?

@bryantbiggs (Member) commented

I need something that is going to show there is an issue - so far that has not happened. Here is a stab at it based on what I see above, and it deploys just fine without any issue

provider "aws" {
  region = local.region
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

data "aws_availability_zones" "available" {}
data "aws_caller_identity" "current" {}

locals {
  name   = "karp-ex"
  region = "eu-west-1"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    Example    = local.name
    GithubRepo = "terraform-aws-eks"
    GithubOrg  = "terraform-aws-modules"
  }
}

################################################################################
# EKS Module
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.10"

  cluster_name    = local.name
  cluster_version = "1.24"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {}
  }

  tags = local.tags
}

################################################################################
# Karpenter Module
################################################################################

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 19.10"

  cluster_name = module.eks.cluster_name

  irsa_oidc_provider_arn          = module.eks.oidc_provider_arn
  irsa_namespace_service_accounts = ["karpenter:karpenter"]

  # Since Karpenter is running on an EKS Managed Node group,
  # we can re-use the role that was created for the node group
  create_iam_role = false
  iam_role_arn    = module.eks.eks_managed_node_groups["default"].iam_role_arn
}

################################################################################
# Supporting resources
################################################################################

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = local.name
  cidr = local.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true


  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  tags = local.tags
}

@mballoni commented

I've got your point and I'm working on it.

Would it be possible to give it a try using the blueprints as I shared earlier and using a Terraform data source to look up the managed node group? I suspect there may be something there.

Thank you, I appreciate your help!

@5cat commented Mar 1, 2023

I have faced the same error, and solved it with this #2337 (comment) and hashicorp/terraform#26383 (comment) by removing depends_on from my modules.

But it would be great to have PR #2462 accepted somehow, since I'm sure there will be cases where removing the depends_on between modules is not an option.
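
For anyone landing here, the workaround described above amounts to dropping the module-level depends_on and letting the dependency be implied by the output references instead, e.g. (a sketch based on the reproduction at the top of this issue, not a verified fix):

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "19.7.0"

  cluster_name             = "eks-jukie"
  iam_role_name            = "Karpenter-eks-jukie"
  iam_role_use_name_prefix = false

  # Referencing an output of module.eks already orders the two modules;
  # the explicit depends_on, which can defer all of module.eks's outputs
  # to apply time (see hashicorp/terraform#26383), is removed.
  irsa_oidc_provider_arn = module.eks.oidc_provider_arn

  # depends_on = [module.eks]  # removed per the comments linked above
}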

@bryantbiggs (Member) commented

Closing till a reproduction can be provided - I went out of my way to try to reproduce (see above) and I am unable to do so. Using depends_on across modules is something that Hashi advises against.

@github-actions bot commented Apr 1, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Apr 1, 2023