AWS provider/backend skip_metadata_api_check required to use aws credentials file native role_arn #7

Closed
etherops opened this issue Jun 11, 2019 · 7 comments · Fixed by #20

@etherops

etherops commented Jun 11, 2019

aws-sdk-go v1.15.54 (and thereby Terraform 0.12) now supports the native role_arn format of the AWS credentials file, like so:

$ cat ~/.aws/credentials 
[default]
role_arn = arn:aws:iam::123456789999:role/my-tf-role
credential_source=Ec2InstanceMetadata

(Note: Ec2InstanceMetadata is used as an example, but the bug is not limited to that credential_source.)

However, attempting to use the following provider and S3 backend configuration...

locals {
  service_name = "event-pipeline"
}

provider "aws" {
  version                 = "~> 2.0"
  region                  = "us-east-1"
}

terraform {
  backend "s3" {
    bucket                  = "example-tf-bucket"
    key                     = "foo/terraform.tfstate"
    region                  = "us-east-1"
  }
}

results in the following error...

bash-4.4# /atlantis/bin/terraform0.12.1 init -input=false -no-color -upgrade
Upgrading modules...
[MODULES REDACTED]

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Error: Failed to get existing workspaces: AccessDenied: Access Denied
	status code: 403, request id: 672B283A4654B52C, host id: qgNEkRWQ77ULDpDghfPVB8JCRWfAbnr+Jk6LfhOwaLhyLfCLRW0OQKto07LHv05HCCeIo6Py5pQ=

Error trace:

bash-4.4# TF_LOG=trace /atlantis/bin/terraform0.12.1 init -input=false -no-color -upgrade
2019/06/11 01:42:19 [INFO] Terraform version: 0.12.1  
2019/06/11 01:42:19 [INFO] Go runtime version: go1.12.4
2019/06/11 01:42:19 [INFO] CLI args: []string{"/atlantis/bin/terraform0.12.1", "init", "-input=false", "-no-color", "-upgrade"}
2019/06/11 01:42:19 [DEBUG] Attempting to open CLI config file: /root/.terraformrc
2019/06/11 01:42:19 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2019/06/11 01:42:19 [INFO] CLI command args: []string{"init", "-input=false", "-no-color", "-upgrade"}
Upgrading modules...

[MODULES REDACTED]

Initializing the backend...
2019/06/11 01:42:20 [TRACE] Meta.Backend: built configuration for "s3" backend with hash value 535375642
2019/06/11 01:42:20 [TRACE] Preserving existing state lineage "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX"
2019/06/11 01:42:20 [TRACE] Preserving existing state lineage "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX"
2019/06/11 01:42:20 [TRACE] Meta.Backend: working directory was previously initialized for "s3" backend
2019/06/11 01:42:20 [TRACE] Meta.Backend: using already-initialized, unchanged "s3" backend configuration
2019/06/11 01:42:20 [INFO] Setting AWS metadata API timeout to 100ms
2019/06/11 01:42:20 [INFO] AWS EC2 instance detected via default metadata API endpoint, EC2RoleProvider added to the auth chain
2019/06/11 01:42:20 [INFO] AWS Auth provider used: "EC2RoleProvider"
2019/06/11 01:42:20 [DEBUG] Trying to get account information via sts:GetCallerIdentity
2019/06/11 01:42:20 [DEBUG] [aws-sdk-go] DEBUG: Request sts/GetCallerIdentity Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: sts.amazonaws.com
User-Agent: aws-sdk-go/1.19.18 (go1.12.4; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.1
Content-Length: 43
Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20190611/us-east-1/sts/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date;x-amz-security-token, Signature=[REDACTED]
Content-Type: application/x-www-form-urlencoded; charset=utf-8
X-Amz-Date: 20190611T014220Z
X-Amz-Security-Token: REDACTED=
Accept-Encoding: gzip

Action=GetCallerIdentity&Version=2011-06-15
-----------------------------------------------------
2019/06/11 01:42:20 [DEBUG] [aws-sdk-go] DEBUG: Response sts/GetCallerIdentity Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 468
Content-Type: text/xml
Date: Tue, 11 Jun 2019 01:42:20 GMT
X-Amzn-Requestid: REDACTED


-----------------------------------------------------
2019/06/11 01:42:20 [DEBUG] [aws-sdk-go] <GetCallerIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <GetCallerIdentityResult>
    <Arn>arn:aws:sts::REDACTED-role/[THE_EC2_MACHINE_ROLE]/i-xxxxxxxxxxxxxxxx</Arn>
    <UserId>REDACTED</UserId>
    <Account>REDACTED</Account>
  </GetCallerIdentityResult>
  <ResponseMetadata>
    <RequestId>REDACTED</RequestId>
  </ResponseMetadata>
</GetCallerIdentityResponse>
2019/06/11 01:42:20 [DEBUG] [aws-sdk-go] DEBUG: Request s3/ListObjects Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /?prefix=env%3A%2F HTTP/1.1
Host: REDACTED
User-Agent: aws-sdk-go/1.19.18 (go1.12.4; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.1
Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20190611/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=REDACTED
X-Amz-Content-Sha256: REDACTED
X-Amz-Date: 20190611T014220Z
X-Amz-Security-Token: REDACTED=
Accept-Encoding: gzip


-----------------------------------------------------
2019/06/11 01:42:21 [DEBUG] [aws-sdk-go] DEBUG: Response s3/ListObjects Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 403 Forbidden
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 11 Jun 2019 01:42:20 GMT
Server: AmazonS3
X-Amz-Bucket-Region: us-east-1
X-Amz-Id-2: REDACTED
X-Amz-Request-Id: REDACTED


-----------------------------------------------------
2019/06/11 01:42:21 [DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>REDACTED</RequestId><HostId>REDACTED</HostId></Error>

2019/06/11 01:42:21 [DEBUG] [aws-sdk-go] DEBUG: Validate Response s3/ListObjects failed, not retrying, error AccessDenied: Access Denied
	status code: 403, request id: REDACTED, host id: REDACTED
Error: Failed to get existing workspaces: AccessDenied: Access Denied
	status code: 403, request id: 7613B17D7D93D2C3, host id: REDACTED

I discovered, as did another user here: hashicorp/terraform-provider-aws#5018 (comment), that setting skip_metadata_api_check is a workaround.

I also discovered, essentially, why this is.

awsauth.go
202  	if !c.SkipMetadataApiCheck {
...
214  			providers = append(providers, &ec2rolecreds.EC2RoleProvider{
215  				Client: metadataClient,
216  			})
...

228  	// This is the "normal" flow (i.e. not assuming a role)
229  	if c.AssumeRoleARN == "" {
230  		return awsCredentials.NewChainCredentials(providers), nil
231  	}

After some tracing, I discovered that the reason is the logic above. During the so-called metadata API check, the code not only checks the API but also adds the metadata credentials to the auth provider chain!

Then, on line 229, if you haven't hard-coded the role ARN into the Terraform config, the awsauth.go code returns without adding anything else to the auth chain. As a result, all subsequent AWS calls (for both the S3 backend and the provider) use the EC2 role credentials retrieved from the EC2 instance metadata API, and sadly do not follow any role_arn instructions in the shared credentials file :-(
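
The behaviour described above can be modelled in a few lines of Go. This is only a sketch and not the library's code: the `config` and `source` types, `buildChain`, `resolve`, and the `onEC2` flag are hypothetical names, and the final fallback to the SDK's own session handling is an assumption based on the description in this issue.

```go
package main

import "fmt"

// source is a hypothetical stand-in for one credential provider in the chain.
type source struct {
	name     string
	hasCreds bool
}

// config mirrors only the two settings discussed in this issue.
type config struct {
	SkipMetadataAPICheck bool
	AssumeRoleARN        string
}

// buildChain models how the chain is assembled. onEC2 stands in for the
// result of probing the instance metadata endpoint.
func buildChain(c config, onEC2 bool) []source {
	chain := []source{
		{"static", false},
		{"environment", false},
		{"shared-file-keys", false}, // the profile holds only role_arn, no keys
	}
	// The "check" also registers the instance role as a credential source.
	if !c.SkipMetadataAPICheck && onEC2 {
		chain = append(chain, source{"ec2-instance-role", true})
	}
	return chain
}

// resolve returns a label for the credentials that would end up being used.
func resolve(c config, onEC2 bool) string {
	for _, s := range buildChain(c, onEC2) {
		if !s.hasCreds {
			continue
		}
		if c.AssumeRoleARN == "" {
			// Early return: role_arn in the shared file is never consulted.
			return s.name
		}
		return "assume-role(" + c.AssumeRoleARN + ") via " + s.name
	}
	// Assumption: with an empty chain, the SDK's own session logic takes over
	// and honours role_arn from the shared credentials file.
	return "sdk-session(shared-file role_arn)"
}

func main() {
	fmt.Println(resolve(config{}, true))
	fmt.Println(resolve(config{SkipMetadataAPICheck: true}, true))
}
```

In this model, the instance role wins whenever the metadata check runs and no role ARN is set in the Terraform config, which matches the symptom in the trace above.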

Why the workaround works

So by setting skip_metadata_api_check = true in the Terraform config, we can basically trick the SDK into doing what it does best, instead of using the EC2 instance role credentials that were added during the "check" on line 214.

Expected Behavior

The expected behavior is that if you specify role_arn in your AWS credentials file, Terraform uses it by default instead of the EC2 machine role that it gets from the EC2 instance metadata API.

@bdwyertech

bdwyertech commented Aug 20, 2019

I wonder if these are related? A lot of the logic in this module looks similar to that in the session module of aws-sdk-go.

aws/aws-sdk-go#2528
aws/aws-sdk-go@8be2a09

We're running on ECS and can't assume a role via the config file; we have been chasing this down all over the place.
hashicorp/packer#7967
kubernetes-sigs/aws-iam-authenticator#257

@dee-kryvenko

dee-kryvenko commented Nov 5, 2019

I have a slightly different use case: running TF in an EKS pod that uses an IAM role attached to a Service Account, as described here: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

Long story short, I have a ~/.aws/config file like this:

[profile profile1]
role_arn = arn:aws:iam::xxx:role/pod
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
[profile profile2]
source_profile = profile1
role_arn = arn:aws:iam::xxx:role/some-other-role-allowed-to-be-assumed-from-profile1
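
To make the intended chaining explicit, here is a small Go sketch that walks `source_profile` links and lists the roles in the order they would need to be assumed. It is an illustration only, not how the AWS SDK parses or resolves profiles: `parseProfiles`, `roleChain`, and `sampleConfig` are hypothetical names, and the parser handles just the subset of the file format shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleConfig reproduces the ~/.aws/config contents from the comment above.
const sampleConfig = `
[profile profile1]
role_arn = arn:aws:iam::xxx:role/pod
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
[profile profile2]
source_profile = profile1
role_arn = arn:aws:iam::xxx:role/some-other-role-allowed-to-be-assumed-from-profile1
`

// parseProfiles reads "[profile name]" sections and "key = value" pairs.
func parseProfiles(text string) map[string]map[string]string {
	profiles := map[string]map[string]string{}
	current := ""
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			current = strings.TrimPrefix(strings.Trim(line, "[]"), "profile ")
			profiles[current] = map[string]string{}
		case current != "":
			if k, v, ok := strings.Cut(line, "="); ok {
				profiles[current][strings.TrimSpace(k)] = strings.TrimSpace(v)
			}
		}
	}
	return profiles
}

// roleChain follows source_profile links from the named profile and returns
// the role ARNs innermost first, i.e. in the order they would be assumed.
func roleChain(profiles map[string]map[string]string, name string) []string {
	var chain []string
	seen := map[string]bool{} // guards against source_profile cycles
	for name != "" && !seen[name] {
		seen[name] = true
		p, ok := profiles[name]
		if !ok {
			break
		}
		if arn := p["role_arn"]; arn != "" {
			chain = append([]string{arn}, chain...)
		}
		name = p["source_profile"]
	}
	return chain
}

func main() {
	for i, arn := range roleChain(parseProfiles(sampleConfig), "profile2") {
		fmt.Printf("%d. %s\n", i+1, arn)
	}
}
```

With `AWS_PROFILE=profile2`, the sketch lists the pod role first and the second role after it, which is the order the configuration asks for.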

Then I run export AWS_PROFILE=profile2 just before calling terraform. I have some simple TF code to test this:

provider "aws" {
  version = "2.34.0"
  region  = "us-west-2"
}

data "aws_caller_identity" "current" {}

output "aws_caller_identity" {
  value = data.aws_caller_identity.current
}

Terraform picks up the EKS node instance profile instead of anything defined in ~/.aws/config. I think I have a slightly better workaround than skip_metadata_api_check: trick the AWS SDK into thinking it's not running in AWS by setting the AWS_METADATA_URL environment variable to some absurd endpoint:

export AWS_METADATA_URL="http://localhost/not/existent/url"

For my particular use case, the AWS metadata IP should be blocked with iptables anyway so that it is not accessible from EKS pods; I just haven't gotten there yet. Still, this is a bug worth fixing, and I can imagine use cases where these workarounds do not apply, such as using role_arn and the instance profile for different provider instances, or cases where the EKS node instance profile is not meant to be hidden from its pods. In any case, diverging from the official AWS SDK credentials chain logic (or from the official AWS SDK in general: I'm not sure what the reasoning was for having this separate base SDK, and I can't think of any reason worth the trouble that comes with it) sounds like bad practice to me. It can lead to all sorts of unintended behaviour and bugs, because AWS systems are pretty complex and rely heavily on conventions and standards like this credentials chain order.
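
The AWS_METADATA_URL workaround can be pictured as follows. This Go sketch assumes the check reads the endpoint from the environment and probes it with a short timeout (the trace earlier in this issue mentions 100ms). The names `resolveMetadataURL`, `metadataReachable`, `probeTimeout`, and the default URL are illustrative assumptions, not the library's actual identifiers.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// defaultMetadataURL is the well-known EC2 link-local address; the exact
// default used by the library is an assumption here.
const defaultMetadataURL = "http://169.254.169.254/latest"

// probeTimeout matches the 100ms timeout reported in the trace log.
const probeTimeout = 100 * time.Millisecond

// resolveMetadataURL returns the override from AWS_METADATA_URL if set,
// otherwise the default. getenv is injected so the lookup is easy to test.
func resolveMetadataURL(getenv func(string) string) string {
	if u := getenv("AWS_METADATA_URL"); u != "" {
		return u
	}
	return defaultMetadataURL
}

// metadataReachable reports whether any HTTP response arrives in time.
func metadataReachable(url string, timeout time.Duration) bool {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return true
}

func main() {
	u := resolveMetadataURL(os.Getenv)
	fmt.Printf("probing %s: reachable=%v\n", u, metadataReachable(u, probeTimeout))
}
```

If the probe fails, the instance role is never added to the chain, which has the same effect as skip_metadata_api_check = true without touching the Terraform config.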

@lifeofguenter

Thanks @etherops this was exactly the issue we were having (#17 (comment)) and we were successfully able to fix it with your "workaround".

Not sure what that functionality is supposed to do or why it's even required, as you can handle any case directly via the profile configuration.

Ideally we keep "profile lookup functionality" the same as the SDK so there are no caveats.

@jonjohnston

@llibicpep your workaround is exactly what I needed, as it works on older versions of Terraform. I'm using Terraform 0.11.11 and have an instance profile tied to an EC2 instance. However, I have ~/.aws/credentials with the following:

[default]
aws_access_key_id = DATA
aws_secret_access_key = DATA
[profile1]
role_arn = DATA
source_profile = default

but Terraform ALWAYS loads the chain with the instance metadata, even with skip_metadata_api_check set to true.

@bflad bflad linked a pull request May 28, 2020 that will close this issue
@bflad bflad added the bug Something isn't working label May 28, 2020
@bflad bflad added this to the v0.6.0 milestone May 28, 2020
@bflad bflad self-assigned this May 28, 2020
@katharosada

I'm attempting to run Atlantis (or terraform plan at all) on ECS and this has completely stopped us from being able to. Due to the use cases of our other automation and our developer access roles etc. we can't hardcode the role arn directly in the provider, so we're relying on it picking up that information from the standard AWS environment variables and config etc.

I have this in the aws config

[profile myrole]
role_arn = <role arn>
credential_source  = EcsContainer

Provider looks like this

provider "aws" {
  region  = var.aws_region
  allowed_account_ids = ["<id here>"]
  version = "~> 2.48"
  skip_metadata_api_check = true
}

I've got AWS_PROFILE=myrole set and I have verified that the ECS task role has permissions to assume the role and I have set the environment variable AWS_SDK_LOAD_CONFIG=1 as per the docs.

The permission errors I get from terraform plan show that it's attempting to use the task execution role and not the role that it should be assuming.

These workarounds did not work for ECS tasks:

  • Adding skip_metadata_api_check = true to the provider
  • Setting AWS_METADATA_URL to a nonexistent URI
  • Also overriding the various environment variables for ECS: ECS_CONTAINER_METADATA_URI and ECS_CONTAINER_METADATA_URI_V4 (depending on the ECS agent version).

Looking for more ideas for workarounds, or maybe getting that pull request in with the fix please! 🙏

@bflad
Contributor

bflad commented Jun 2, 2020

The fix for this has been merged and will release with v0.5.0 of this library, for inclusion with Terraform CLI (S3 Backend) v0.13.0-rc1 and Terraform AWS Provider v3.0.0. Thanks to @nkupton for the implementation and patience. 👍

@smirgel

smirgel commented Jun 10, 2020

@katharosada we had to set "skip_metadata_api_check = true" on the backend config as well so that the correct profile was used when accessing the state bucket.
We don't use ECS containers; we use an EC2 instance profile, but we managed to get it to work with:

  • "skip_metadata_api_check = true" on provider and backend config
  • AWS_SDK_LOAD_CONFIG=1
  • aws profile with "credential_source = Ec2InstanceMetadata"
