Intermittent remote S3 state failure #10779

Closed
jamtur01 opened this issue Dec 16, 2016 · 38 comments
Labels
bug core waiting-response An issue/pull request is waiting for a response from the community

Comments

@jamtur01
Contributor

Terraform Version

0.8

Affected Resource(s)

remote state on s3

Debug Output

When running terraform plan/apply or destroy.

Error reloading remote state: RequestError: send request failed
caused by: Get https://exxxxxxx.s3.amazonaws.com/development/consul/terraform.tfstate: x509: certificate signed by unknown authority

Expected Behavior

Should get remote state.

@blaltarriba

blaltarriba commented Dec 16, 2016

I'm using 0.8.1 and have the same problem without using an S3 remote state file. I get this error when running get, plan, apply, and destroy, but only intermittently. Some examples:

$ terraform get
Get: s3::https://s3.amazonaws.com/mybucket/my-custom-module.zip (update)
Error loading Terraform: Error downloading modules: RequestError: send request failed
caused by: Get https://mybucket.s3.amazonaws.com/my-custom-module.zip: x509: certificate signed by unknown authority

and

$ terraform destroy
Do you really want to destroy?
  Terraform will delete all your managed infrastructure.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

@morissm

morissm commented Dec 16, 2016

Same here.

It only happens in the ca-central-1 region for us but, to be fair, it's the only region we've been working on in the last few days so it may just be happenstance.

@jamtur01
Contributor Author

All regions for me: primarily us-east-1, but I'm also seeing it in us-west-*.

@mitchellh
Contributor

Hey James, so I ran this in a loop for the past ~60 minutes (of configure, reset state, configure) on Mac and Linux and I was never able to see an issue. It has probably configured and synced remote state about 300 times during that time (sleep 12 seconds, 5 times per minute).

I've also heard of other people getting issues recently, though, so I'm not discounting your claim. I just don't know what causes it. I still doubt it's any change we made, since we haven't touched any of the remote state code or the HTTP client initialization code.

Any ideas?

@mitchellh mitchellh added bug core waiting-response An issue/pull request is waiting for a response from the community labels Dec 16, 2016
@jamtur01
Contributor Author

jamtur01 commented Dec 16, 2016

I think it is new since 0.8. I've never seen it with 0.7.13. If it was just me I'd put it down to AWS bucket weirdness but the fact that a few people see it too makes me suspect there's a wider issue, again perhaps not TF but still an issue, here.

@mitchellh
Contributor

We changed to Go 1.7.4 which had very few changes, the only one of which I can imagine affecting this being: golang/go#18141

I'm not saying that's the issue at fault, but that's the only change between 0.7.13 and current that has anything to do with TLS in our code. We probably did update the AWS SDK during that time too, so it's possible the issue is in the AWS SDK.

At any rate, we're not doing any special TLS configuration for the AWS SDK or Go directly so the issue is likely in one of those two. I'd lean towards the former just because I find it unlikely that something like this is broken in Go itself.

@jamtur01
Contributor Author

I just ran ten minutes of terraform plan in a loop and saw it about 10% of the time. Here's a snippet of debug:

2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.plan-destroy"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.output.asg_name"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.aws_elb.web"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.aws_route53_record.web"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "var.instance_type"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.aws_autoscaling_group.web"
2016/12/16 13:09:59 [ERROR] Shadow graph error: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority
2016/12/16 13:09:59 [DEBUG] plugin: waiting for all plugin processes to complete...
Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority
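
A loop like the one described above can be sketched in a few lines of shell. This is only an illustration, not the script anyone in this thread actually ran: the function name `count_x509_failures` is made up here, and it assumes the error text appears in the command's combined output exactly as in the snippet above.

```shell
# Run a command N times and count the runs whose output contains the x509 error.
# Usage: count_x509_failures ATTEMPTS COMMAND [ARGS...]
# (count_x509_failures is a hypothetical helper name, not part of Terraform.)
count_x509_failures() {
    attempts=$1
    shift
    hits=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Merge stderr into stdout so the error is visible to grep.
        if "$@" 2>&1 | grep -q 'x509: certificate signed by unknown authority'; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits/$attempts runs hit the x509 error"
}

# Example (talks to real infrastructure, so it is left commented out):
# count_x509_failures 30 terraform plan
```

Counting only runs that print the specific x509 message keeps unrelated failures, such as a syntax error in the configuration, out of the tally.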

@mitchellh
Contributor

@jamtur01 I just compiled TF 0.8.1 with Go 1.7.3. Do you mind giving this a shot?

Since you have a reliable repro I just want to eliminate the "wtf" that Go might be causing this.

https://dl.dropboxusercontent.com/u/46819/terraform_081_go173.zip
SHA256 is 4f3a039d4ffae4a3bdc0390c14f258e44a22bc32a77894c74bb120b3f285293e

(Note for the future: I probably deleted the file since it was just in my dropbox)
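
When a test build is shared with a SHA256 like this, it is worth checking the digest before running the binary. A minimal sketch follows, assuming either `sha256sum` or `shasum` is installed; `verify_sha256` is a hypothetical helper name, and the archive is assumed to have been downloaded to the current directory already.

```shell
# Compare a file's SHA-256 digest against an expected hex string.
# Uses sha256sum where available, otherwise shasum -a 256 (e.g. on macOS).
# (verify_sha256 is a hypothetical helper name.)
verify_sha256() {
    expected=$1
    file=$2
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file"
        return 1
    fi
}

# Example with the digest quoted above (assumes the zip is in the current directory):
# verify_sha256 4f3a039d4ffae4a3bdc0390c14f258e44a22bc32a77894c74bb120b3f285293e terraform_081_go173.zip
```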

@gkhasel1

Was having the same issue with certificates consistently on 0.8.1. Tried the build with go 1.7.3 linked above and was able to successfully work with remote state again.

@morissm

morissm commented Dec 16, 2016

I can confirm that the issue manifests itself in a custom compiled version of Terraform 0.7.13 compiled with go 1.7.4.

@jamtur01
Contributor Author

@mitchellh Tried that build with the ten minute test. No errors!

@mitchellh
Contributor

@jamtur01 Yep, okay, so it is Go 1.7.4 causing this. Bradfitz also offered up a solution that is already a CL for Go (not merged yet though). Ouch! We'll try to resolve this one way or another for 0.8.2, either dropping back to Go 1.7.3 or finding a way to have cgo-enabled builds for Darwin.

@jen20
Contributor

jen20 commented Dec 17, 2016

The same thing applies to Illumos builds of Terraform by the look of it - both 0.8 and 0.8.1 exhibit the issue running on SmartOS.

@myoung34
Contributor

@mitchellh I can add a bit more confirmation.

Installed Terraform 0.8.1 via brew and got the x509 issue on STS and S3. That build is compiled with Go 1.7.4.
Installed the Terraform binary from the official source and it worked fine.

@mitchellh
Contributor

0.8.2 will be released today built with Go 1.7.3. That reverts the "security fixes" made in Go 1.7.4 unfortunately but hopefully 0.8.3 will be built with Go 1.8 which will bring all this back with a longer term fix from the Go team.

@jwadolowski

@mitchellh unfortunately it still happens on 0.8.2. I just executed terraform plan in a loop and 5 out of 30 attempts ended with either

Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

or

Error reloading remote state: RequestError: send request failed
caused by: Get https://xxxxx-yyyyyy.s3-eu-west-1.amazonaws.com/commons/terraform.tfstate: x509: certificate signed by unknown authority

I had exactly the same issues on 0.8.1, but have never seen them on 0.7.x.

$ terraform -v
Terraform v0.8.2

@myoung34
Contributor

@mitchellh I can confirm that it's still a problem with 0.8.2.
Also, 0.8.2 was released/tagged on the GitHub releases page but isn't available as a binary on the downloads page of terraform.io. Does that confirm my suspicion that it's still a known problem?

@jen20
Contributor

jen20 commented Dec 29, 2016

Hi @myoung34! Could you try force refreshing the download page? I see the download for 0.8.2 there.

@myoung34
Contributor

Weird. It's there; I never thought I'd fall victim to the cache.

Compiled master Terraform v0.8.3-dev (e2f2f9c78e9784eb125beb64c1fe938f9d14183c) against Go 1.7.3 manually and all is good for now.

@sundeer

sundeer commented Jan 6, 2017

Same here:

Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority
[terragrunt] 2017/01/06 14:08:38 Attempting to release lock for state file dev-rancher-server-db in DynamoDB
[terragrunt] 2017/01/06 14:08:39 Lock released!
[terragrunt] 2017/01/06 14:08:39 exit status 1
exit status 1
rancher-server-db ❯ terraform -v
Terraform v0.8.2
rancher-server-db ❯ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.12.2
BuildVersion:	16C67

@troy-mac

troy-mac commented Jan 19, 2017

I have been seeing this in 0.8.1. I have not seen it in 0.7.11, and I run both versions for different environments. Just saying, it seems to be an issue with Terraform's latest releases.

@prees1

prees1 commented May 29, 2017

Same issue with Terraform 0.9.5 and Go 1.8. Has anyone found a reproducible solution?

➜ terraform --version
Terraform v0.9.5
➜ go version
go version go1.8 darwin/amd64
➜ terraform plan
Failed to load backend:
Error configuring the backend "s3": RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

Please update the configuration in your Terraform files to fix this error.
If you'd like to update the configuration interactively without storing
the values in your configuration, run "terraform init".

@prees1

prees1 commented May 29, 2017

In my case it was an issue with my SSL certs that curl was using. I fixed it by setting CURL_CA_BUNDLE to a copy of this file, locally.

@brikis98
Contributor

brikis98 commented Dec 5, 2017

I'm seeing the same issue with Terraform 0.10.8. Every now and then (1 out of 20 or 30 times, perhaps?) I get a net/http: TLS handshake timeout error when talking to S3. Given that other people are seeing it, this bug should probably be re-opened, as I doubt S3 is that flaky :)

@momirza

momirza commented Dec 12, 2017

Also intermittently experiencing this issue using Terraform 0.10.4.

@eoliphan

eoliphan commented Dec 12, 2017

I'm seeing it intermittently on 0.11.1/OSX

@Viman-Sharma

Same issue with 0.11.1

@hgallo0

hgallo0 commented Jan 15, 2018

Same issue with go version go1.8.3 darwin/amd64 and Terraform v0.11.2.

@denniswebb
Contributor

As a temporary bandaid you can add skip_credentials_validation = true to your backend configuration block.

@hgallo0 Can you elaborate on your AWS credentials setup? Are you using access/secret keys, using a profile, assumed role, STS with MFA?
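
For reference, the workaround mentioned above might look like the following backend block, written out here from the shell. This is a sketch under assumptions: the bucket, key, and region are placeholders rather than values from this thread, and `write_backend_config` is a hypothetical helper name. Only `skip_credentials_validation = true` comes from the suggestion itself.

```shell
# Write an S3 backend block with credential validation skipped.
# (write_backend_config is a hypothetical helper; bucket, key, and region
# are placeholders, not values taken from this thread.)
write_backend_config() {
    dir=$1
    cat > "$dir/backend.tf" <<'EOF'
terraform {
  backend "s3" {
    bucket = "example-state-bucket"      # placeholder
    key    = "example/terraform.tfstate" # placeholder
    region = "us-east-1"                 # placeholder

    # Skip validating credentials via the STS API during backend setup.
    skip_credentials_validation = true
  }
}
EOF
}

# Example (left commented out because it writes into the current directory):
# write_backend_config .
# terraform init
```

Skipping validation avoids the call to sts.amazonaws.com where the error shows up in the logs above, but it does not address why that call fails, so it is a stopgap rather than a fix.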

@hgallo0

hgallo0 commented Jan 15, 2018

Hi @denniswebb, thanks for your quick reply. I am using an access/secret key pair currently stored in my ~/.aws/credentials. No STS or MFA.

@brycehemme

I'm seeing this exact issue as well. I'm setting profile in my backend config. The profile is in my ~/.aws/credentials file and the credentials work. Setting skip_credentials_validation = true fixed the issue. My version info is below.

❯ terraform --version
Terraform v0.11.3

@mchodson

mchodson commented Mar 14, 2018

just started happening for me too.

osx 10.12.6.
s3 us-west-2 backend
terraform v0.11.3 (installed via brew)

The only recent local update I can think of was installing a specific version of Go to use some new Kubernetes incubator packages (external-dns). The Terraform issue is intermittent and I can't figure out why. I did notice that switching networks seems to clear it up, if only temporarily: getting on a VPN and trying from there, or hopping back off the VPN and trying again. No idea whether that's a coincidence. It may have something to do with Go and stale DNS or a cache, but I'm grasping for answers.

@darvein

darvein commented May 30, 2018

Same here

$ terraform init .
Initializing modules...
Initializing the backend...

Error configuring the backend "s3": RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: dial tcp: i/o timeout

Please update the configuration in your Terraform files to fix this error
then run this command again.

Terraform v0.11.7
OS X.

@ghost

ghost commented Jun 28, 2018

Has there been a fix for this? I am also seeing it with 0.11.7.

@nicolasbarbe

Same issue with Terraform v0.11.7 on Alpine. I fixed it installing the following package:
apk --update add ca-certificates
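
A quick way to tell whether a minimal image is missing its CA bundle is to look for a non-empty bundle file. The sketch below is an assumption-laden illustration: `first_ca_bundle` is a hypothetical helper, and the candidate paths are common locations on Linux distributions, not a list taken from this thread, so adjust them for your system.

```shell
# Print the first candidate path that exists and is non-empty.
# (first_ca_bundle is a hypothetical helper name.)
first_ca_bundle() {
    for path in "$@"; do
        if [ -s "$path" ]; then
            echo "$path"
            return 0
        fi
    done
    return 1
}

# Common bundle locations (an assumption; adjust for your distribution).
first_ca_bundle \
    /etc/ssl/certs/ca-certificates.crt \
    /etc/ssl/cert.pem \
    /etc/pki/tls/certs/ca-bundle.crt \
    || echo "no CA bundle found; on Alpine try: apk --update add ca-certificates"
```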

@brikis98
Contributor

I'm seeing these issues quite often with 0.11.7 on OS X too. Should this issue be reopened?

@johnnyplaydrums

@brikis98 I think this is the same issue being discussed here: hashicorp/terraform-provider-aws#4709. If so, add your comment / upvote to that issue since it's still open. I believe this needs to be solved in the provider, not in terraform core.

@ghost

ghost commented Apr 2, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 2, 2020