Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Since provider version 1.34.0 terraform plan is really slow #5822

Closed
rmldsky opened this issue Sep 10, 2018 · 28 comments
Closed

Since provider version 1.34.0 terraform plan is really slow #5822

rmldsky opened this issue Sep 10, 2018 · 28 comments
Assignees
Labels
provider Pertains to the provider itself, rather than any interaction with AWS. upstream Addresses functionality related to the cloud provider.
Milestone

Comments

@rmldsky
Copy link

rmldsky commented Sep 10, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.11.8

Problem

Not sure if it's me or something under the hood. I was running 0.11.8/1.27.0 and upgraded provider to 1.35.0. Immediate after upgrade I ran terraform plan and it took very long time to finish. I blamed network at first, but after some time I ran it again with same result. I've downgraded plugin back to 1.27.0 and it is fast again ... Not sure if it is isolated issue or something was introduced in one of the versions following 1.27.0 :/

I run it on the same subset of resources and state plan (remote S3). Also not sure if it's network-slow or locally-slow

  • 1.27.0 time terraform plan
real	0m43.978s
user	0m6.544s
sys	0m2.043s
  • 1.35.0 time terraform plan
real	14m30.323s
user	0m8.626s
sys	0m3.522s

I'll try to upgrade version-by-version and see when it starts happening for me.

@rmldsky rmldsky changed the title terraform plan is really slow after upgrading provider Since provider version 1.34.0 terraform plan is really slow Sep 11, 2018
@rmldsky
Copy link
Author

rmldsky commented Sep 11, 2018

I have upgraded terraform to 0.11.8 and did provider upgrade version-by-version starting from 1.27.0. And the "slowness" problem start occurring with 1.34.0. Which according to CHANGELOG.md was build agains new/recent golang release.

@rmldsky
Copy link
Author

rmldsky commented Sep 11, 2018

I've checked on a different machine, with fresh install of both terraform (0.11.8) and provider (1.35.0, I did test also 1.27.0 and 1.34.0) and to my surprise it works as expected. Only difference is an OS: macOS El Capitan (slow one) vs macOS High Sierra (where it works as expected).

I'll try to check if there is anything I can do on the El Capitan machine to fix it. Any clues why it might be an issue on that platform?

@bflad
Copy link
Contributor

bflad commented Sep 11, 2018

Have you tried running Terraform with debugging enabled? Maybe the debug log will offer some clues. Please feel free to Gist it and post them here. If you are worried about any sensitive data, it can be encrypted with the HashiCorp GPG Key or redacted as necessary.

@bflad bflad added waiting-response Maintainers are waiting on response from community or contributor. provider Pertains to the provider itself, rather than any interaction with AWS. labels Sep 11, 2018
@supasteev0
Copy link

FYI, I am exactly in the same situation. Terraform 0.11.8 and provider.aws 1.35 is really slow. Any plan or apply action takes 10 times longer than before.
Just downgraded to 1.33 as suggested by @rmldsky and execution time is now as fast as before.

@rmldsky
Copy link
Author

rmldsky commented Sep 11, 2018

@bflad ah, did not though about this (duh!), will do tomorrow.

@supasteev0 are you also on macOS El Capitan?

@supasteev0
Copy link

@rmldsky No, I'm on Linux Fedora 25.

@rmldsky
Copy link
Author

rmldsky commented Sep 12, 2018

@bflad I did run terraform with DEBUG with both 1.33.0 and 1.34.0. I still try to figure out if I could post it (redacted or encrypted). On the first glance on both files it seems there are a lot more waiting fors when using 1.34.0:

$ cat debug-1.34.0.log | grep waiting | grep -ve ^2 | wc -l
      19
[...]
$ cat debug.log | grep waiting | grep -ve ^2 | wc -l
       2

Sample occurrence looks something like below and usually lasts 20sec:

2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: POST / HTTP/1.1
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Host: iam.amazonaws.com
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: User-Agent: aws-sdk-go/1.15.21 (go1.11; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.8
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Length: 66
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20180912/us-east-1/iam/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date, Signature=56f0063b9785f3b282e07dc
f6cc8c2a54213f2629f386d70fef4ccee05b524a2
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Type: application/x-www-form-urlencoded; charset=utf-8
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: X-Amz-Date: 20180912T065335Z
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Accept-Encoding: gzip
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4:
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Action=GetUser&UserName=REDACTED&Version=2010-05-08
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: -----------------------------------------------------
default_80"
", waiting for: "aws_sns_topic.REDACTED"
alarms.aws_cloudwatch_metric_alarm.high_memory_usage_alarm", waiting for: "module.REDACTED.var.instance_id"
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: 2018/09/12 08:53:55 [DEBUG] [aws-sdk-go] DEBUG: Response dynamodb/DescribeTable Details:
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: ---[ RESPONSE ]--------------------------------------
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: HTTP/1.1 200 OK
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Connection: close
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Length: 490
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Type: application/x-amz-json-1.0
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Date: Wed, 12 Sep 2018 06:53:52 GMT
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Server: Server
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: X-Amz-Crc32: 1392297593
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: X-Amzn-Requestid: GOPTVLS03C4KAS05BVB2QP58JNVV4KQNSO5AEMVJF66Q9ASUAAJG
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4:
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4:
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: -----------------------------------------------------

What is strange(?) is that those lines with waiting look like they appear in randomly, a bit like out of context, injected there between those "timestamped" lines, but maybe it's OK.

@bflad
Copy link
Contributor

bflad commented Sep 13, 2018

Admittedly, its a little hard to discern what is occurring with only the provided snippet of logs, but hopefully we can narrow this down! 😅

What is strange(?) is that those lines with waiting look like they appear in randomly, a bit like out of context, injected there between those "timestamped" lines, but maybe it's OK.

The interleaved output is occurring since by default Terraform runs with a parallelism of 10. Maybe tuning the -parallelism flag during plan might help narrow down the log output to what is happening?

I would also check in the debug logs to see if the AWS Go SDK is returning [aws-sdk-go] DEBUG: Retrying Request, which would indicate its retrying API requests. We do tune the max retries very high by default, which can be lowered via the max_retries provider configuration and could return error conditions faster -- although I'm a little doubtful that this is occurring in this scenario unless the cause is some sort of throttling response from an API.

@rmldsky
Copy link
Author

rmldsky commented Sep 17, 2018

@bflad just FYI not to leave you hanging, I'll get back on this issue, this week.

@bflad
Copy link
Contributor

bflad commented Sep 25, 2018

@rmldsky any update?

@rmldsky
Copy link
Author

rmldsky commented Sep 26, 2018

@bflad sorry, took longer then expected for me to get back on it. Please see this encrypted (keybase/hashicorp) log output:
https://gist.githubusercontent.com/rmldsky/6398a435bcb4722569751d3b4eccc16f/raw/6d5ab449698965caad6805b8c262c78624fecf6f/encrypted-debug-1.37.0.log

Let me know if you can access this file and decrypt it.

Also I did it with 1.37.0 and it's still the same case, even though I used different subset of resources than before:

1.33.0

real	0m25.638s
user	0m1.586s
sys	0m0.624s

vs
1.37.0

real	8m12.766s
user	0m2.187s
sys	0m1.195s

If you'd like me to run it with some other flags please let me know.

@bflad
Copy link
Contributor

bflad commented Sep 26, 2018

@rmldsky hmm something isn't quite right with that file or maybe I'm doing something wrong? GPG is only decrypting the first 7077 lines and reporting some errors:

$ wget https://gist.githubusercontent.com/rmldsky/6398a435bcb4722569751d3b4eccc16f/raw/6d5ab449698965caad6805b8c262c78624fecf6f/encrypted-debug-1.37.0.log
--2018-09-26 08:18:08--  https://gist.githubusercontent.com/rmldsky/6398a435bcb4722569751d3b4eccc16f/raw/6d5ab449698965caad6805b8c262c78624fecf6f/encrypted-debug-1.37.0.log
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.32.133
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.32.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 921600 (900K) [text/plain]
Saving to: ‘encrypted-debug-1.37.0.log’

encrypted-debug-1.37.0.log           100%[======================================================================>] 900.00K  3.76MB/s    in 0.2s

2018-09-26 08:18:08 (3.76 MB/s) - ‘encrypted-debug-1.37.0.log’ saved [921600/921600]

$ gpg -d -o decrypted-debug-1.37.0.log encrypted-debug-1.37.0.log
gpg: encrypted with 2048-bit RSA key, ID D5F43B6432EA77F3, created 2014-02-26
      "HashiCorp Security <[email protected]>"
gpg: no valid OpenPGP data found.
gpg: block_filter 0x00007fac2f500b90: read error (size=8384,a->size=8454)
gpg: block_filter 0x00007fac302002c0: read error (size=16286,a->size=24484)
gpg: WARNING: encrypted message has been manipulated!
gpg: block_filter: pending bytes!
gpg: block_filter: pending bytes!
$ wc -l decrypted-debug-1.37.0.log
    7076 decrypted-debug-1.37.0.log

@bflad
Copy link
Contributor

bflad commented Sep 26, 2018

Got it! So its interesting that there seems to be 20 seconds of delay between every request and the response, e.g.

2018-09-26T11:34:39.358+0200 [DEBUG] plugin.terraform-provider-aws_v1.37.0_x4: 2018/09/26 11:34:39 [DEBUG] [aws-sdk-go] DEBUG: Request ec2/DescribeInstanceAttribute Details:
...
2018-09-26T11:35:00.537+0200 [DEBUG] plugin.terraform-provider-aws_v1.37.0_x4: 2018/09/26 11:35:00 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/DescribeInstanceAttribute Details:

Version 1.34.0 of the AWS provider was the first release that started using Go 1.11 (instead of Go 1.10) for builds, so maybe something changed in connection handling? Is there anything potentially strange about your networking environment (dual IPv4+IPv6 enabled, proxies, etc)?

@rmldsky
Copy link
Author

rmldsky commented Sep 26, 2018

@bflad no, nothing special. I also tried this on the same machine on different network and it was same story. Also not really sure what to look for in the network setup. Same network and machine on 1.33.0 and it is all fine.

I'll try to find someone in the office to test it out on different OS.

@bflad
Copy link
Contributor

bflad commented Sep 26, 2018

These upstream Go bug reports seem interesting:

Can you verify if your local DNS configuration has multiple name servers or if IPv6 is enabled locally?

@rmldsky
Copy link
Author

rmldsky commented Sep 26, 2018

Great catch. I had few DNS entries and left only 8.8.8.8:
1.37.0

real	0m26.757s
user	0m1.687s
sys	0m0.745s

Also I've check on a friend's machine (1.37.0, macOS Mojave BUT with go runtime 1.10.?) and it was as fast as 1.33.0

@bflad
Copy link
Contributor

bflad commented Sep 26, 2018

The upstream Go issues mentioned above seem to indicate it was a Go 1.11 regression with DNS handling so it should be machine specific based on DNS configuration. The machine's Go runtime shouldn't matter unless you are manually building the provider and running that custom built provider.

It seems Go 1.11.1 is slated to include a fix, so we'll keep this issue open to track the officially released version of the AWS provider that is built via that updated Go version.

@bflad bflad added upstream Addresses functionality related to the cloud provider. and removed waiting-response Maintainers are waiting on response from community or contributor. labels Sep 26, 2018
@bflad bflad self-assigned this Sep 26, 2018
@supasteev0
Copy link

Hello guys.
Just wanted to share my feedback, after @rmldsky answer about DNS entries.

I also had multiple DNS entries (thank you wifi networks ...).
I deleted them all and left only 8.8.8.8 : Terraform now behaves much quicker with latest aws provider plugin (1.38.0, how often are you guys releasing a new version ??).

Anyway, thank you for the feedback, problem solved on my side, but might be worth looking into this Go version issue indeed.

@bflad
Copy link
Contributor

bflad commented Oct 1, 2018

Go 1.11.1 has been released, which should include this fix. I'll switch our build processes to use the updated Go version and hopefully this will be part of the version 1.39.0 release.

@bflad
Copy link
Contributor

bflad commented Oct 3, 2018

The DNS issue starting in version 1.34.0 (built with Go 1.11.0) should now be resolved in version 1.39.0 (built with Go 1.11.1), released just now. 🎉

@bflad bflad closed this as completed Oct 3, 2018
@ghost
Copy link

ghost commented Oct 3, 2018

This has been released in version 1.39.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "aws" {
	version = "~> 1.39.0"
}
# ... other configuration ...

@supasteev0
Copy link

I confirm version 1.39.0 has solved the issue on my side. Terraform is now working flawlessly, even with multiple DNS entries in /etc/resolv.conf.

@brikis98
Copy link
Contributor

brikis98 commented Nov 8, 2018

This does not seem fixed for me. With multiple DNS entries, here's what I'm seeing on a dirt simple Terraform example:

provider "aws" {
  region = "eu-west-1"
}

data "aws_region" "current" {}

data "template_file" "foo" {
  template = "foo"
}

output "foo" {
  value = "${data.template_file.foo.rendered}"
}

output "region" {
  value = "${data.aws_region.current.name}"
}

Time:

$ time terraform apply
data.template_file.foo: Refreshing state...
data.aws_region.current: Refreshing state...

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

foo = foo
region = eu-west-1

real    0m13.912s
user    0m0.568s
sys    0m0.165s

And the exact same code with one DNS entry:

$ time terraform apply
data.template_file.foo: Refreshing state...
data.aws_region.current: Refreshing state...

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

foo = foo
region = eu-west-1

real	0m3.698s
user	0m0.595s
sys	0m0.163s

Terraform version:

$ terraform --version
Terraform v0.11.8
+ provider.aws v1.43.0
+ provider.template v1.0.0

@rmldsky
Copy link
Author

rmldsky commented Nov 8, 2018

@brikis98 was it a case for you with previous (< 1.43.0) provider versions? Could you check with ie. 1.39.0?

@nathanielks
Copy link
Contributor

I had this same problem. My issue was one of the entries in my DNS config wasn't resolving. Emptying it out to be solely 1.1.1.1 has fixed the issue. terraform plan went from 3 minutes and 30 seconds to 20 seconds total! Thank you, folks!

@jvolonda42
Copy link

@nathanielks response fixed my probleme

@bflad
Copy link
Contributor

bflad commented May 21, 2019

Locking this issue to encourage fresh bug reports, if necessary. 👍 As noted above, please double check your DNS resolvers!

@hashicorp hashicorp locked and limited conversation to collaborators May 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
provider Pertains to the provider itself, rather than any interaction with AWS. upstream Addresses functionality related to the cloud provider.
Projects
None yet
Development

No branches or pull requests

6 participants