website: "terraform init" documentation could link to the relevant CLI config settings #26693

Closed
joe-a-t opened this issue Oct 23, 2020 · 5 comments

Comments


joe-a-t commented Oct 23, 2020

Current Terraform Version

0.13.5

Use-cases

Any terraform usage (but most commonly by automation) that iterates through multiple terraform directories that have the same provider(s) but does not desire strict provider versioning and would like the process to be faster and more efficient.

Attempted Solutions

Thanks to the great improvements in https://www.terraform.io/upgrade-guides/0-13.html#new-filesystem-layout-for-local-copies-of-providers with terraform providers mirror, caching providers is now much easier to implement. However, the terraform providers mirror command downloads all providers in the current directory, even if they already exist in the cache. This prevents running terraform providers mirror in every terraform directory since that would invalidate the entire point of using a cache. As a result, we have to manually compile a list of providers to put in the cache and populate it in advance.

Additionally, if terraform init tries to get a specific version of a provider where the provider exists in the cache but not at that specific version, it is a hard fail instead of reaching out to the remote source to download the provider as it would if no versions of the provider existed in the cache.

Proposal

I would like to propose adding a -populate-cache flag to terraform init. The behavior of this flag would be to:

  1. If used in combination with -upgrade, check the remote to see what the latest allowed version is, then continue to the next steps with a request for that version.
  2. Check the local cache in ~/.terraform.d for the provider and latest allowed version.
  3. If the provider and desired version exist, use the cache.
  4. If the provider and desired version do not exist, download the provider from the remote to the cache and use that.

This would lead to significant performance improvements since any providers are only getting downloaded from their remote sources once, regardless of how many terraform directories use them. This would also avoid the need to pre-provision the cache and the management overhead it entails.
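Steps 2 through 4 of the proposal amount to a read-through cache lookup. The following is a minimal sketch of that idea only, not of how Terraform behaves: the `CACHE_DIR` layout, the `fetch_provider` stand-in, and the `resolve_provider` helper are all hypothetical names invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the proposed read-through lookup (steps 2-4).
# CACHE_DIR's layout and fetch_provider are hypothetical, not Terraform internals.

CACHE_DIR="${CACHE_DIR:-$(mktemp -d)}"
FETCH_COUNT=0

# Hypothetical stand-in for "download the provider from its remote source".
fetch_provider() {
  local name="$1" version="$2" dest="$3"
  mkdir -p "$dest"
  printf '%s %s\n' "$name" "$version" > "$dest/package.txt"
  FETCH_COUNT=$((FETCH_COUNT + 1))
}

# Sets RESOLVED_PATH to the cached copy, fetching only on a cache miss.
# A miss on the version alone (provider present, version absent) also fetches.
resolve_provider() {
  local name="$1" version="$2"
  local dest="$CACHE_DIR/$name/$version"
  if [ ! -d "$dest" ]; then
    fetch_provider "$name" "$version" "$dest"
  fi
  RESOLVED_PATH="$dest"
}
```

Calling `resolve_provider aws 3.12.0` twice would fetch once; asking for a different version of the same provider would fetch again rather than fail.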

Ideally, this could also be extended to include modules per #16268.

To boost the urgency of the issue, I want to point out that it should lower HashiCorp's cloud costs, since they would receive fewer requests to download the assets they host and as a result need fewer servers and pay for less network bandwidth. It may one day be worth changing the default behavior of terraform init to automatically store all downloaded providers/modules in ~/.terraform.d, but I think the safest route is to make it available behind a flag and change the default behavior later if people like the change and start using it everywhere.

References

#16268

@joe-a-t joe-a-t added enhancement new new issue not yet triaged labels Oct 23, 2020
@pkolyvas pkolyvas added providers and removed new new issue not yet triaged labels Oct 26, 2020
@apparentlymart
Contributor

Hi @joe-a-t! Thanks for sharing this use-case.

What you've described does seem valid and interesting, though I was a little confused at first due to some differences in terminology in your write-up compared to what we typically use in our documentation. Just so we can be sure we're all talking about the same thing, I want to define some terms:

  • The provider cache is an explicitly-configured directory which Terraform will use to cache providers it downloads. Terraform populates this directory automatically as part of terraform init and will read from it automatically on future runs. The cache does not stop a particular Terraform configuration from requesting something that isn't in it: if Terraform needs to download something new then it will get it from the appropriate upstream source and remember it in the cache.
  • The ~/.terraform.d/plugins directory is an example of an implied local mirror directory (though it's one of the legacy ones that the documentation doesn't currently discuss directly). Mirrors are directories Terraform can be configured to use instead of upstream sources, so if you activate a mirror (either implicitly or explicitly) then you'll commonly be using it exclusively and no longer accessing the upstream sources at all. (This is common in environments where the Terraform processes are isolated from the internet and thus cannot access the upstream registries.)
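As a rough illustration of the "explicitly-configured" part, the provider cache is enabled through the plugin_cache_dir setting in the CLI configuration file. The sketch below assumes a Unix-style shell; the temporary paths and the file name example.terraformrc are hypothetical, chosen so the example does not touch a real home directory.

```shell
#!/usr/bin/env bash
# Sketch only: every path below is a hypothetical example.
base_dir="$(mktemp -d)"
cache_dir="$base_dir/plugin-cache"
config_file="$base_dir/example.terraformrc"

# Create the cache directory up front rather than relying on Terraform to do it.
mkdir -p "$cache_dir"

# plugin_cache_dir is the CLI configuration setting for the provider cache.
cat > "$config_file" <<EOF
plugin_cache_dir = "$cache_dir"
EOF

# TF_CLI_CONFIG_FILE points Terraform at a CLI config file in a non-default location.
export TF_CLI_CONFIG_FILE="$config_file"
```

With that in place, a later terraform init in the same shell session would be expected to read and populate the cache directory.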

With those terms defined, I think you are describing the main use-case for the opt-in provider cache, but you've been trying to meet that use-case using a filesystem mirror directory and thus finding it doesn't quite meet your needs.

I'd be interested to hear if you've tried (or would be willing to try) the provider cache feature as I linked above, and whether that better meets the use-cases you've described here. If it doesn't, we may prefer to meet the use-cases you've described by adjusting the provider cache behaviors rather than by adding a new operating mode entirely, because that feature is already designed to maintain a read-through cache and so I expect it would be a smaller change compared to the current behavior.

Another related enhancement we could consider based on this feedback is to optimize terraform providers mirror so that it can potentially detect whether the target directory already contains a package that matches the upstream registry's checksum and thus avoid re-downloading it. I think that would then help with the typical mirror use-case of running Terraform in an environment where no upstream registry access is available at all, by making it faster to populate a mirror directory with packages from many different configurations at once and then use that directory instead of the upstream registries when those configurations are used in practice.
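The checksum comparison described here could look something like the sketch below. This is an assumption about the general shape of such a check, not how terraform providers mirror works today; `needs_download` is a hypothetical helper, and it relies on the `sha256sum` utility being available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when a download is still needed,
# i.e. the file is missing or its SHA-256 differs from the expected value.
needs_download() {
  local file="$1" expected_sha256="$2"
  [ -f "$file" ] || return 0
  local actual
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
  [ "$actual" != "$expected_sha256" ]
}
```

A mirroring script could call `needs_download "$package" "$checksum"` before each fetch and skip packages that already match.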

@joe-a-t
Author

joe-a-t commented Oct 27, 2020

Hey @apparentlymart, thanks a lot for the response. Yes, I tried it out and it looks like the provider cache you linked to works for most of my use case. I hadn't come across that since I don't see it mentioned on https://www.terraform.io/docs/commands/init.html, https://learn.hashicorp.com/tutorials/terraform/automate-terraform, or https://www.terraform.io/docs/extend/how-terraform-works.html#discovery. Could we update those docs to include mention of the plugin_cache_dir?

As far as improvements to the plugin_cache_dir, it would also be nice if we could specify the plugin_cache_dir when invoking terraform init instead of needing to worry about where the home directory is and placing the config file there (or needing to inject an environment variable telling terraform where the config file is). It would also be nice if Terraform could create the plugin_cache_dir if it does not exist.

It would also be a huge help if this sort of caching were extended to include modules. I know #16268 has been hanging around for quite a while, but it would provide immense performance improvements for my use case if something similar could be implemented for module caching.

@apparentlymart
Contributor

Hi again @joe-a-t! Thanks for following up with that additional information. I'm glad to hear that the cache directory was a better fit for your use-case.

I think there's maybe three enhancement requests in your comment here:

  • Add something in the documentation about provider init that talks about plugin_cache_dir. In practice I expect what that would end up being is a link to the overall Provider Installation section, of which plugin_cache_dir is one part, because everything in that section is relevant to configuring how the provider installation portion of terraform init will behave.

  • A command line option for the cache directory. I believe it's designed the way it is right now because the assumption is that the cache directory is something you just set once and have all future Terraform use apply to it. Given that there are already two ways to set it, I expect we'll decline to add a third way that seems to me like it would only rarely be used, because setting a cache only for a single call would defeat the object of it. Note that if you're using a Unix-style shell you can set the environment variable as part of the overall command line, with something like this (the story isn't quite as nice in the Windows command prompt, though):

    TF_PLUGIN_CACHE_DIR=example terraform init
  • A cache directory for modules. As you noted, that's already covered by Feature Request: Module cache dir à la plugins #16268 so I'm going to let that issue continue representing that use-case. Although on the surface it seems like providers and remote modules are quite similar, they differ immensely in the details of how they tend to be used and so a mechanism for one rarely translates directly to the other. I expect there will be some future effort to improve various characteristics of the remote module mechanism at some point, and the team will probably prefer to address a number of different concerns of this sort all at once just to amortize the cost of reloading all of that context to work on it.
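For the automation case raised earlier in the thread (running terraform init across many directories), the environment-variable approach can be wrapped in a small helper. This is a sketch under assumptions: `init_all` and the `TERRAFORM_BIN` override are hypothetical names introduced here, and the helper creates the cache directory itself rather than expecting Terraform to do so.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run "terraform init" in several directories
# that all share one provider cache directory.
# Usage: init_all CACHE_DIR DIR [DIR...]
init_all() {
  local cache_dir="$1"
  shift
  mkdir -p "$cache_dir"                    # make sure the cache directory exists
  export TF_PLUGIN_CACHE_DIR="$cache_dir"  # set once, applies to every run below
  local dir
  for dir in "$@"; do
    # TERRAFORM_BIN is an assumed override hook; it defaults to "terraform".
    (cd "$dir" && "${TERRAFORM_BIN:-terraform}" init -input=false) || return 1
  done
}
```

Something like `init_all "$HOME/.terraform.d/plugin-cache" envs/dev envs/prod` would then share one cache across both directories (the directory names are only examples).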

With all of that said, I'm going to relabel this issue to represent the documentation addition you mentioned because I agree that it's weird not to have a connection from the command that does the installation to the settings that command respects. I don't expect we'll add a cache-related command line option to terraform init for the reasons I stated above, so I'm not going to make a new feature request issue for that one, and we can continue to use #16268 to represent the potential for a local cache for remote modules.

Thanks again for raising this!

@apparentlymart apparentlymart changed the title terraform init -populate-cache website: "terraform init" documentation could link to the relevant CLI config settings Oct 28, 2020
@apparentlymart
Contributor

As part of revising the terraform init documentation for the dependency lock file mechanism in the forthcoming v0.14.0 release, I also rewrote the terraform init documentation about provider installation to include a link to the CLI configuration section for customizing how Terraform installs providers, in 897cb72.

The new version of the documentation will be published to the website as part of the v0.14.0 final release, which we're planning to make in a few weeks as long as there's no blocking feedback from the current v0.14.0-rc1 prerelease.

@ghost

ghost commented Dec 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020