1.4.0+ breaks shared provider cache #32901
Hi @dbadrak! Thanks for reporting this.

I don't think Terraform v1.4 should have changed the details of what file modes Terraform uses when extracting the plugin packages, but Terraform v1.4 does change some of the implementation details of how it uses the cache directory. It seems then that, as you noted, the root problem here is not really with the plugin cache directory behavior but rather that Terraform treats all of these directories as if they belong to and are used by only a single user, whereas in your case you are expecting to share the plugin cache directory between multiple users who should all be able to write into it and execute from it.

I suspect (but haven't yet verified) that the file mode here is being set by the upstream library Terraform uses for extracting the provider package zip files, which does some custom work related to umask: https://github.com/hashicorp/go-getter/blob/91e93376c7e9720dc7fd5031878c6ff0430b0631/get_file_copy.go#L50-L53

The caller to this in Terraform (terraform/internal/providercache/package_install.go, lines 136 to 139 at bd75dad) seems to just pass that mode through, and so the mode created on disk is presumably exactly the mode recorded in the extended attributes in the zip file, which is under the control of the provider developer who prepared the zip file. I don't think this fully explains the problem, because Terraform itself is also trying to refresh an existing cache to make it match the archive, and so is effectively trying to modify files that may belong to another user.

As currently designed, the provider cache is not intended to be shared between different users on the same system, which extends from the idea that CLI configuration is also typically set on a per-user basis, but I agree it would be a good improvement to officially support this configuration. In the meantime, to get a working configuration you will need to use a separate cache directory per user who runs Terraform; that this was working on older versions was only by chance, due to relying on some implementation details that changed in v1.4 to ensure that Terraform is always able to update the dependency lock file correctly.

Since the old implementation details were apparently working for your situation by coincidence, you may be able to work around this by using the temporary TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE option to restore the previous implementation details, even though your reason for enabling it would not be related to the dependency lock file as that documentation assumes.

This also seems related to #31964, which is about making it safe to concurrently run multiple Terraform processes that might interact with the same cache directory. Given that the two enhancements would be made in essentially the same part of Terraform, it would probably make sense to work on them together as a single PR. Having multiple users share the same cache directory also seems likely to increase the chance of two Terraform processes trying to update the directory concurrently.
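To illustrate the mechanism being suspected here, the following is a small, hypothetical Python sketch (not Terraform's actual Go code) of an extractor that applies the mode recorded in a zip entry's extended attributes verbatim, regardless of the process umask. The file name and mode are illustrative.

```python
import os
import stat
import tempfile
import zipfile

# Hypothetical sketch: an extractor that applies the mode recorded in the
# zip entry's external attributes, as go-getter is suspected to do.
os.umask(0o002)  # a permissive umask like the one in this report

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "provider.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        info = zipfile.ZipInfo("terraform-provider-demo")
        info.external_attr = 0o755 << 16  # Unix mode lives in the high 16 bits
        zf.writestr(info, b"#!/bin/sh\n")

    dest = os.path.join(tmp, "cache")
    with zipfile.ZipFile(archive) as zf:
        for entry in zf.infolist():
            out = zf.extract(entry, dest)
            # chmod applies the recorded mode exactly; the umask is not consulted
            os.chmod(out, entry.external_attr >> 16)

    extracted_mode = stat.S_IMODE(
        os.stat(os.path.join(dest, "terraform-provider-demo")).st_mode
    )

print(oct(extracted_mode))  # 0o755, not the 0o775 a umask of 002 would give
```

So whatever mode the provider developer recorded in the archive is what lands on disk, independent of the extracting user's umask.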
Thanks. I don't recall reading anything that discourages using a global (shared) provider cache among multiple users. We've done this because the provider binaries are very large, and we have a single system with upwards of 50 users running TF configurations across 40+ accounts, with lots of directories. I suppose the short-term solution is to update the configuration to use a per-user plugin cache until some multi-user, concurrency-safe mechanism is available. Is such a thing a near-term possibility? My take on your references is that it's not trivial. I've dropped back to 1.3.9 for now, but I'll test out the per-user cache change.
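As a sketch of the per-user workaround discussed above, each user can point the cache at a directory of their own in the CLI configuration file (the path below is illustrative):

```hcl
# ~/.terraformrc — per-user CLI configuration
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```

Note that Terraform never creates this directory itself; it must already exist before `terraform init` runs, or the cache setting is ignored.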
I'm also seeing this error after upgrading to 1.4.2:
We're using a shared provider cache directory, and a system user is caching the providers daily. My current Terraform provider config is very basic. I'm using vanilla Terraform. This doesn't change the fact that 1.4.x broke backward compatibility in how the shared provider cache is handled. In the end I also downgraded to 1.3.9.
Experiencing something similar when using Terraform 1.4 with Terragrunt 0.45.0. I get the following errors:

❯ terragrunt run-all init
INFO[0020] The stack at /home/<redacted> will be processed in the following order for command init:
Group 1
- Module /home/<redacted>
╷
│ Error: Required plugins are not installed
│
│ The installed provider plugins are not consistent with the packages
│ selected in the dependency lock file:
│ - registry.terraform.io/hashicorp/azurerm: the cached package for registry.terraform.io/hashicorp/azurerm 3.50.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file
│
│ Terraform uses external plugins to integrate with a variety of different
│ infrastructure services. To download the plugins required for this
│ configuration, run:
│ terraform init
╵
╷
│ Error: Required plugins are not installed
│
│ The installed provider plugins are not consistent with the packages
│ selected in the dependency lock file:
│ - registry.terraform.io/hashicorp/azurerm: the cached package for registry.terraform.io/hashicorp/azurerm 3.50.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file
│
│ Terraform uses external plugins to integrate with a variety of different
│ infrastructure services. To download the plugins required for this
│ configuration, run:
│ terraform init
╵
# ... the same error repeats a few more times until it fully fails

Note that in the errors it's trying to download the latest provider instead of azurerm 3.35, which is required by the module. After unsetting TF_PLUGIN_CACHE_DIR, init succeeds:

❯ unset TF_PLUGIN_CACHE_DIR
❯ terragrunt run-all init
INFO[0021] The stack at /home/<redacted> will be processed in the following order for command init:
Group 1
- Module /home/<redacted>
Initializing the backend...
Successfully configured the backend "azurerm"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing modules...

So it doesn't seem to be a simple file-ownership issue, as I'm the only user.
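One hedged way to apply this workaround per invocation, without clearing the variable for the whole shell session, is `env -u` (the cache path is an example, and `sh -c` stands in for the terragrunt call here):

```shell
export TF_PLUGIN_CACHE_DIR="$HOME/.tf-plugin-cache"   # example value

# 'env -u' removes the variable from the child's environment only;
# the parent shell keeps its value.
result=$(env -u TF_PLUGIN_CACHE_DIR sh -c 'echo "${TF_PLUGIN_CACHE_DIR:-unset}"')
echo "$result"                 # the child saw the variable as unset
echo "$TF_PLUGIN_CACHE_DIR"    # still set in the parent shell
```

In practice the child command would be `terragrunt run-all init` rather than the `sh -c` placeholder.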
+1. Terraform 1.4.x completely breaks caching for us, and we are constantly getting the above errors. Nothing is ever found in the cache. We also find that when we specify a provider version, we get an intermittent checksum error for 4.60.0. I've not managed to pin down exactly what makes it happen or go away, but I have been deleting caches and lock files. This is on Ubuntu 20.04 on WSL2 via Terragrunt; 1.3.9 works fine.
We also just ran into this problem trying to upgrade to TF 1.4. In our case, we operate a monorepo with many TF configs, and the build system operates across all of them. To reduce network calls, we stage the provider bundle on the build system before running Terraform. We do that by generating the bundle out of band, zipping it, and rehosting it. When the build system runs a job, we download the bundle, extract it into the cache directory, and then run terraform init.

I explain this mostly to highlight that this isn't a multi-user shared setup, just a setup where terraform init should not be fetching providers from the Internet. However, with TF 1.4, this setup fails.
Fwiw, we were able to fix things in our usage by unsetting the TF_PLUGIN_CACHE_DIR environment variable.
Setting the TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE environment variable for my terraform init resolved it for me when working with TF v1.4.6 and v1.5.1 locally.
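A minimal way to set that opt-in for the current session is shown below; the value `1` is an example, and the `terraform init` line is left as a comment since it needs a real configuration to run against:

```shell
# Enable the temporary opt-in for this session (value is an example).
export TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=1
# terraform init          # run init with the opt-in active
echo "$TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE"
```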
Terraform Version
Terraform Configuration Files
Debug Output
Successful init upgrade (1.3.9)
Failed init upgrade (1.4.0+)
The first file has a different owner than the user running this script. For the second file, the user running the script does not have write access to it. However, the umask is 002, so write access should have been set and the ACL could then offer the write capability.
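The permission behavior at play here can be seen in a small Python sketch (illustrative only, not Terraform's code): creating a file honors the process umask, but an explicit chmod() applies the exact mode and ignores the umask, which would explain observing 0755 rather than 0775 under a umask of 002.

```python
import os
import stat
import tempfile

# Minimal demo: file creation honors the umask; explicit chmod() does not.
old = os.umask(0o002)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "provider-binary")  # hypothetical file name
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o777)  # request 0777
    os.close(fd)
    created_mode = stat.S_IMODE(os.stat(path).st_mode)   # 0o775: umask applied

    os.chmod(path, 0o755)                                # explicit mode wins
    chmod_mode = stat.S_IMODE(os.stat(path).st_mode)     # 0o755: umask ignored

os.umask(old)
print(oct(created_mode), oct(chmod_mode))
```

So if the extractor chmod()s each file to a mode recorded elsewhere, the group-write bit the umask would normally preserve is lost.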
Expected Behavior
Provider should install and continue.
Actual Behavior
Provider install fails, and we cannot continue.
Steps to Reproduce
terraform init -upgrade
Additional Context
We use a shared plugin cache and provider tree. We have permissions and ACLs set on the files and directories to allow all our users (who belong to a specific group) to write into this directory. Beginning with 1.4.0 (still present in 1.4.2), we get a failure on the chmod() of the file, because the user writing the file is not the owner. It appears the umask is not being honored on create: an strace shows the mode being set to 0755 (vs 0775, which is what I would expect for an executable under this umask). The chmod() will fail if the owner of the file is not the user running it; if the owner IS the user, I would expect a 0775 permission vs 0755. So, two issues, it seems:
References
No response