azurerm_network_security_group_association - prevent deadlock between association and network interface creation #4501
Merged
Update from master TF repo
…_interface, caused by them locking the same resources, but in a different order.
katbyte added a commit that referenced this pull request Oct 5, 2019
abhinavdahiya added a commit to abhinavdahiya/installer that referenced this pull request Oct 8, 2019
…ndle private_dns zone
Using the upstream azurerm provider is not possible for now, for the following reasons:
1) There is no SRV record resource for private DNS zones
2) The provider version that has the private DNS zone resources, `1.34.0`, has a lot of bugs, such as
* hashicorp/terraform-provider-azurerm#4452
* hashicorp/terraform-provider-azurerm#4453
* hashicorp/terraform-provider-azurerm#4501
Some of these bugs are fixed, and some are in flight.
Another obstacle to moving to `1.36.0`, which might have all the fixes we need, is that the provider has moved to using
`standalone terraform plugin SDK v1.1.1` [1]. Because we vendor both terraform and providers, this causes errors like
`panic: gob: registering duplicate types for "github.com/zclconf/go-cty/cty.primitiveType": cty.primitiveType != cty.primitiveType`
Therefore, we would have to move towards a single vendor for terraform and plugins for correct interoperation, which is trickier due to conflicts elsewhere.
A simple 4-resource plugin that reuses the already vendored azurerm provider as a library and carries over the required resources seems like an easy fix for now.
[1]: hashicorp/terraform-provider-azurerm#4474
This has been released in version 1.36.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

    provider "azurerm" {
      version = "~> 1.36.0"
    }
    # ... other configuration ...
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!
I believe we have found a bug in the Azure Resource Manager provider for Terraform, having to do with the azurerm_subnet_network_security_group_association resource and its interaction with azurerm_network_interface.
The issue is that the azurerm_subnet_network_security_group_association resource, when being created, locks the following resources, in the following order:
However, the azurerm_network_interface resource, when being created, locks the following resources, in the following order:
You will notice that both lock the virtual network and subnet, but in the opposite order. This means that if both resources happen to be created at the same time in two threads, the following could happen:
In this situation, the two resources wait on each other indefinitely until Terraform is terminated, which is a deadlock. We found this bug because Terraform would keep trying to create interfaces and associations forever. Of course, this only happens intermittently, because it depends on the exact interleaving above. For example (resource names redacted):
To fix this, we simply need to lock the three resources in the same order in both resource implementations. As a workaround, we currently make all network interfaces depend on our NSG association, which ensures that the two are never created at the same time.
As an aside, is there any way to review all locks across the provider, to ensure this situation doesn't happen elsewhere? Perhaps there could be a policy to always lock resources in alphabetical order so that in the future this doesn't happen again?