This repository has been archived by the owner on May 15, 2023. It is now read-only.


modular-magician
Collaborator

These changes add support in the beta provider for concurrent node pool CRUD operations on a single cluster.

The GA provider's behavior should be unchanged. The global mutex store changes described below are technically included in the GA provider, but they do not alter its behavior. I am happy to make those changes specific to the beta provider if that is preferred.

The changes to the beta provider include:

  • Updating the global mutex store to use sync.RWMutex instead of sync.Mutex, and adding the methods the MutexKV struct needs to acquire and release shared (read) locks.
  • Removing the polling for cluster "ready" status. With concurrent operations supported on the same cluster, we no longer need to wait until the cluster has no running operations before proceeding.
  • For node pool (NP) CRUD operations, acquiring a shared (read) lock on the cluster and an exclusive (write) lock on the node pool, instead of an exclusive (write) lock on the cluster. Cluster-wide operations (e.g. UpdateCluster) still block NP-level operations, but NP-level operations on different NPs no longer block each other. The NP-level lock key combines the cluster hash and the node pool name to guarantee uniqueness.
  • Adding retry logic to NP CRUD operations. An operation is retried while it receives an "incompatible operation" error (which has the FAILED_PRECONDITION canonical code), so that concurrent operations blocked by a lock conflict with another operation are retried safely.
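
The locking scheme in the list above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: only the MutexKV name and the sync.RWMutex type come from the description, while the method set, the nodePoolLockKey key format, and the withNodePoolLocks helper are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// MutexKV is a sketch of a keyed lock store holding one RWMutex per string key.
type MutexKV struct {
	mu    sync.Mutex               // guards store
	store map[string]*sync.RWMutex // populated lazily by get
}

func NewMutexKV() *MutexKV {
	return &MutexKV{store: make(map[string]*sync.RWMutex)}
}

// get returns the RWMutex for key, creating it on first use.
func (m *MutexKV) get(key string) *sync.RWMutex {
	m.mu.Lock()
	defer m.mu.Unlock()
	if l, ok := m.store[key]; ok {
		return l
	}
	l := &sync.RWMutex{}
	m.store[key] = l
	return l
}

// Exclusive (write) lock and unlock.
func (m *MutexKV) Lock(key string)   { m.get(key).Lock() }
func (m *MutexKV) Unlock(key string) { m.get(key).Unlock() }

// Shared (read) lock and unlock.
func (m *MutexKV) RLock(key string)   { m.get(key).RLock() }
func (m *MutexKV) RUnlock(key string) { m.get(key).RUnlock() }

// nodePoolLockKey is a hypothetical key format: cluster key plus node pool name.
func nodePoolLockKey(clusterKey, nodePool string) string {
	return fmt.Sprintf("%s/nodePools/%s", clusterKey, nodePool)
}

// withNodePoolLocks (hypothetical helper) runs fn while holding a shared lock
// on the cluster and an exclusive lock on the node pool.
func withNodePoolLocks(kv *MutexKV, clusterKey, nodePool string, fn func()) {
	kv.RLock(clusterKey)
	defer kv.RUnlock(clusterKey)

	npKey := nodePoolLockKey(clusterKey, nodePool)
	kv.Lock(npKey)
	defer kv.Unlock(npKey)

	fn()
}
```

Under this sketch, a cluster-wide operation would call kv.Lock(clusterKey) and therefore wait for every node pool operation holding the shared lock, while two calls to withNodePoolLocks for different node pools on the same cluster can proceed at the same time.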

If this PR is for Terraform, I acknowledge that I have:

  • Searched through the issue tracker for an open issue that this either resolves or contributes to, commented on it to claim it, and written "fixes {url}" or "part of {url}" in this PR description. If there were no relevant open issues, I opened one and commented that I would like to work on it (not necessary for very small changes).
  • Generated Terraform, and ran make test and make lint to ensure it passes unit and linter tests.
  • Ensured that all new fields I added that can be set by a user appear in at least one example (for generated resources) or third_party test (for handwritten resources or update tests).
  • Ran relevant acceptance tests (If the acceptance tests do not yet pass or you are unable to run them, please let your reviewer know).
  • Read the Release Notes Guide before writing my release note below.

Release Note Template for Downstream PRs (will be copied)

container: Added support for concurrent node pool mutations on a cluster. Previously, node pool mutations were restricted to run synchronously client-side. NOTE: While this feature is supported in Terraform from this release onwards, only a limited number of GCP projects will support this behavior initially. The provider will automatically process mutations concurrently as the feature rolls out generally.

Reviewer Notes

  • Ran the set of 33 TestAccContainerNodePool acceptance tests with the beta provider and they passed, although TestAccContainerNodePool_withWorkloadIdentityConfig seems flaky (it passed only when I ran it individually). The change therefore appears to be fully backward compatible.
  • Tested manually with my own *.tf files, creating and deleting multiple NPs concurrently, and confirmed that the operations run concurrently.

Derived from GoogleCloudPlatform/magic-modules#6748

@modular-magician modular-magician requested a review from a team as a code owner December 5, 2022 19:42
@modular-magician modular-magician requested review from ScottSuarez and removed request for a team December 5, 2022 19:42
@modular-magician modular-magician merged commit b6a9bf9 into GoogleCloudPlatform:main Dec 5, 2022