keystore: add support for aws kms multi-region key replication#53927
Merged
nklaassen reviewed May 2, 2025
dboslee commented May 14, 2025
Comment on lines +59 to +60
// TODO(dboslee): waiting on AWS support to answer question regarding
// long time for GetPublicKey to succeed after updating key via UpdatePrimaryRegion.
Contributor
Author
KMS service team will deploy a fix by May 23 to resolve the GetPublicKey operation failures that occur when keys are in the 'updating' state. The service team has confirmed the code update is ready for deployment.
The above is the update from AWS. I'm going to leave this TODO in place until I am able to test the fix, though. If this PR lands first, I will create a follow-up to adjust the timeout back down.
eriktate added a commit that referenced this pull request May 20, 2025
eriktate added a commit that referenced this pull request May 20, 2025
nklaassen reviewed May 20, 2025
nklaassen approved these changes May 20, 2025
eriktate approved these changes May 22, 2025
eriktate added a commit that referenced this pull request May 22, 2025
dboslee added a commit that referenced this pull request May 28, 2025
* keystore: add support for aws kms multi-region key replication
* update func name ApplyConfig -> ApplyMultiRegionConfig
* more descriptive var out -> describeKeyOut
* fix typo
* better var names
* add comment
* renaming vars for readability
* refactor multi-region auth config
* add comment about cert authority lock
* fix typo
* move funcs up
* update comment
* copy whole struct instead of individual values
github-merge-queue Bot pushed a commit that referenced this pull request Jun 10, 2025
…53927) (#55212)
* keystore: add support for aws kms multi-region key replication (#53927)
* keystore: add support for aws kms multi-region key replication
* update func name ApplyConfig -> ApplyMultiRegionConfig
* more descriptive var out -> describeKeyOut
* fix typo
* better var names
* add comment
* renaming vars for readability
* refactor multi-region auth config
* add comment about cert authority lock
* fix typo
* move funcs up
* update comment
* copy whole struct instead of individual values
* keystore: retry describe key when applying multi-region kms config (#55274)
This adds the logic to handle kms key replication.
Key replication is configured on key creation and during teleport auth initialization. This allows for reconfiguring the primary/replica regions after an auth restart.
Let me know if there are any concerns with updating the stored key ARN after a primary region change. This is not required, but I thought it would provide a better UX. The key ID stays the same; however, the full key ARN changes with the AWS region the primary is located in.
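To illustrate why the stored ARN goes stale after a primary region change, here is a minimal sketch of rewriting only the region field of a KMS key ARN. It is not the code from this PR: `swapARNRegion` is a hypothetical helper, and the account ID and key ID in the usage note are made up. It only relies on the standard ARN layout `arn:partition:service:region:account-id:resource`.

```go
package main

import (
	"fmt"
	"strings"
)

// swapARNRegion is a hypothetical helper (not part of this PR). It returns
// keyARN with its region field replaced by newRegion. The key ID in the
// resource part is left untouched, which mirrors the observation that a
// multi-region key keeps its key ID while the ARN follows the region.
func swapARNRegion(keyARN, newRegion string) (string, error) {
	// arn:partition:service:region:account-id:resource
	parts := strings.SplitN(keyARN, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "kms" {
		return "", fmt.Errorf("not a KMS key ARN: %q", keyARN)
	}
	parts[3] = newRegion
	return strings.Join(parts, ":"), nil
}
```

For example, calling `swapARNRegion("arn:aws:kms:us-west-2:111122223333:key/mrk-1234abcd", "us-east-1")` yields the same ARN with `us-east-1` in the region position and the `key/mrk-1234abcd` resource unchanged.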
Each server only accesses KMS in the region configured by `AWSRegion`. As a consequence, only the region where the primary key currently resides is able to perform config changes on startup to replicate the key and update the primary. The other consequence is that when KMS is down in a given region, auth in that region should be considered unhealthy. I plan to follow this up with a change that would link KMS availability to auth readiness.

One final note: I had to increase the `pendingKeyTimeout` to 2 minutes. This is to account for the time it takes to successfully call `GetPublicKey` after updating the primary region. I've observed delays of ~1 minute during key creation when an auth server outside the primary region creates the key and needs to update the primary. I am still discussing this issue with AWS support, but so far they are saying this is expected behavior.

changelog: Added support for AWS KMS multi-region keys with key replication