feat: seamless migration to HSM/KMS#36549

Merged
nklaassen merged 6 commits into master from nklaassen/kms-migration on Jan 15, 2024

@nklaassen nklaassen commented Jan 11, 2024

This PR makes it much easier to migrate an existing Teleport cluster from software keys to HSM or KMS keys.

Previously, as soon as an HSM/KMS was configured in the teleport.yaml for an auth server, it would immediately refuse to sign any more certificates with software keys. This was meant to defend against someone configuring an HSM and then forgetting to perform the necessary CA migrations, thinking they were protected by the HSM when in fact they were still using the old software keys.

In practice, this just made it very difficult to migrate a cluster since there would be downtime where you couldn't even use tctl remotely because there was no way to log in. In a dual-auth cluster you could theoretically avoid downtime but the process was arcane and difficult to execute. Check out the docs changes to see the parts I was able to remove here.

This will be critical for enabling Cloud to start using AWS KMS keys.

TODO: (in a following PR) add a cluster alert (probably not for Cloud) when an HSM/KMS is configured but not actively used yet because a CA rotation is needed. Necessary now that tctl status no longer prints this (it has no way to tell).

Changelog: Improved the migration experience when configuring HSM or KMS backing for CA key material

@github-actions
Contributor

🤖 Vercel preview here: https://docs-oq94pkenh-goteleport.vercel.app/docs/ver/preview

@github-actions
Contributor

🤖 Vercel preview here: https://docs-2eragb3m2-goteleport.vercel.app/docs/ver/preview

@gravitational gravitational deleted a comment from github-actions Bot Jan 11, 2024
Comment thread docs/pages/choose-an-edition/teleport-enterprise/gcp-kms.mdx Outdated
@ptgott
Contributor

ptgott commented Jan 11, 2024

Are we backporting this?

@nklaassen
Contributor Author

> Are we backporting this?

No, the plan is for it to go out in v15

@github-actions
Contributor

🤖 Vercel preview here: https://docs-6nnps8jkl-goteleport.vercel.app/docs/ver/preview

@github-actions
Contributor

🤖 Vercel preview here: https://docs-j2dncq7uq-goteleport.vercel.app/docs/ver/preview

Comment on lines -247 to +243
- config.Auth.ListenAddr.Addr = net.JoinHostPort(hostName, "0")
+ config.Auth.ListenAddr.Addr = net.JoinHostPort("localhost", "0")
Contributor Author

not sure why, but I had to make this change to get the tests to pass on my macbook
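
For reference, a minimal Go sketch of what the two address forms in the hunk above produce. The comment does not say why the hostname variant failed, and this sketch does not try to explain it; it only shows that port "0" asks the OS for any free port and that `net.JoinHostPort` brackets IPv6 literals. The helper name `listenAddr` is hypothetical and not part of the PR.

```go
package main

import (
	"fmt"
	"net"
)

// listenAddr (hypothetical helper) builds a host:port string whose port "0"
// tells the OS to pick any free port at listen time.
func listenAddr(host string) string {
	return net.JoinHostPort(host, "0")
}

func main() {
	fmt.Println(listenAddr("localhost")) // host name form
	fmt.Println(listenAddr("::1"))       // IPv6 literals are wrapped in brackets

	// Listening on port 0 yields an OS-assigned port, visible via ln.Addr().
	ln, err := net.Listen("tcp", listenAddr("localhost"))
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

Whether a machine's own hostname resolves to a listenable address depends on local resolver configuration, which is one reason a test might prefer a fixed "localhost"; that is an assumption here, not something the PR states.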

Comment on lines 142 to +166
  if (cfg.PKCS11 != PKCS11Config{}) {
-     backend, err := newPKCS11KeyStore(&cfg.PKCS11, logger)
-     return &Manager{backend: backend}, trace.Wrap(err)
+     pkcs11Backend, err := newPKCS11KeyStore(&cfg.PKCS11, cfg.Logger)
+     return &Manager{
+         backendForNewKeys:     pkcs11Backend,
+         usableSigningBackends: []backend{pkcs11Backend, softwareBackend},
+     }, trace.Wrap(err)
  }
  if (cfg.GCPKMS != GCPKMSConfig{}) {
-     backend, err := newGCPKMSKeyStore(ctx, &cfg.GCPKMS, logger)
-     return &Manager{backend: backend}, trace.Wrap(err)
+     gcpBackend, err := newGCPKMSKeyStore(ctx, &cfg.GCPKMS, cfg.Logger)
+     return &Manager{
+         backendForNewKeys:     gcpBackend,
+         usableSigningBackends: []backend{gcpBackend, softwareBackend},
+     }, trace.Wrap(err)
  }
  if (cfg.AWSKMS != AWSKMSConfig{}) {
-     backend, err := newAWSKMSKeystore(ctx, &cfg.AWSKMS, logger)
-     return &Manager{backend: backend}, trace.Wrap(err)
+     awsBackend, err := newAWSKMSKeystore(ctx, &cfg.AWSKMS, cfg.Logger)
+     return &Manager{
+         backendForNewKeys:     awsBackend,
+         usableSigningBackends: []backend{awsBackend, softwareBackend},
+     }, trace.Wrap(err)
  }
- return &Manager{backend: newSoftwareKeyStore(&cfg.Software, logger)}, nil
+ return &Manager{
+     backendForNewKeys:     softwareBackend,
+     usableSigningBackends: []backend{softwareBackend},
+ }, nil
Contributor Author

This is the meat of the actual change. Instead of only having a single keystore backend for everything, the keystore manager has one preferred backend for any new keys it will generate (this is only used to generate CA keys) and a list of backends it can use to sign stuff, in preference order. I always include the software keystore as the last element in the list, so the auth can always sign certs if there are any software keys in the CA.

@nklaassen
Contributor Author

@smallinsky @zmb3 @r0mant I'm hoping to get this in before the v15 cutoff. The bot chose all of you for one reason or another, and I'm not sure whether I need one or two more reviews.

@nklaassen nklaassen added this pull request to the merge queue Jan 15, 2024
Merged via the queue into master with commit 1736010 Jan 15, 2024
@nklaassen nklaassen deleted the nklaassen/kms-migration branch January 15, 2024 16:30
@public-teleport-github-review-bot

@nklaassen See the table below for backport results.

Branch Result
branch/v15 Create PR

nklaassen added a commit that referenced this pull request Jan 25, 2024
Backport #36899 to branch/v13

The actual fix is a few characters in lib/auth/keystore/pkcs11.go.
I'm also backporting changes to test files from #36549 that this PR built on
top of, which make it easier to run all HSM unit and integration tests
with a connected YubiHSM2 (which I did when putting together this
backport).

Instead of merging all changes in the integration tests, I just checked
out the state of them from branch/v14 in
#37296
nklaassen added a commit that referenced this pull request Jan 26, 2024
Backport #36899 to branch/v12

The actual fix is a few characters in lib/auth/keystore/pkcs11.go.
I'm also backporting changes to test files from #36549 that this PR built on
top of, which make it easier to run all HSM unit and integration tests
with a connected YubiHSM2 (which I did when putting together this
backport).

Instead of merging all changes in the integration tests, I just checked
out the state of them from branch/v13 in
#37301

Changelog: fixes CA key generation when two auth servers share a single YubiHSM2
github-merge-queue Bot pushed a commit that referenced this pull request Jan 26, 2024
…7301)

github-merge-queue Bot pushed a commit that referenced this pull request Jan 29, 2024
…7305)
