Merged
46 changes: 10 additions & 36 deletions docs/pages/choose-an-edition/teleport-enterprise/gcp-kms.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -234,42 +234,16 @@ certificates can be signed without error.

## Migrating an existing cluster

If you have an existing Teleport cluster it will have already created CA keys
during its first start.
If you have an existing Teleport cluster it will have already created software
CA keys during its first start.
Those existing CA keys will have been used to sign all existing user and host
certificates, and will be trusted by all other services in your cluster.

When an Auth Server starts up with a `gcp_kms` keyring configured in its
`ca_key_params`, it will refuse to sign any certificates with any existing
software keys in the CA.
This will prevent any new user logins or new hosts from joining your cluster if
their requests are directed to that Auth Server and effectively cause downtime
for that server until a CA rotation is completed.

If some downtime until you can complete a CA rotation is acceptable, the
migration can be performed in three steps:

1. Configure all Auth Servers `ca_key_params` to use your desired KMS keyring,
as described in [Step 4](#step-45-configure-your-auth-server-to-use-kms-keys).
1. Restart all Auth Servers.
1. Perform a full [CA rotation](../../management/operations/ca-rotation.mdx).

To avoid any downtime while migrating your cluster, do the following procedure
instead:

1. Start a new Auth Server with an identical backend configuration to your
existing Auth Servers and with `ca_key_params` configured to use your KMS key
ring. Make sure no requests are routed to this new, temporary, Auth Server by
not adding it to your load balancer. You can run this anywhere with access to
your existing backend and new KMS key ring, one option would be to run it
locally on an existing Auth Server host (make sure to give it its own
`teleport.yaml` and unique `data_dir`).
1. Perform a full [CA rotation](../../management/operations/ca-rotation.mdx).
The temporary Auth Server will generate new KMS keys and include their names in
the backend CA state.
1. Stop/remove/delete the temporary Auth Server as it is no longer necessary.
1. Configure all other existing Auth Servers with identical `ca_key_params` and
reload/restart them, one by one. They will now use the KMS keys generated by the
temporary Auth Server.
1. Perform one more full CA rotation to evict all now-unused software keys from the
CA backend state so that hosts will no longer trust them.
In order for the Teleport Auth Service to generate new keys in GCP KMS and have
them trusted by the rest of the cluster, you will need to rotate all of your
Teleport CAs.

`teleport start` will print a warning during startup if any CA needs to be rotated.
CA rotation can be performed manually or semi-automatically, see our admin guide
on [Certificate rotation](../../management/operations/ca-rotation.mdx).
All CAs listed in the output of `tctl status` must be rotated.
52 changes: 14 additions & 38 deletions docs/pages/choose-an-edition/teleport-enterprise/hsm.mdx
Expand Up @@ -110,7 +110,7 @@ to use.
https://docs.aws.amazon.com/cloudhsm/latest/userguide/gs_cloudhsm_cli-install.html

Bootstrap the CLI by configuring the IP address of the new HSM:

```code
$ sudo /opt/cloudhsm/bin/configure-cli -a <Var name="HSM IP address"/>
```
Expand Down Expand Up @@ -170,7 +170,7 @@ to use.
```code
$ sudo /opt/cloudhsm/bin/configure-pkcs11 --disable-key-availability-check -a <Var name="HSM IP address"/>
```

</TabItem>
<TabItem label="YubiHSM2">

Expand Down Expand Up @@ -300,48 +300,26 @@ auth_service:

## Step 3/5. (Re)start Teleport Auth

If this is a new auth server which has not been started yet, starting a brand
new cluster with an empty backend, HSM keys will be automatically generated at
startup and no further action is required, skip to step 5. Otherwise, continue
reading.
If this is a new Teleport Auth Service which has not been started yet, starting
a brand new cluster with an empty backend, HSM keys will be automatically
generated at startup and no further action is required, skip to step 5.
Otherwise, continue reading.

If you are connecting an HSM to an existing Teleport cluster, restart the auth
server for the configuration changes to take effect. New CA keys will
automatically be generated in the HSM. For these keys to be trusted by the rest
of the cluster you will need to perform a CA rotation, see
[Step 4](#step-45-certificate-rotation-with-hsm). The auth server will not perform
any signing operations until the rotation has started. In an HA cluster you
should add the HSM to the auth configuration one server at a time, and do not
route any traffic to the auth server where the HSM is currently being added.
server for the configuration changes to take effect.
New CA keys will automatically be generated in the HSM during the next CA
rotation.
Until a CA rotation is completed, the Auth Service will continue signing new
certificates with the existing software keys.

## Step 4/5. Certificate Rotation with HSM

When adding a new HSM to an existing Teleport cluster, or adding a new
HSM-connected Auth server to an HA Teleport cluster, you will need to rotate all
Certificate Authorities in order for new certificates to be issued and
HSM-connected Auth Service to an HA Teleport cluster, you will need to rotate
all Certificate Authorities in order for new certificates to be issued and
trusted.

`tctl status` will print a warning if CA rotation is required:
```code
$ tctl status
WARNING: One or more auth servers has a newly added or removed HSM or KMS configured. You should not route traffic to that server until a CA rotation has been completed.
Cluster cluster-one
Version (=teleport.version=)
host CA never updated
user CA never updated
db CA never updated
openssh CA never updated
jwt CA never updated
saml_idp CA never updated
CA pin (=presets.ca_pin=)
CA pin sha256:e758c8f0f6cd95116d5da8171e0ff4adfa99dab3b1f171bfe854070884955524
```

`teleport start` will also print a warning during startup if any CA needs to be rotated.
Until rotation is completed, the auth server will not sign any new certificates
(except the `Admin` certificate used by `tctl` which will be signed by a
temporary HSM key).

`teleport start` will print a warning during startup if any CA needs to be rotated.
CA rotation can be performed manually or semi-automatically, see our admin guide
on [Certificate rotation](../../management/operations/ca-rotation.mdx).
All CAs listed in the output of `tctl status` must be rotated.
Expand All @@ -351,5 +329,3 @@ All CAs listed in the output of `tctl status` must be rotated.
You are all set! Check the teleport logs for `Creating new HSM key pair` to
confirm that the feature is working. You can also check that keys were created
in your HSM using your HSM's admin tool.


16 changes: 5 additions & 11 deletions integration/hsm/helpers.go
Expand Up @@ -21,7 +21,6 @@ package hsm
import (
"context"
"net"
"os"
"path/filepath"
"testing"
"time"
Expand Down Expand Up @@ -229,9 +228,6 @@ func (s teleportServices) waitForPhaseChange(ctx context.Context) error {
}

func newAuthConfig(t *testing.T, log utils.Logger) *servicecfg.Config {
hostName, err := os.Hostname()
require.NoError(t, err)

config := servicecfg.MakeDefaultConfig()
config.DataDir = t.TempDir()
config.Auth.StorageConfig.Params["path"] = filepath.Join(config.DataDir, defaults.BackendDir)
Expand All @@ -244,13 +240,14 @@ func newAuthConfig(t *testing.T, log utils.Logger) *servicecfg.Config {

config.Auth.Enabled = true
config.Auth.NoAudit = true
config.Auth.ListenAddr.Addr = net.JoinHostPort(hostName, "0")
config.Auth.ListenAddr.Addr = net.JoinHostPort("localhost", "0")
Comment on lines -247 to +243
Contributor Author
Not sure why, but I had to make this change to get the tests to pass on my MacBook.

config.Auth.PublicAddrs = []utils.NetAddr{
{
AddrNetwork: "tcp",
Addr: hostName,
Addr: "localhost",
},
}
var err error
config.Auth.ClusterName, err = services.NewClusterNameWithRandomID(types.ClusterNameSpecV2{
ClusterName: "testcluster",
})
Expand All @@ -270,9 +267,6 @@ func newAuthConfig(t *testing.T, log utils.Logger) *servicecfg.Config {
}

func newProxyConfig(t *testing.T, authAddr utils.NetAddr, log utils.Logger) *servicecfg.Config {
hostName, err := os.Hostname()
require.NoError(t, err)

config := servicecfg.MakeDefaultConfig()
config.DataDir = t.TempDir()
config.CachePolicy.Enabled = true
Expand All @@ -289,8 +283,8 @@ func newProxyConfig(t *testing.T, authAddr utils.NetAddr, log utils.Logger) *ser
config.Proxy.DisableWebInterface = true
config.Proxy.DisableWebService = true
config.Proxy.DisableReverseTunnel = true
config.Proxy.SSHAddr.Addr = net.JoinHostPort(hostName, "0")
config.Proxy.WebAddr.Addr = net.JoinHostPort(hostName, "0")
config.Proxy.SSHAddr.Addr = net.JoinHostPort("localhost", "0")
config.Proxy.WebAddr.Addr = net.JoinHostPort("localhost", "0")

return config
}
39 changes: 17 additions & 22 deletions integration/hsm/hsm_test.go
Expand Up @@ -142,13 +142,14 @@ func TestHSMRotation(t *testing.T) {
log.Debug("TestHSMRotation: starting auth server")
authConfig := newHSMAuthConfig(t, liteBackendConfig(t), log)
auth1 := newTeleportService(t, authConfig, "auth1")
t.Cleanup(func() {
require.NoError(t, auth1.process.GetAuthServer().GetKeyStore().DeleteUnusedKeys(ctx, nil))
})
allServices := teleportServices{auth1}

log.Debug("TestHSMRotation: waiting for auth server to start")
require.NoError(t, auth1.start(ctx))
err := auth1.start(ctx)
require.NoError(t, err, trace.DebugReport(err))
t.Cleanup(func() {
require.NoError(t, auth1.process.GetAuthServer().GetKeyStore().DeleteUnusedKeys(ctx, nil))
})

// start a proxy to make sure it can get creds at each stage of rotation
log.Debug("TestHSMRotation: starting proxy")
Expand All @@ -157,7 +158,7 @@ func TestHSMRotation(t *testing.T) {
allServices = append(allServices, proxy)

log.Debug("TestHSMRotation: sending rotation request init")
err := auth1.process.GetAuthServer().RotateCertAuthority(ctx, types.RotateRequest{
err = auth1.process.GetAuthServer().RotateCertAuthority(ctx, types.RotateRequest{
Type: types.HostCA,
TargetPhase: types.RotationPhaseInit,
Mode: types.RotationModeManual,
Expand Down Expand Up @@ -261,11 +262,9 @@ func TestHSMDualAuthRotation(t *testing.T) {
require.NoError(t, authServices.start(ctx), "auth service failed initial startup")

log.Debug("TestHSMDualAuthRotation: Starting load balancer")
hostName, err := os.Hostname()
require.NoError(t, err)
lb, err := utils.NewLoadBalancer(
ctx,
*utils.MustParseAddr(net.JoinHostPort(hostName, "0")),
*utils.MustParseAddr(net.JoinHostPort("localhost", "0")),
auth1.authAddr(t),
)
require.NoError(t, err)
Expand Down Expand Up @@ -487,12 +486,15 @@ func TestHSMMigrate(t *testing.T) {
require.NoError(t, auth1.start(ctx))
require.NoError(t, auth2.start(ctx))

// Replace configured addresses with port set to 0 with the actual port
// number so they are stable across hard restarts.
auth1Config.Auth.ListenAddr = auth1.authAddr(t)
auth2Config.Auth.ListenAddr = auth2.authAddr(t)

log.Debug("TestHSMMigrate: Starting load balancer")
hostName, err := os.Hostname()
require.NoError(t, err)
lb, err := utils.NewLoadBalancer(
ctx,
*utils.MustParseAddr(net.JoinHostPort(hostName, "0")),
*utils.MustParseAddr(net.JoinHostPort("localhost", "0")),
auth1.authAddr(t),
auth2.authAddr(t),
)
Expand All @@ -508,12 +510,11 @@ func TestHSMMigrate(t *testing.T) {
require.NoError(t, proxy.start(ctx))

testClient := func(t *testing.T) {
testAdminClient(t, auth2Config.DataDir, auth2.authAddrString(t))
testAdminClient(t, auth1Config.DataDir, lb.Addr().String())
}
testClient(t)

// Phase 1: migrate auth1 to HSM
lb.RemoveBackend(auth1.authAddr(t))
auth1.process.Close()
require.NoError(t, auth1.waitForShutdown(ctx))
auth1Config.Auth.KeyStore = keystore.SetupSoftHSMTest(t)
Expand Down Expand Up @@ -560,7 +561,7 @@ func TestHSMMigrate(t *testing.T) {
},
}

// do a full rotation
// Do a full rotation to get HSM keys for auth1 into the CA.
for _, stage := range stages {
log.Debugf("TestHSMMigrate: Sending rotate request %s", stage.targetPhase)
require.NoError(t, auth1.process.GetAuthServer().RotateCertAuthority(ctx, types.RotateRequest{
Expand All @@ -571,11 +572,7 @@ func TestHSMMigrate(t *testing.T) {
stage.verify(t)
}

// Safe to send traffic to new auth1 again
lb.AddBackend(auth1.authAddr(t))

// Phase 2: migrate auth2 to HSM
lb.RemoveBackend(auth2.authAddr(t))
auth2.process.Close()
require.NoError(t, auth2.waitForShutdown(ctx))
auth2Config.Auth.KeyStore = keystore.SetupSoftHSMTest(t)
Expand All @@ -587,18 +584,16 @@ func TestHSMMigrate(t *testing.T) {

testClient(t)

// do a full rotation
// Do another full rotation to get HSM keys for auth2 into the CA.
for _, stage := range stages {
log.Debugf("TestHSMMigrate: Sending rotate request %s", stage.targetPhase)
require.NoError(t, auth1.process.GetAuthServer().RotateCertAuthority(ctx, types.RotateRequest{
require.NoError(t, auth2.process.GetAuthServer().RotateCertAuthority(ctx, types.RotateRequest{
Type: types.HostCA,
TargetPhase: stage.targetPhase,
Mode: types.RotationModeManual,
}))
stage.verify(t)
}

// Safe to send traffic to new auth2 again
lb.AddBackend(auth2.authAddr(t))
testClient(t)
}
4 changes: 3 additions & 1 deletion integration/hsm/reload_test.go
Expand Up @@ -24,6 +24,7 @@ import (
"testing"
"time"

"github.com/gravitational/trace"
"github.com/stretchr/testify/require"

"github.com/gravitational/teleport/lib/service"
Expand Down Expand Up @@ -54,7 +55,8 @@ func testReloads(t *testing.T) {

authConfig := newAuthConfig(t, log)
auth := newTeleportService(t, authConfig, "auth")
require.NoError(t, auth.start(testCtx))
err := auth.start(testCtx)
require.NoError(t, err, trace.DebugReport(err))
t.Cleanup(func() { require.NoError(t, auth.close()) })

proxyConfig := newProxyConfig(t, auth.authAddr(t), log)
8 changes: 4 additions & 4 deletions lib/auth/keystore/aws_kms.go
Expand Up @@ -298,7 +298,7 @@ func (a *awsKMSKeystore) canSignWithKey(ctx context.Context, raw []byte, keyType

// DeleteUnusedKeys deletes all keys readable from the AWS KMS account and
// region if they:
// 1. Are not included in the argument activeKys
// 1. Are not included in the argument activeKeys
// 2. Are labeled in AWS KMS as being created by this Teleport cluster
// 3. Were not created in the past 5 minutes.
//
Expand All @@ -310,7 +310,7 @@ func (a *awsKMSKeystore) canSignWithKey(ctx context.Context, raw []byte, keyType
// 1. A different auth server (auth2) creates a new key in GCP KMS
// 2. This function (running on auth1) deletes that new key
// 3. auth2 saves the id of this deleted key to the backend CA
func (a *awsKMSKeystore) DeleteUnusedKeys(ctx context.Context, activeKeys [][]byte) error {
func (a *awsKMSKeystore) deleteUnusedKeys(ctx context.Context, activeKeys [][]byte) error {
activeAWSKMSKeys := make(map[string]int)
for _, activeKey := range activeKeys {
keyIsRelevent, err := a.canSignWithKey(ctx, activeKey, keyType(activeKey))
Expand Down Expand Up @@ -381,7 +381,7 @@ func (a *awsKMSKeystore) DeleteUnusedKeys(ctx context.Context, activeKeys [][]by
a.logger.WithFields(logrus.Fields{
"key_arn": keyARN,
"key_state": keyState,
}).Info("DeleteUnusedKeys skipping AWS KMS key which is not in enabled state.")
}).Info("deleteUnusedKeys skipping AWS KMS key which is not in enabled state.")
return nil
}
creationDate := aws.TimeValue(describeOutput.KeyMetadata.CreationDate)
Expand All @@ -391,7 +391,7 @@ func (a *awsKMSKeystore) DeleteUnusedKeys(ctx context.Context, activeKeys [][]by
// the backend CA yet (which is why they don't appear in activeKeys).
a.logger.WithFields(logrus.Fields{
"key_arn": keyARN,
}).Info("DeleteUnusedKeys skipping AWS KMS key which was created in the past 5 minutes.")
}).Info("deleteUnusedKeys skipping AWS KMS key which was created in the past 5 minutes.")
return nil
}

3 changes: 1 addition & 2 deletions lib/auth/keystore/aws_kms_test.go
Expand Up @@ -43,7 +43,7 @@ import (
"github.com/gravitational/teleport/lib/utils"
)

// TestAWSKMS_DeleteUnusedKeys tests the AWS KMS keystore's DeleteUnusedKeys
// TestAWSKMS_deleteUnusedKeys tests the AWS KMS keystore's deleteUnusedKeys
// method under conditions where the ListKeys response is paginated and/or
// includes keys created by other clusters.
//
Expand Down Expand Up @@ -322,7 +322,6 @@ func (f *fakeAWSKMSService) ListKeysWithContext(ctx aws.Context, input *kms.List
output.NextMarker = aws.String(strconv.Itoa(i))
output.Truncated = aws.Bool(true)
}
fmt.Println("NIC ListKeys", aws.StringValue(input.Marker), len(output.Keys), output.NextMarker)
return output, nil
}

4 changes: 2 additions & 2 deletions lib/auth/keystore/gcp_kms.go
Expand Up @@ -231,7 +231,7 @@ func (g *gcpKMSKeyStore) canSignWithKey(ctx context.Context, raw []byte, keyType
return true, nil
}

// DeleteUnusedKeys deletes all keys from the configured KMS keyring if they:
// deleteUnusedKeys deletes all keys from the configured KMS keyring if they:
// 1. Are not included in the argument activeKeys
// 2. Are labeled with hostLabel (teleport_auth_host)
// 3. The hostLabel value matches the local host UUID
Expand All @@ -248,7 +248,7 @@ func (g *gcpKMSKeyStore) canSignWithKey(ctx context.Context, raw []byte, keyType
// or a simpler case where: the other auth server is running in a completely
// different Teleport cluster and the keys it's actively using will never appear
// in the activeKeys argument.
func (g *gcpKMSKeyStore) DeleteUnusedKeys(ctx context.Context, activeKeys [][]byte) error {
func (g *gcpKMSKeyStore) deleteUnusedKeys(ctx context.Context, activeKeys [][]byte) error {
// Make a map of currently active key versions, this is used for lookups to
// check which keys in KMS are unused.
activeKmsKeyVersions := make(map[string]int)