RFD 127: Encrypted Session Recording #53348

Merged
eriktate merged 1 commit into master from
eriktate/rfd-encrypted-session-recording
Aug 20, 2025
Conversation

@eriktate
Contributor

This PR proposes an RFD for encrypting session recordings using age encryption.

Admins are free to configure their own public key recipients provided they are
generated by a supported algorithm.

As a convenience, we will also add a new `keygen` utility to `tsh` for key
Collaborator

Would this be more appropriate for tctl than tsh? It feels like an administrative workflow to me, not an end-user workflow.

Contributor Author

Yeah I think that makes more sense. At one point it was a tctl command, but the similarity to ssh-keygen and the fact that tsh has to integrate with HSMs in order to play back recordings made me wonder if it would be worth it to just do the entire workflow with tsh. Administrative workflows running through tctl seems like a more compelling reason, though.


### Decryption and Replay

Replay of encrypted session recordings will happen through `tsh play`. For
Collaborator

What will we do about desktop session recordings, which can only be played via the UI?

Contributor Author

With the current design they wouldn't be replayable. In theory we could download and decrypt all of the PNGs with a command, but we would have to stitch the chunks together into full PNGs and the UX would be pretty poor.

I had an idea to add something like a tsh decrypt-server to support replays on web. It would spin up a local server capable of unwrapping the decryption key at an address that could be pasted into the web UI. The web UI would send encrypted recording data in and get decrypted recording data out, all local to the user's machine. I've seen a similar setup work before, but I haven't fully fleshed it out yet.

Collaborator

It's definitely an option - I have always thought of doing a teleport play desktop or tsh play desktop that would spin up a local webserver that can play a desktop session for offline playback, though this might be a big effort.

Can auth decrypt the recordings on the fly? Recordings play through auth's StreamSessionEvents API, so if auth has access to the HSM can it stream decrypted events to the client?

Contributor Author

That's an interesting idea too 🤔 Certainly fewer moving pieces and more straightforward to use. I can experiment with that and see how hard it would be to make a standalone desktop playback UI.

Can auth decrypt the recordings on the fly?

This is also possible, but we would lose the property of teleport infrastructure never interfacing directly with the HSM. If that's not something we think is worth having we could probably just do the entire HSM integration through auth instead of enabling tctl and tsh to handle that directly.

Comment thread rfd/0207-encrypted-session-recordings.md Outdated

### Teleport admin replaying encrypted session with a given private key
```bash
cat ~/my-key | tsh play --identity-stdin 49608fad-7fe3-44a7-b3b5-fab0e0bd34d1
```
Collaborator

Nit: identity-stdin sounds like user identity, not like something necessary for decryption.

Contributor Author

Maybe --private-key-stdin instead? The "identity" was just in keeping with how age refers to things, but in retrospect I agree it's way less obvious

Collaborator

Sure, or --decryption-key-stdin to be even more clear.

@eriktate eriktate force-pushed the eriktate/rfd-encrypted-session-recording branch from f133d20 to b722985 Compare March 24, 2025 16:50
Comment thread rfd/0207-encrypted-session-recordings.md Outdated
set of `--hsm*` parameters. More information about this in
[HSM Configuration](#hsm-configuration).

### Session Recording Modes

I'm not sure what the client requirements are for this, but it seems overcomplicated to me. I would pick a single mode unless this is driven by client requests.

Contributor Author

This is actually just a listing of recording modes we already support today. I suppose we could only enable encryption for a limited set of modes, but I don't think it would save us much complexity unless we only supported sync modes.

@eriktate
Contributor Author

eriktate commented Apr 1, 2025

Important changes in the latest commit:

  • Dropped manual key management since it isn't explicitly required right now
  • Moved decryption to the auth service
  • Added a CA for easier keystore integration, including HSMs
  • Data encryption scheme is now just the standard envelope encryption provided by age

The reason for wrapping a separate age private key is to enable the proxy service and nodes to use the public key directly without having access to the backing keystore.

@eriktate eriktate force-pushed the eriktate/rfd-encrypted-session-recording branch from 9e7a9a6 to 80b8126 Compare April 7, 2025 17:19
@eriktate eriktate changed the title RFD 207: Encrypted Session Recording RFD 127: Encrypted Session Recording Apr 7, 2025
Comment thread rfd/0127-encrypted-session-recordings.md
Comment on lines +273 to +749
#### `tctl` Changes

Key rotation will be handled through `tctl` using a new subcommand:
```bash
tctl auth rotate recordings --type=data # for rotating X25519 pairs
tctl auth rotate recordings --type=wrap # for rotating HSM/KMS backed wrapping keys
```
Contributor

I'm not sure you can add a subcommand under an existing command to make tctl auth rotate and tctl auth rotate recordings both work. I think you can't do this but I'm not 100% sure

Contributor

(I think you replied on the wrong thread but I'll reply here)

It looks like we have one instance of this with tctl alerts ack and tctl alerts ack ls, but I'm not sure if that's actually working based on the help text. Maybe something like auth keys rotate would be better?

That is some kind of hack where tctl alerts ack ls is not a real subcommand, it accepts an argument for the alert id, and if the arg happens to be "ls" then it takes over https://github.com/gravitational/teleport/blob/master/tool/tctl/common/alert_command.go#L144-L146

You'd have to do a similar hack with an argument to make tctl auth rotate recordings work, but then yeah the help text would be weird, so let's try to avoid that.

Maybe something under tctl recordings? Idk like tctl recordings encryption rotate

Contributor Author

I think tctl recordings encryption rotate would work fine too. I would only wonder if being a subcommand of auth makes it more discoverable since auth already handles CA rotation?

Collaborator

+1 for putting it under tctl recordings.

Comment thread rfd/0127-encrypted-session-recordings.md
keys are included with their rotated private keys in order to facilitate faster
searching.

### Key Rotation
Contributor

Will we ever delete keys from the HSM/KMS? How will we prevent these keys from being deleted by the existing DeleteUnusedKeys function?

Contributor Author

I think the easiest way would be to fetch the active keys from the SessionRecordingConfigV2 resource and include them in the list of allKeysInUse. That should keep them around and also clean them up automatically when they're rotated out.

Contributor

It would be better if we arranged things such that auths that don't know about this feature (because they're too old, for example) don't cause data loss by accidentally deleting keys that they shouldn't delete. AFAICT this is also currently a problem if an auth server is downgraded below the point where we introduced a new CA.

Contributor Author

Good call 🤔 I think we could just use a different label for these keys. That should keep older auth servers from accidentally deleting them and newer auth servers can target keys with both expected labels
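
A rough sketch of the label-based idea, assuming cleanup only ever considers keys whose label the running auth server recognizes. The label values and the `deletion_candidates` helper are hypothetical and are not taken from Teleport's `DeleteUnusedKeys` implementation.

```python
def deletion_candidates(all_keys, keys_in_use, known_labels):
    """Return IDs of keys this auth server would be allowed to delete.

    all_keys:     iterable of (key_id, label) pairs found in the keystore
    keys_in_use:  set of key IDs still referenced by cluster resources
    known_labels: labels this auth server version knows how to manage
    """
    return [
        key_id
        for key_id, label in all_keys
        if label in known_labels and key_id not in keys_in_use
    ]


# Hypothetical label values, for illustration only.
CA_LABEL = "teleport-ca"
RECORDING_LABEL = "teleport-recording"

keystore = [
    ("ca-old", CA_LABEL),
    ("ca-active", CA_LABEL),
    ("rec-unreferenced", RECORDING_LABEL),
    ("rec-active", RECORDING_LABEL),
]
in_use = {"ca-active", "rec-active"}

# An older auth server only knows the CA label, so it never sees recording keys.
old_auth = deletion_candidates(keystore, in_use, {CA_LABEL})
# A newer auth server targets both labels and can clean up unreferenced recording keys.
new_auth = deletion_candidates(keystore, in_use, {CA_LABEL, RECORDING_LABEL})
```

The point of the sketch is that safety comes from the older server's filter, not from anything the newer server does: keys under an unknown label are simply invisible to it.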


As remarked before, you should have a user defined data retention period, and a user defined key rotation period. These periods must be consistent across the teleport cluster. Then you add creation timestamps to keys, e.g. in the key label. You can then delete keys that are older than data retention period + key rotation period.
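
The deletion rule in this comment can be sketched as follows. This is a minimal illustration rather than Teleport code: `KeyRecord` and `deletable_keys` are hypothetical names, and it assumes each key carries a creation timestamp in seconds since the epoch (for example, encoded in its label).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical view of a keystore key; not a Teleport type."""
    label: str
    created_at: int  # seconds since the Unix epoch (UTC)


def deletable_keys(keys, now, retention_secs, rotation_secs):
    """Return the keys older than data retention period + key rotation period.

    A key encrypts new recordings for at most one rotation period, and the
    last recording it encrypted expires one retention period after that, so
    a key older than the sum of both periods is no longer needed for replay.
    """
    cutoff = now - (retention_secs + rotation_secs)
    return [key for key in keys if key.created_at < cutoff]


YEAR = 365 * 24 * 60 * 60
keys = [KeyRecord(f"rec-{n}", n * YEAR) for n in range(5)]

# Yearly rotation with three years of retention, evaluated at the five-year mark.
expired = deletable_keys(keys, now=5 * YEAR, retention_secs=3 * YEAR, rotation_secs=YEAR)
```

A key created exactly at the cutoff is kept, which errs on the side of retaining decryption ability.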

@eriktate eriktate force-pushed the eriktate/rfd-encrypted-session-recording branch 2 times, most recently from 2f4ac3f to 6efbd80 Compare April 8, 2025 18:39
@eriktate
Contributor Author

Hey @gl-mc, thanks for taking the time to review! I tried to address all of your points below:

The change request seems to use RSA 2048 bit keys, which are at the 112 bit security level. It would be better to use keys with 3072 bits or better, as they operate on the 128 bit security level, in line with the other cryptographic algorithms used (ChaCha20-Poly1305, X25519, SHA-256).

This is a good point, but the primary issue with choosing another algorithm for key wrapping is lack of compatibility. OAEP with RSA2048 seems to be the best asymmetric encryption algorithm currently supported by the keystores compatible with Teleport. AWS KMS for example only supports variants of RSA using SHA1 or SHA256.

The key rotation seems to be overly complicated and involve re-keying of existing recordings. An alternative approach that does not require re-keying could provide benefits.

Key rotation is definitely the most complicated portion of this, but it's actually designed not to reencrypt recordings. The rotated keys are moved out of the active set to prevent them from being used during encryption, but remain accessible for the purpose of future replay. Most of the complexity involved is ensuring that each configured keystore in the cluster has access to a wrapped version of the new X25519 private key.

There does not seem to be a security benefit or performance benefit in wrapping X25519 keys with RSA keys. Technically, that is not possible in HSMs as they are both asymmetric key types. Recommended hosting the X25519 keys directly in the HSMs.

Historical playback of recordings without reencryption is actually the reason for key wrapping. It's possible for an auth server to join a cluster with a keystore configuration that does not provide access to the original keys used to encrypt a recording. Since the key pair used for encryption/decryption of recording data is the wrapped X25519 pair, a new auth server just needs to acquire its own wrapped copy in order to decrypt historical recordings. Wrapping this key rather than storing it as a software key is meant to ensure recording decryption is only permitted by the configured keystores/HSMs.

@gl-mc

gl-mc commented Jun 30, 2025

Hey Erik, thanks for the quick response :)

The change request seems to use RSA 2048 bit keys, which are at the 112 bit security level. It would be better to use keys with 3072 bits or better, as they operate on the 128 bit security level, in line with the other cryptographic algorithms used (ChaCha20-Poly1305, X25519, SHA-256).

This is a good point, but the primary issue with choosing another algorithm for key wrapping is lack of compatibility. OAEP with RSA2048 seems to be the best asymmetric encryption algorithm currently supported by the keystores compatible with Teleport. AWS KMS for example only supports variants of RSA using SHA1 or SHA256.

Using SHA-256 for the RSA 3072 OAEP is fine, as SHA-256 is at the 128 bit security level. I don't understand why you use an asymmetric algorithm for the key wrapping. As key wrapping and unwrapping will be done on a trusted device (e.g., a network HSM), there is no benefit using an asymmetric key - the key will never leave the device.

The key rotation seems to be overly complicated and involve re-keying of existing recordings. An alternative approach that does not require re-keying could provide benefits.

Key rotation is definitely the most complicated portion of this, but it's actually designed not to reencrypt recordings. The rotated keys are moved out of the active set to prevent them from being used during encryption, but remain accessible for the purpose of future replay. Most of the complexity involved is ensuring that each configured keystore in the cluster has access to a wrapped version of the new X25519 private key.

Keep in mind that you need to do key rotation both on the data encryption key (currently X25519) and the key encryption key (currently RSA). So in my opinion, your best bet for doing this is:

  • Every auth server has access to all wrapped keys, e.g. through a shared database.
  • Every auth server has access to the trusted key decryption device(s)/server(s) - e.g. networked HSM(s)
  • When an auth server requires access to a recording, it searches for the correct wrapped decryption key in the shared database, based on customer configured key rotation and data retention settings. E.g., if key rotation happens yearly and data retention is for 3 years, then you have 4 data encryption keys (X25519), and 4 key encryption keys (AES - preferred, or RSA as per the current design). When a recording from the previous year needs to be replayed, the auth server requests the correct wrapped private key from the DB and then connects to the key decryption device to obtain the decrypted wrapped key required for playback. The decryption device uses the key encryption key for that year to decrypt the data encryption key, and then returns the data encryption key.
    However, this step has potential for optimisation. Looking at the age file format, it already uses the concept of a data encryption key (ChaCha20-Poly1305) encrypted with a key encryption key (X25519). So instead of wrapping the X25519 key with another key encryption key, it seems to be more efficient to host the X25519 keys directly in the trusted key decryption device(s). All auth servers need to have access to the trusted key decryption device(s) anyway. Then you can directly decrypt the ChaCha20-Poly1305 data decryption key on the trusted key decryption device and send that back. This saves you one level of indirection - one decryption per request. You get the additional benefit that the ChaCha20-Poly1305 keys are unique per recording file, so even if one of them leaks from the auth servers, there is very limited damage done. Whereas, if one of the X25519 private keys leaks, then all recordings from that given year are accessible.
    It has the further benefit that you do not need to maintain a database with wrapped X25519 keys that is accessible for all clients. The recordings contain all the information you need already.
  • You can still distribute the X25519 public keys to all the recording servers, as they can easily be exported from the trusted key decryption device.
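
To make the lookup in the third bullet concrete, here is a small sketch of the key count from the yearly-rotation example and of picking a key by recording time. The function names and the `(active_from, label)` schedule format are assumptions made for illustration; neither Teleport nor age defines them.

```python
import bisect


def keys_to_retain(retention, rotation_interval):
    """Number of keys that may still be needed for replay.

    Both arguments are integers in the same unit (for example, years).
    """
    # Keys covering the retention window (ceiling division), plus the active key.
    return -(-retention // rotation_interval) + 1


def key_for_recording(schedule, recorded_at):
    """Return the label of the key that was active at `recorded_at`.

    schedule: list of (active_from, label) tuples sorted by active_from,
              with timestamps in seconds since the Unix epoch.
    """
    starts = [active_from for active_from, _ in schedule]
    index = bisect.bisect_right(starts, recorded_at) - 1
    if index < 0:
        raise LookupError("recording predates the oldest known key")
    return schedule[index][1]


# Yearly rotation with three years of retention, as in the example above.
needed = keys_to_retain(retention=3, rotation_interval=1)

# Hypothetical schedule: each key becomes active at the listed timestamp.
schedule = [(0, "key-a"), (100, "key-b"), (200, "key-c")]
label = key_for_recording(schedule, recorded_at=150)
```

Because a key becomes active at its `active_from` timestamp, a recording made exactly at a rotation boundary resolves to the newer key.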

@eriktate
Contributor Author

eriktate commented Jul 1, 2025

Using SHA-256 for the RSA 3072 OAEP is fine, as SHA-256 is at the 128 bit security level.

Looking through documentation again I might be incorrect about RSA 2048 being a limitation. Let me do some experimentation and I'll update the RFD if making the switch is possible.

I don't understand why you use an asymmetric algorithm for the key wrapping. As key wrapping and unwrapping will be done on a trusted device (e.g., a network HSM), there is no benefit using an asymmetric key - the key will never leave the device.

This is totally reasonable and I think the solution you proposed below would be preferred if we had guarantees around all auth servers having access to the same keys. That's currently not true, though, since the key store configuration is per auth server rather than per cluster. You could have auth servers in the same cluster that use different HSMs or even a combination of an HSM and a cloud KMS. Wrapping with asymmetric keys enables sharing of recording encryption keys without needing access to the same key store.

This is actually even trickier for HSM backends because Teleport includes the auth server's host UUID in the key name, which essentially ensures that keys can't be shared across auth servers even if they have access to the same networked HSMs.

Keep in mind that you need to do key rotation both on the data encryption key (currently X25519) and the key encryption key (currently RSA). So in my opinion, your best bet for doing this is:

  • Every auth server has access to all wrapped keys, e.g. through a shared database.
  • Every auth server has access to the trusted key decryption device(s)/server(s) - e.g. networked HSM(s)
  • When an auth server requires access to a recording, it searches for the correct wrapped decryption key in the shared database, based on customer configured key rotation and data retention settings. E.g., if key rotation happens yearly and data retention is for 3 years, then you have 4 data encryption keys (X25519), and 4 key encryption keys (AES - preferred, or RSA as per the current design). When a recording from the previous year needs to be replayed, the auth server requests the correct wrapped private key from the DB and then connects to the key decryption device to obtain the decrypted wrapped key required for playback. The decryption device uses the key encryption key for that year to decrypt the data encryption key, and then returns the data encryption key.
    However, this step has potential for optimisation. Looking at the age file format, it already uses the concept of a data encryption key (ChaCha20-Poly1305) encrypted with a key encryption key (X25519). So instead of wrapping the X25519 key with another key encryption key, it seems to be more efficient to host the X25519 keys directly in the trusted key decryption device(s). All auth servers need to have access to the trusted key decryption device(s) anyway. Then you can directly decrypt the ChaCha20-Poly1305 data decryption key on the trusted key decryption device and send that back. This saves you one level of indirection - one decryption per request. You get the additional benefit that the ChaCha20-Poly1305 keys are unique per recording file, so even if one of them leaks from the auth servers, there is very limited damage done. Whereas, if one of the X25519 private keys leaks, then all recordings from that given year are accessible.
    It has the further benefit that you do not need to maintain a database with wrapped X25519 keys that is accessible for all clients. The recordings contain all the information you need already.
  • You can still distribute the X25519 public keys to all the recording servers, as they can easily be exported from the trusted key decryption device.

We're still planning on building out rotations for key encryption keys, it's just been omitted from this RFD so we can spend some more time making sure we handle all of the known edge cases properly. Like I mentioned above, your proposed solution would be dramatically simpler but it's not guaranteed that all auth servers have access to the same trusted device or the same keys.

There are definitely optimizations that could be made if we targeted a more constrained usage, but would the general implementation proposed in the RFD be insufficient for your use case?

@gl-mc

gl-mc commented Jul 2, 2025

Using SHA-256 for the RSA 3072 OAEP is fine, as SHA-256 is at the 128 bit security level.

Looking through documentation again I might be incorrect about RSA 2048 being a limitation. Let me do some experimentation and I'll update the RFD if making the switch is possible.

I don't understand why you use an asymmetric algorithm for the key wrapping. As key wrapping and unwrapping will be done on a trusted device (e.g., a network HSM), there is no benefit using an asymmetric key - the key will never leave the device.

This is totally reasonable and I think the solution you proposed below would be preferred if we had guarantees around all auth servers having access to the same keys. That's currently not true, though, since the key store configuration is per auth server rather than per cluster. You could have auth servers in the same cluster that use different HSMs or even a combination of an HSM and a cloud KMS. Wrapping with asymmetric keys enables sharing of recording encryption keys without needing access to the same key store.

I don't get this argument. Why is it easier to synchronise wrapping keys across servers than the direct encryption keys? It seems to me that key material needs to be synchronised across all auth servers in a cluster and their associated HSMs, if they all shall be usable for decrypting recordings. You could e.g. use the p11keygen from pkcs11-tools to create the key material and immediately export it under wrapping keys and distribute it among HSMs, including cloud HSMs. The additional RSA encryption of the X25519 keys does not seem to solve this problem. It is also a problem that you as a software developer don't have to solve. It's key management / distribution, which needs to be achieved by the customer using your product.

This is actually even trickier for HSM backends because Teleport includes the auth server's host UUID in the key name, which essentially ensures that keys can't be shared across auth servers even if they have access to the same networked HSMs.

Using an attribute in the key name to restrict access is not really a restriction. You should use separate slots in the HSM for separate auth servers if you want to achieve separation of key material. Then you have strong access guarantees.

Keep in mind that you need to do key rotation both on the data encryption key (currently X25519) and the key encryption key (currently RSA). So in my opinion, your best bet for doing this is:

  • Every auth server has access to the trusted key decryption device(s)/server(s) - e.g. networked HSM(s)
    However, this step has potential for optimisation. Looking at the age file format, it already uses the concept of a data encryption key (ChaCha20-Poly1305) encrypted with a key encryption key (X25519). So instead of wrapping the X25519 key with another key encryption key, it seems to be more efficient to host the X25519 keys directly in the trusted key decryption device(s). All auth servers need to have access to the trusted key decryption device(s) anyway. Then you can directly decrypt the ChaCha20-Poly1305 data decryption key on the trusted key decryption device and send that back. This saves you one level of indirection - one decryption per request. You get the additional benefit that the ChaCha20-Poly1305 keys are unique per recording file, so even if one of them leaks from the auth servers, there is very limited damage done. Whereas, if one of the X25519 private keys leaks, then all recordings from that given year are accessible.
    It has the further benefit that you do not need to maintain a database with wrapped X25519 keys that is accessible for all clients. The recordings contain all the information you need already.
  • You can still distribute the X25519 public keys to all the recording servers, as they can easily be exported from the trusted key decryption device.

We're still planning on building out rotations for key encryption keys, it's just been omitted from this RFD so we can spend some more time making sure we handle all of the known edge cases properly. Like I mentioned above, your proposed solution would be dramatically simpler but it's not guaranteed that all auth servers have access to the same trusted device or the same keys.

Again, this is something you don't have to solve, as it's key management and not the focus of your software. The customer can do this.

There are definitely optimizations that could be made if we targeted a more constrained usage, but would the general implementation proposed in the RFD be insufficient for your use case?

I see the increased risk of software X25519 keys, which does not correspond to our security policies. Hosting the X25519 keys directly in the HSMs would make it compliant, and as discussed, have significant simplifications for the design. It also has the security benefit that if a software key leaks, then it compromises only a single recording, instead of compromising all recordings from a key rotation period. I get the feeling that you're trying to solve too many problems at once and are not focusing on a solution that achieves your goal. I think key management is out of scope for Teleport. But Teleport needs to provide a configuration file where we can specify which key labels will be used for encryption at what time. It also needs to be able to find a key based on the key identifier and date to decrypt recordings using a PKCS#11 module.

@eriktate eriktate force-pushed the eriktate/rfd-encrypted-session-recording branch 3 times, most recently from d147e9c to 828a888 Compare July 10, 2025 23:29
Comment on lines +98 to +102
session_recording_config: node
session_recording_config_encryption: on
session_recording_config_key_manager: keystore
session_recording_config_active_key_labels: []
session_recording_config_rotated_key_labels: []
Contributor

I'm thinking now that we're adding more than a single field here it probably makes sense to nest them under a session_recording_config group. What do you think?

Contributor Author

Yeah I think that makes configuration less annoying (less repetition) and makes it easier to extend session recording configurations in the future 👍

I'll probably just add the new configurations and not create two ways to configure the recording mode.

Comment on lines +63 to +76
kind: session_recording_config
version: v2
spec:
  encryption:
    # whether or not encryption should be enabled
    enabled: true
    # whether or not Teleport should manage the keys or simply consume them
    key_manager: 'keystore|teleport'
    # key labels Teleport will use to find active encryption keys when mode is
    # set to "keystore-managed"
    active_key_labels: []
    # key labels Teleport will use to find rotated encryption keys when mode is
    # set to "keystore-managed"
    rotated_key_labels: []
Contributor

@rosstimothy rosstimothy Jul 11, 2025

I don't know that from looking at a config that I would know what the additional options meant. Naming is hard, but I think we should try to emphasize the managed and unmanaged modes in a clearer way through these configurations. What do you think about something like this?

Suggested change
kind: session_recording_config
version: v2
spec:
  encryption:
    # whether or not encryption should be enabled
    enabled: true
    # Manual key management can be enabled for cluster admins to have complete
    # control over creation and rotation of keys. In this mode Teleport takes zero
    # ownership of cryptographic material.
    manual_key_management:
      # whether or not Teleport should manage the keys or simply consume them
      enabled: true
      # key labels Teleport will use to find active encryption keys, must be populated
      # if enabled is true.
      active_key_labels: []
      # key labels Teleport will use to find rotated encryption keys, must be populated
      # if enabled is true.
      rotated_key_labels: []

Contributor Author

@eriktate eriktate Jul 11, 2025

Yeah I think this is clearer. It also makes it easier to add config options for manual management in the future 👍

What do you think about calling them labels? That's sort of specific to PKCS#11, but if you wanted to use this with an AWS KMS backend I think it would have to be a list of IDs, ARNs, or aliases. Is reusing the same config confusing? We could have a list of tagged values:

active_keys:
  - type: aws_id
    key: mrk-1234abcd12ab34cd56ef1234567890ab
  - type: gcp_id
    key: projects/proj-1/locations/global/keyRings/ring-1/cryptoKeys/teleport_recording/cryptoKeyVersions/1
  - type: pkcs11_label
    key: teleport_recording

Or maybe even have an object that supports all of the possible field names:

active_keys:
  - aws_id: mrk-1234abcd12ab34cd56ef1234567890ab
  - gcp_id: projects/proj-1/locations/global/keyRings/ring-1/cryptoKeys/teleport_recording/cryptoKeyVersions/1
  - pkcs11_label: teleport_recording

Although this would require validating against weird configurations where a single key has an aws_id and a pkcs11_label configured, so maybe the tagged version makes more sense.

Some sort of differentiation would also make it easier to migrate to a new keystore in manual mode because you could explicitly define keys for multiple backends. Right now we'd either have to inspect the label value to figure out which backend it belongs to or try all labels and ignore failures when using something like teleport_recording as an AWS ID.
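
To make the tagged variant concrete, here is a minimal Go sketch of how entries could be validated and grouped per backend. The `KeyRef` type, the `groupByBackend` function, and the set of accepted type tags are all hypothetical names chosen for illustration; nothing here is part of the RFD or of Teleport's API.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyRef is a hypothetical Go shape for one tagged entry under active_keys.
type KeyRef struct {
	Type string // e.g. "aws_id", "gcp_id", "pkcs11_label"
	Key  string // backend-specific identifier or label
}

// knownTypes lists the tags used in the example above; the real set is an assumption.
var knownTypes = map[string]bool{"aws_id": true, "gcp_id": true, "pkcs11_label": true}

// groupByBackend validates each entry and groups key identifiers by type tag,
// so no backend has to guess whether a given value belongs to it.
func groupByBackend(refs []KeyRef) (map[string][]string, error) {
	grouped := make(map[string][]string)
	for i, ref := range refs {
		if !knownTypes[ref.Type] {
			return nil, fmt.Errorf("active_keys[%d]: unknown type %q", i, ref.Type)
		}
		if ref.Key == "" {
			return nil, fmt.Errorf("active_keys[%d]: key must not be empty", i)
		}
		grouped[ref.Type] = append(grouped[ref.Type], ref.Key)
	}
	return grouped, nil
}

func main() {
	refs := []KeyRef{
		{Type: "aws_id", Key: "mrk-1234abcd12ab34cd56ef1234567890ab"},
		{Type: "pkcs11_label", Key: "teleport_recording"},
	}
	grouped, err := groupByBackend(refs)
	if err != nil {
		panic(err)
	}
	// Sort the tags so the printed order is stable.
	types := make([]string, 0, len(grouped))
	for t := range grouped {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Println(t, grouped[t])
	}
}
```

With one value per entry, the "single key has an aws_id and a pkcs11_label" case cannot be expressed at all, which is the main appeal of the tagged form.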

I think it would be helpful if the configuration allowed timestamps in relation to key labels. Then the application knows which key to use for decrypting which recording. Ideally in seconds since the epoch, so there are no time-zone issues.
It might also be useful to have a key identifier that is used in the age document alongside the keys, something like a SHA256 hash of the public key or similar. The age spec allows this:

"Recipient implementations MAY choose to include an identifier of the specific recipient (for example, a short hash of the public key) as an argument. Note that this sacrifices any chance of ciphertext anonymity and unlinkability."

For example, something like this:

active_keys:
  - type: aws_id
    key: mrk-1234abcd12ab34cd56ef1234567890ab
    start_date: 1752461976
    end_date: 1783998031
    sha256: 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
  - type: gcp_id
    key: projects/proj-1/locations/global/keyRings/ring-1/cryptoKeys/teleport_recording/cryptoKeyVersions/1
    start_date: 1752461976
    end_date: 1783998031
    sha256: 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be04
  - type: pkcs11_label
    key: teleport_recording
    start_date: 1752461976
    end_date: 1783998031
    sha256: 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be05

Comment thread rfd/0127-encrypted-session-recordings.md Outdated
Comment thread rfd/0127-encrypted-session-recordings.md Outdated
see that their key has been fulfilled, and finish the import process using
their configured keystore.

Some keystores support this sort of wrapped key exchange without ever exposing

What are you referring to? Can you provide an example for this?

Contributor Author

This was actually duplicating most of the key sharing section a couple of paragraphs below. It also isn't relevant to the proposed direction of avoiding key exchange altogether, so it doesn't really make sense to include it here. I went ahead and removed this and added some of the missing parts to the key sharing section below.

I thought about removing key sharing altogether, but I think it's probably worth keeping some of it documented just in case we need to implement something similar in the future.

Comment thread rfd/0127-encrypted-session-recordings.md Outdated
Comment thread rfd/0127-encrypted-session-recordings.md Outdated
configured labels. Note that the "keys" cached for KMS and HSM backends are
just references to real keys stored in the keystore.

### Key Rotation

I have no idea about the requirements of this feature, but when I think about "key rotation" I always associate that with the process of creating new encryption keys in order to fully replace existing keys. The main use case for this would be to limit the potential damage of a compromised key. The current design will just limit the number of session recordings that can be stolen by an attacker. I am sure you've already considered this, but we need to make sure that we're OK with accepting such risk (and also adjust the marketing/documentation appropriately).

Contributor Author

Yeah this is definitely something we're thinking about. Key rotation as it's currently designed is meant to limit the blast radius of a given breached key. The timeline for when those rotated keys are fully deleted ultimately comes down to the retention policy of the recording data itself.

Key rotation works together with the data retention period. Keys and associated recordings are deleted when they are held longer than the data retention period. But keys get rotated more often than the data retention period, both to limit the blast radius of a single key being compromised, and also to enable deletion of keys in a fine-grained enough fashion so it matches the data retention period cleanup interval, and also to avoid re-encryption of existing data under new keys when the key cleanup happens.

For example, you could have yearly data cleanups and matching yearly key rotations. Or you could have monthly cleanups and also monthly key rotation. Sometimes, you find that there are monthly cleanups and only yearly key rotation - in that case, you have to keep the key until the last month encrypted under the key has been cleaned up.

I hope that clarifies?

I'm also still missing the key hashes from the key configuration. I think it would make key lookups easier, as the key hashes can be included in the age document.

KEY_STATE_ROTATED = 3;
}

// WrappedKey wraps a PrivateKey using an asymmetric keypair.

You may find that HSMs typically prevent you from wrapping an asymmetric key with an asymmetric key (at least Entrust HSMs do). It is typically required to wrap an asymmetric key with a symmetric key first. You can double-wrap / envelope, see e.g. https://github.com/Mastercard/pkcs11-tools/blob/master/docs/MANUAL.md#p11wrap-and-p11unwrap for details.
In the context of this pull request, it seems best to wrap the private key with a ChaCha20-Poly1305 key first and then wrap with an asymmetric key.

// WrappedKey wraps a PrivateKey using an asymmetric keypair.
message WrappedKey {
// RecordingEncryptionPair is the asymmetric keypair used to wrap the
// private key. Expected to be RSA.

Why is only RSA supported? Which RSA mode will be used (OAEP vs PKCS#1 v1.5)? I think it would be best to use OAEP in this context, but for wrapping a symmetric key first, which then wraps the private key. See comment on R155 above for rationale.

Contributor Author

I think we've discussed this elsewhere, but RSA with OAEP using SHA256 was chosen because it's supported across all of our keystores (AWS KMS, GCP CKM, and HSMs). We've also dropped the intermediate asymmetric key and key sharing from the scope so the symmetric age file keys should be the only keys being wrapped now.

Comment thread rfd/0127-encrypted-session-recordings.md Outdated
message KeySet {
// ActiveKeys is a list of active, wrapped X25519 private keys. There should
// be at most one wrapped key per auth server using the
// SessionRecordingConfigV2 resource unless keys are being rotated.

With respect to key rotation: the age file format defines, in the file header, the keys that can be used to decrypt a stream. If a historic recording is being played back (i.e. a recording made using an older X25519 key that has been rotated out), then it would use a key with KEY_STATE = ROTATED for decrypting the file. At that stage it could have more than one active key for decryption, i.e. the current active key and the key recovered from the rotated key. Is that correct?

Comment thread rfd/0127-encrypted-session-recordings.md Outdated
Comment thread rfd/0127-encrypted-session-recordings.md
- Generate an asymmetric data key, export the encrypted
private key, and save it for future sharing.
- Import the generated asymmetric key back into KMS. This requires reencrypting
in software and exposes the private key to the Teleport process.

Why are you using a key management system when you expose the plaintext keys in software? You could go with software keys directly. Or is the idea that the software keys are only exposed for a short time and compromise is less likely during that time? You should highlight the risk of using (intermediate) software keys for all GCP and AWS KMS implementations, so customers are aware.

Contributor Author

Or is the idea that the software keys are only exposed for a short time and compromise is less likely during that time?

Yeah that would be the idea, but we aren't actually implementing this right now. I included it as documentation for a method that might be possible if we needed this type of key sharing in the future, but it could also be omitted from the RFD. I'll see if I can make it more explicit that this is just documenting a suggestion rather than describing what we're going to deliver for this feature

Comment thread rfd/0127-encrypted-session-recordings.md
`session_recording_config.status.encryption_keys`. Each unique public key will
result in a custom `Recipient` implementation responsible for wrapping the file
key using `OAEP-SHA256`. They will include the fingerprinted public key in the
stanza to make lookup during replay easier. The `Identity` implementation will

but the key configuration does not yet include the key fingerprint?

Contributor Author

I wasn't planning on accepting or exposing the fingerprint in the configuration. Instead, a fingerprint will be generated for the public key after it's fetched from the keystore. The fingerprint itself will be a base64-encoded SHA-256 hash of the public key in PKIX ASN.1 DER form.

I think that will mean that Teleport will have to create fingerprints for all keys in the keystore at startup. No big deal, just a different way of working with it. I agree it will likely avoid typos in configuration files and similar issues.

Comment thread rfd/0127-encrypted-session-recordings.md Outdated
@rosstimothy rosstimothy requested a review from gl-mc July 28, 2025 15:26
@eriktate eriktate requested a review from espadolini August 7, 2025 16:42
Comment on lines +46 to +48
[here](https://age-encryption.org/v1). The officially supported key wrapping
algorithms for `age` are limited to X25519, Ed25519, and RSA. Support for other
Contributor

When you say ed25519 and RSA are you referring to the SSH key support in age?

Contributor Author

Yep exactly that

Comment on lines +53 to +56
In order to support key unwrapping directly within HSM and KMS systems, this
RFD proposes a custom `age` plugin that uses RSA 4096 keypairs in combination
with the decryption APIs exposed by each keystore backend. The choice of RSA is
Contributor

If age supports RSA natively why do we need a plugin?

Contributor Author

Because we can't have access to the private key in most cases, key unwrapping needs to interact directly with a KMS/HSM. As a bonus, it allows us to pull the public key used during encryption in order to look up the correct private key without having to write a separate parser for age headers.

mode will result in failed startup.

In order to support key unwrapping directly within HSM and KMS systems, this
RFD proposes a custom `age` plugin that uses RSA 4096 keypairs in combination
Contributor

Are we going with RSA-4096 for some specific security level that the more common RSA-2048 does not provide? If so, please mention it.

Contributor Author

We are, 128-bit is the minimum. I've updated the RFD to mention that 👍

// EncryptionKeyPair is a keypair used for encrypting and decrypting data.
message EncryptionKeyPair {
// public_key is the public encryption key.
bytes public_key = 1 [(gogoproto.jsontag) = "public_key"];
Contributor

Don't use gogoproto features for anything new.

Comment on lines +192 to +193
// Default KeyState
KEY_STATE_UNSPECIFIED = 0;
Contributor

Do not ascribe any meaning to the zero value of an enum, it's not a default, it's a missing value and if it's necessary to specify it and it's set to zero then the message should be treated as not valid (as if the value was set to some other value that's completely unknown).

// api/proto/teleport/legacy/types/types.proto

// EncryptionKeyPair is a keypair used for encrypting and decrypting data.
message EncryptionKeyPair {
Contributor

EncryptionKeyPair doesn't need to be in api/proto/teleport/legacy/types/types.proto, right?

Contributor Author

Maybe not? 🤔 It's not necessarily specific to session recordings, though, and it's meant to be added to the generic keystore API. Since the other key types are in legacy and it also needs to go into SessionRecordingConfigStatus I figured this was the best spot for it

Comment on lines +139 to +152
// AgeEncryptionKey is a PEM encoded RSA4096 public key used for encrypting
// session recordings.
Contributor

Don't use PEM, especially if it's stored in a bytes field; store the SPKI in DER form instead (which likely wouldn't need anything more to specify that it's an RSA public key rather than something else).

Comment on lines +532 to +544
In order for all auth servers in a cluster to replay all recordings, they will
need access to the same keys. There are common cases where keys are easily
shared, such as a software key or an AWS KMS key that all auth servers have
access to. This RFD proposes that recording encryption should assume shared
access to the same keys and keystores from all auth servers in a cluster. For
networked HSMs, this will require replacing the `host_uuid` label applied to
keys with a shared label signalling recording encryption.
Contributor

If all auth servers have access to the REKs, why do we need to keep track of inaccessible keys, individual key rotations, and multiple active keys at once?

Contributor Author

Keeping track is just a failsafe in case that assumption breaks down because of a bad auth server configuration. Multiple active keys should only be possible during key rotations before the completion or rollback RPC is called, otherwise there should just be a single active key.

@eriktate eriktate force-pushed the eriktate/rfd-encrypted-session-recording branch from a763eb6 to 371e8fc Compare August 20, 2025 12:42
@eriktate eriktate added backport/branch/v18 no-changelog Indicates that a PR does not require a changelog entry and removed backport/branch/v15 backport/branch/v17 labels Aug 20, 2025
@eriktate eriktate force-pushed the eriktate/rfd-encrypted-session-recording branch from 371e8fc to 998f300 Compare August 20, 2025 12:49
@eriktate eriktate added this pull request to the merge queue Aug 20, 2025
Merged via the queue into master with commit c026b41 Aug 20, 2025
40 checks passed
@eriktate eriktate deleted the eriktate/rfd-encrypted-session-recording branch August 20, 2025 13:09
@backport-bot-workflows
Contributor

@eriktate See the table below for backport results.

Branch Result
branch/v18 Create PR

Labels

backport/branch/v18 no-changelog Indicates that a PR does not require a changelog entry rfd Request for Discussion size/md

8 participants