Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
223 changes: 223 additions & 0 deletions rfd/0210-windows-ad-hsm.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
---
authors: Zac Bergquist <zac.bergquist@goteleport.com>
state: released (v18.1.0, v17.7.0)
---

# RFD 210 - HSM support for Active Directory environments

## What

Ensure that Teleport works correctly in Active Directory environments
when configured to use HSM-backed private key material.

## Why

Teleport supports certificate-based login in Active Directory environments for
both Windows desktop access (RDP), and MS SQL database access.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we also have Oracle, but it uses the same code under the hood.


Windows requires that these certificates include metadata which contains a link
to a certificate revocation list (CRL), which is validated as part of the login
process.

Teleport's current implementation assumes that there is a single CA issuing
these certificates, so all certificates that Teleport generates point to the
same CRL distribution point (CDP).

The assumption that there is a single CA is invalid when Teleport is configured
to use HSMs, as each auth server will have its own certificate and HSM-backed
private key. Windows requires that the CRL pointed to by the certificate is
signed by the same CA as the certificate itself. As a result, Teleport logins
only work 1 out of N times (where N is the number of auth servers in the
cluster) when configured for HSMs, because the auth server that issued the
cert for the session or may not be the same as the auth server that generated
the CRL.

## Required Approvers

* Engineering: @nklaassen && (@gabrielcorado || @greedy52)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Engineering: @nklaassen && (@gabrielcorado || @greedy52)
* Engineering: @nklaassen && (@gabrielcorado || @Tener )

Krzysztof already approved from db side.

* Security: (@rjones || @klizhentas)

## Details

Windows supports a variety of different CRL distribution points:

- local paths: `file://...`
- remote file share paths: /`\\server\share\...`
- LDAP objects: `ldap://CN=Teleport,CN=CDP,...`
- HTTP/HTTPS endpoints: `https://...`

Teleport has always published its CRLs to LDAP, as it's guaranteed to be
available in the environments where Teleport runs and Teleport already has an
LDAP requirement. While it would be possible to make the Teleport proxy server
host CRLs over HTTPS, we aim to avoid such a change as we can't be sure that all
of the Windows components have network access to the Teleport proxy.

> [!NOTE]
> Teleport has its own locking system and does not use CRLs for revocation.
> Teleport only implements CRLs to satisfy Windows requirements, and its CRLs
> are always empty and always valid for 1 year from time of issuance.

In order to support HSM-enabled clusters with multiple active signers, we need
to consider two aspects:

1. Certificate generation: how does Teleport generate the CRL distribution point
2. CRL publishing: how does the CRL actually get to the distribution point referenced in the cert

### Certificate Generation

The certificate generation step is largely the same for Windows desktop access
and MS SQL database access. In both cases, Teleport generates an LDAP URL based
on the configured Active Directory domain. It takes the form:

```
ldap://CN=CLUSTER,CN=SIGNER,CN=CDP,CN=Public Key Services,CN=Configuration,DC=example,DC=com
```

Where:
- `CLUSTER` is the name of the Teleport cluster
- `SIGNER` is `Teleport` (for Windows desktop certs) or `TeleportDB` for database certs
- `example.com` is the AD domain (`DC=example,DC=com` in LDAP-speak)

### CRL Publishing

CRL publishing works differently for each Teleport feature.

#### MS SQL database access

For database access, our setup instructions require that users generate the CRL
with `tctl auth crl`, copy it over to the Windows environment, and publish it to
LDAP using `certutil -dspublish`.

It is worth noting that `tctl auth crl` generates empty CRLs that are valid for _1 year_,
so database access will require manual intervention in case:

- the CRL expires after 1 year
- a CA rotation is performed
Comment on lines +87 to +95
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could improve the UX here quite a lot by using the automated publishing workflow. We should probably keep the tctl auth crl as a fallback.

We will likely need an active LDAP connectivity anyway to lookup SID for username in order to fix #53502.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to keep that out of scope for now, since getting HMS to work is priority over other UX improvements. Feel free to file a new issue and link to this RFD for book-keeping.

Copy link
Copy Markdown
Contributor

@Tener Tener May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sure, this was just a side remark, no need to change this RFD.


#### Windows desktop access

For Windows desktop access, Teleport's `windows_desktop_service` generates a CRL
on startup and publishes it to LDAP directly - no `certutil -dspublish` step is
necessary. This process is repeated every time the agent restarts, as well as
every time the agent refreshes its credentials (typically every 4 hours or so).

As a result, desktop access requires manual intervention in the case of a CA rotation,
but does not suffer from the CRL expiry issue, since the CRL is constantly getting
pushed back by 1 year.

### Proposed Changes

In order to support HSMs, which result in multiple active signers being present in the cluster,
we will need to:

1. Generate certs with a CDP that references the specific cert issuer
2. Publish multiple CRLs to LDAP instead of a single CRL

#### Cert Generation: use SKID of the issuer

When generating certificates, Teleport will use the subject key ID of the signer to
generate a CDP that is unique to that signer.

```
ldap://CN=IV4GC3LQNRSSAVDFNRSXA33SORJUWSKE_CLUSTER,CN=Teleport,CN=CDP,CN=Public Key Services,CN=Services,CN=Configuration,DC=example,DC=com
```

In this case, `IV4GC3LQNRSSAVDFNRSXA33SORJUWSKE` is the base32-encoded subject
key identifier of the issuing certificate, and `CLUSTER` is the name of the
Teleport cluster.


#### Publishing: `tctl auth crl --out`

For `tctl` we will follow the precedent set in
[#51298](https://github.com/gravitational/teleport/pull/51298) for `tctl auth
export`, and add a `--out` flag to `tctl auth crl`. This flag will cause `tctl`
to write the CRL(s) (whether there is one or many) to disk rather than to emit a
single CRL to standard out.

Teleport users setting up MS SQL access will then be able to perform a setup
procedure very similar to what they already do, they will only have to repeat
the `certutil -dspublish` command once for each individual CRL.

#### Publishing: add CRLs to the backend `certificate_authority`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just like to see some description of when the auth server generates the CRLs and adds them to the certificate_authority in this design

maybe on auth startup the auth checks each key in each CA that needs CRLs. if it can sign with that key, and there is currently no CRL or the CRL expires in <1 year, it generates a new one and updates the CA in the backend. this will handle adding CRLs to existing CAs without them, and refreshing them semi-regularly

during CA rotation when generating new keys it should also generate and store the CRL

not sure if we'd want to also periodically check/refresh the CRLs or if just at auth startup is enough


In order to maintain the behavior where Teleport's `windows_desktop_service`
publishes CRLs to LDAP on a periodic basis, we need to change the way the agent
requests a CRL from the auth server.

Today, the agent calls an auth RPC that generates a CRL on-demand. The agent has
no control over which signer is used to generate the CRL, since it is totally
dependent on which auth server handles the RPC.

Instead of calling an RPC to get a single CRL, the agent needs to be able to get
a set of CRLs (one for each active signer in the cluster). This means that the
CRLs must be present somewhere in Teleport's cluster state backend and not
generated on-demand.

Since these CRLs are small (they are always empty), we will store the CRL
directly in the `cert_authority` resource. This approach has several advantages:

- Agents already set up a watch to monitor CA resources in order to implement CA rotation,
so the entire resource (including the CRL) should always be present in the agent's cache
and very cheap to look up.
- It avoids a lot of boilerplate code required to add a new resource and set up RBAC, caching, etc.

Instead of calling an RPC to get a CRL, the agent will attempt to pull CRLs from
the `cert_authority`. For backwards compatibility, the agent will fall back to
the original `Teleport` CDP if the CRL is not present in the `cert_authority`
resource. This allows newer agents to maintain today's behavior even when running
against older auth servers.

### UX

The goal of this proposal is to fix Teleport logins on HSM-enabled clusters without making
significant changes to the user experience.

#### Database access

The `tctl auth crl` command will behave exactly like `tctl auth export`, printing a warning
that asks the user to add the `--out` flag if it detects more than one active signer.

We will also update the docs for this feature to ensure that it covers how to
import multiple CRLs, and advise users to repeat the setup process when CA
rotations are performed.

#### Desktop access

From a desktop access perspective, the user experience should not change at all.

Users will have to update both their auth servers and their agents in order to get
HSM support working, but we will take care to ensure that agents and auth can be
updated independently and don't need to update in any particular order.

### Proto Specification

New field will be added to `TLSKeyPair` resource:

```protobuf
bytes CRL = 4 [(gogoproto.jsontag) = "crl"];
```

This field will store empty DER-encoded revocation list for `cert_authorities` that require it.

### Audit Events

A new field will be added to the `certificate.create` audit event which will
hold information about the authority used to create certificate:

```protobuf
message CertificateCreate {
// ...

// CertificateAuthority holds information about creator of certificate
CertificateAuthority CertificateAuthority = 5 [(gogoproto.jsontag) = "certificate_authority"];
}

// CertificateAuthority holds information about creator of certificate
message CertificateAuthority {
// ID is identifier of the cert authority (type and domain)
string ID = 1 [(gogoproto.jsontag) = "id,omitempty"];
// SubjectKeyID is BASE32-encoded subject key ID from authority certificate
string SubjectKeyID = 2 [(gogoproto.jsontag) = "subject_key_id,omitempty"];
}
```
5 changes: 5 additions & 0 deletions rfd/cspell.json
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@
"cdou",
"centralus",
"certbot",
"certutil",
"CEST",
"CGNAT",
"Cieślak",
Expand Down Expand Up @@ -182,6 +183,7 @@
"Doyensec",
"dpkg",
"Dtmfa",
"dspublish",
"dynamodbattribute",
"eastus",
"ECCP",
Expand Down Expand Up @@ -236,6 +238,7 @@
"Ftruncate",
"fullchain",
"fxamacker",
"gabrielcorado",
"gcpxyz",
"germaine",
"getold",
Expand Down Expand Up @@ -373,6 +376,7 @@
"logrus",
"logurs",
"logutils",
"LQNRSSAVDFNRSXA",
"lsall",
"lxea",
"Lytvynov",
Expand Down Expand Up @@ -663,6 +667,7 @@
"someid",
"someroles",
"someuser",
"SORJUWSKE",
"Spacelift",
"Spadolini",
"SPIFFE",
Expand Down
Loading