Conversation
56b590c to
a0fc596
Compare
mbaraniak-doyensec
left a comment
There was a problem hiding this comment.
The overall design looks good to me.
|
|
||
| #### Conclusion | ||
|
|
||
| With the design principles above, the only possible attack would be for the attacker to issue the headless request themselves and trick the user into verifying the request with MFA, with a direct phishing attack. This phishing attack should also be mitigated by notifying the user of the command requested and the permissions that will be awarded upon approval. |
There was a problem hiding this comment.
This phishing attack should also be mitigated by notifying the user of the command requested and the permissions that will be awarded upon approval
I'm still concerned about phishing, I don't think it's a stretch that folks would approve requests without reading them, or get confused by a multitude of requests.
I'm not sure what we could do, so no action required here.
There was a problem hiding this comment.
That's fair. We could remove the "Headless auth requests" page from the proxy and only allow it to be completed from the generated link, keeping it more in line with standard SSO flow. The extra page is just a potential UX improvement for when a user is creating back to back headless authentication requests.
| participant Teleport Proxy | ||
| participant Teleport Auth | ||
|
|
||
| Note over Remote Machine: tsh --headless ssh user@node01 |
There was a problem hiding this comment.
Did we consider requiring --user for headless mode? It's important for the user and request to be tightly coupled and personal env variables in remote machines strike me wrong.
There was a problem hiding this comment.
The user would either need to provide --user or set $TELEPORT_USER. The latter is included in the UX section, but if we have a security concern greater than the UX concern here, we can remove it.
|
|
||
| #### Certificate retrieval | ||
|
|
||
| After the headless login is initiated, the request will wait until local authentication is complete. This will be handled by the Teleport Proxy by using a resource watcher to wait until the `HeadlessAuthRequest` object is updated to the approved or denied state, or it expires. |
There was a problem hiding this comment.
I would double-check with the scalability team, just to be sure.
There was a problem hiding this comment.
Would that be @rosstimothy? Can you give this RFD a brief look when you get a chance? Specifically, are the following approaches scalable:
- Adding 1 minute TTL to
HeadlessAuthenticationresource- I know we've had issues with too many expiration trackers for Session Trackers in the past.
- Watching for
HeadlessAuthenticationupdates from Proxy -> Auth
The primary concern here is that /webapi/login/headless is unauthenticated and subject to DoS attacks, though it will be rate limited. Both the creation and watch of HeadlessAuthentication occur before the end user has been authenticated.
|
|
||
| 1. **The Client does not write any keys, certificates, or session data to disk on the remote machine** | ||
|
|
||
| When using headless authentication, Teleport clients will generate a new private key in memory for the lifetime of the client. This private key will not be exported anywhere. Only its public key counterpart will be shared externally to get user certificates. Likewise, the user certificates will be held in memory for the lifetime of the client. |
There was a problem hiding this comment.
This is neat, I wish there was a way to run tsh without ever writing credentials to disk, without necessarily reducing the scope of the credentials either.
There was a problem hiding this comment.
tsh --add-keys-to-agent=only is the closest we have today. But also, we could simply add a flag to use MemClientStore. I already made all (most?) of the changes necessary for this to work in #19420.
There was a problem hiding this comment.
I'll take a look. 👍
The original design for tsh device enroll said it was supposed to work like headless mode if the user wasn't already logged in, the idea being that during IT-driven manual enrollment we wouldn't leave powerful creds behind. I never did get to that, and also plans changed a bit, but it sounds like I could leverage what you've done to get that feature easily.
(Btw, tsh device enroll may be a good candidate for --headless.)
|
|
||
| Due to the implementation complexity of limiting certificates permissions by requested command, it will not be included in the feature's preview release. | ||
|
|
||
| Additionally, some details, like how the server will encode the permissions into the certificate, are currently missing. This will be addressed in a follow up PR before the official release of this feature, accompanied by an RFD update. |
There was a problem hiding this comment.
This is the opposite of why we do RFDs. Prototyping is fine and you should do it, but the design should come before the end product.
There was a problem hiding this comment.
Agreed, I would prefer not to do it this way. For background, we are on a tighter timeline than usual for this feature, so some shortcuts are required.
There was a problem hiding this comment.
It sounds like you're still not sure how some parts of this are going to work, and that's why the detail is currently missing from the RFD.
You have to figure out these missing pieces either way, so I fail to see how discovering the unknowns now and updating the RFD is any faster then discovering them later and updating the RFD after. From my point of view, it seems both options are roughly the same amount of work but one of them allows the team to help you evaluate the design decisions early and one of them doesn't. The former is much less risky, no?
There was a problem hiding this comment.
We will keep thinking about the design of this particular requirement (limiting certs scope) and definitely solidify its design before starting its implementation. It is not in scope for the initial release so for now @Joerger let's please push it out of scope of the RFD so we don't block the implementation. cc @jakule
I agree with you Alan that this can be built as a separate subsystem. The more I think about it the more it seems to me that supporting something like this will require us to introduce some concept of "permission boundary" (or similar) in Teleport RBAC which will likely require a whole other research and a separate RFD.
jentfoo
left a comment
There was a problem hiding this comment.
I don't see any security risks in this design, but I did comment on the value of the auth token used here. I still lean towards doing this enforcement through the certificate, and just making sure that mechanism is well understood and tested.
|
|
||
| Note: The server can also encrypt the certificates with the client's public key. Since we don't currently do this for other login endpoints, we will omit this from the initial design. | ||
|
|
||
| Additionally, headless login will be handled by the Teleport Proxy endpoint - `/webapi/login/headless`. In this endpoint, the Proxy will make multiple Auth server API requests to initiate headless authentication and receive certificates. To prevent attackers from hijacking a headless auth request directly from the Auth server, The Proxy will generate a token for the auth request and pass the token in the initial headless auth request creation. The Auth server will keep this token a secret and authorize the latter API requests using this token. |
There was a problem hiding this comment.
Since the certs still require the client's private key with the PKCE-like flow, this isn't strictly necessary, so we could remove this.
After further consideration I agree with you, I am not sure if this token is really buying us much. @Joerger let me know if I am missing anything, but I believe the token was introduced to try and enforce we are talking to the same client consistently (as discussed using an alternate strategy here: #21005 (comment)).
This token does provide an additional secret that would need to be compromised. However it's hard to imagine any scenario where this token would remain secure, but the clients in-memory private key could be exposed.
I think it would be better to rely on the certificate security, specifically such that:
- This control becomes the primary enforcement:
The client provides their public key in the login request, and the server issues certificates for the client's public key. If the client does not have access to the corresponding client private key, they cannot use the certificates.
It sounds like we have an existing pattern we can use here? I was suggesting earlier we encrypt the responding certificate using the public key to enforce private key ownership. However it sounds like we are using a different mechanism? Are we chaining the certificates or can you help fill me in the details for how this is enforced?
GenerateUserCertscurrently requires the token for authentication, but also the previous state to be established. I believe the token auth can be removed as long as we are careful about making sure that established state is exactly as expected (ie setup and not already used).
TL:DR; If we can outline more about how the certificate generation is done, I feel comfortable removing this token from the spec.
…of CreateHeadlessAuthRequest and GenerateUserCerts endpoints * Remove CreateHeadlessAuthRequest rpc * Remove token and other unneeded fields from headless authentication
fspmarshall
left a comment
There was a problem hiding this comment.
Unauthenticated actions being able to create backend resources is very concerning. A few extra tens of req/s can be enough to have a catastrophic effect on some teleport clusters.
In my opinion, one of two things needs to happen for this to not be a significant attack vector:
-
We apply a very very strict global/cluster-wide rate-limit (no more than 3-4 req/s total, regardless of source IP), and add a background process that actively deletes old requests (ttl can take up to 48 hours to clean up).
-
We rework this feature so that it does not permit an unauthenticated write, instead inducing the first write during the callback/approval step.
I strongly prefer option 2, though I recognize that I may be overlooking some requirement of the feature that makes option 2 impossible. Regardless, I've added a rough sketch of how I think option 2 could work below. Its the first thing that came to mind, so might be some issues, but fundamentally I think it aught to be possible.
Instead of encoding parameters in a request in the backend and using an ID for callback/approval, we instead encode a token that includes both a random element, and any fields that we want to assert (e.g. an ssh login).
Replace /login/headless with a streaming endpoint of some kind s.t. we can return the callback/approval token immediately, and then return a second message over the same pipe if we see a matching approval within some time limit.
The unauthenticated client prints the callback with the encoded token, holding open the stream. The token is only good for that stream (i.e. intercepting the token doesn't let a different client perform headless-auth).
Server-side, we allow a fixed number of in-flight headless login attempts to exist (this can be much higher than the req/s limit from option 1 since it doesn't incur a backend write unless the user auths).
If the user follows the callback, parameters are decoded from the token and displayed like normal. If they choose to approve, then we do our first backend write.
The unauthenticated client's headless login process is "unblocked" if an approval with a matching token is observed within an appropriate timeout, and the rest of the flow happens as described in the RFD as-is.
I'd also suggest that instead of allocating resource watchers (which are big and expensive), unauthenticated requests that need to be woken up on resource creation might be better off relying on an efficient waker/notifier that allows a single resource watcher to unblock all requests as appropriate. Ex:
struct InFlightHeadlessLogins {
pending [1024]struct{
chan Approval
token string
}
}|
@fspmarshall Thanks for the review and suggestion. I agree that a design solution is better than a stricter limiting solution.
I want to avoid changing this to a streaming / multi-part endpoint if possible, in order to limit complexity and fit into a simplified authentication future. But if it's necessary we can add it. Here's another idea I'd like to run by you. In the current design, the Auth Server actually handles the |
Sounds reasonable. My thinking in using a streaming endpoint was that it would be an easy way to ensure that the token was not transferrable between clients without needing to encode the public key in the token (to avoid needing the callback token to become massive). Just limit it so that only the TLS session that created the token could get certs with it. We could definitely solve this in other ways (or just accept big tokens).
We'd need the approval to be written to the backend, because you can't guarantee that the auth server handing the Something like this should work: type InFlightHeadlessLogins struct {
mu sync.Mutex
pending [1024]struct {
ch chan Approval
token string
}
}
func (h *InFlightHeadlessLogins) Wait(ctx context.Context, token string) (Approval, error) {
ch := make(chan Approval, 1)
idx := -1
h.withLock(func() {
for i := range h.pending {
if h.pending[i].ch != nil {
continue
}
h.pending[i].ch = ch
h.pending[i].token = token
idx = i
return
}
})
if idx == -1 {
return Approval{}, fmt.Errorf("too many in-flight headless login requests")
}
defer h.withLock(func() {
h.pending[idx].ch = nil
h.pending[idx].token = ""
})
ctx, cancel := context.WithTimeout(ctx, time.Second*30)
defer cancel()
select {
case approval := <-ch:
return approval, nil
case <-ctx.Done():
return Approval{}, ctx.Err()
}
}
func (h *InFlightHeadlessLogins) Notify(token string, approval Approval) {
h.withLock(func() {
for i := range h.pending {
if h.pending[i].token != token {
continue
}
select {
case h.pending[i].ch <- approval:
default:
}
}
})
} |
|
@fspmarshall I did some workshopping on your solution. Here's a new diagram of the flow I have in mind: sequenceDiagram
participant Local Machine
participant Headless Machine as Remote (Headless) Machine
participant Teleport Proxy
participant Teleport Auth
Note over Headless Machine: tsh --headless ssh user@node01
Note over Headless Machine: generate request token and print URL<br/>proxy.example.com/headless/<token>/approve
par headless client request
Headless Machine->>Teleport Proxy: POST /webapi/login/headless
Teleport Proxy->>Teleport Auth: POST /:version/users/:user/ssh/authenticate
Teleport Auth ->>+ Backend: wait for backend insert /headless_authentication/<token>
and local client request
Headless Machine-->>Local Machine: user copies URL to local browser
Note over Local Machine: proxy.example.com/headless/<id>/approve
opt user is not already logged in locally
Local Machine->>Teleport Proxy: user logs in normally e.g. password+MFA
Teleport Proxy->>Local Machine:
end
Local Machine->>Teleport Auth: rpc GetHeadlessAuthentication (token)
Teleport Auth ->> Backend: insert /headless_authentication/<token>
par headless client request
Backend ->>- Teleport Auth: unblock on insert
Teleport Auth ->> Backend: upsert /headless_authentication/<token><br/>{public key, user, ip}
Teleport Auth ->>+ Backend: wait for state change
end
Teleport Auth->>Local Machine: Headless Authentication details
Note over Local Machine: share request details with user
Local Machine->>Teleport Auth: rpc CreateAuthenticateChallenge
Teleport Auth->>Local Machine: MFA Challenge
Note over Local Machine: user taps YubiKey<br/>to sign MFA challenge
Local Machine->>Teleport Auth: rpc UpdateHeadlessAuthenticationState<br/>with signed MFA challenge response
Teleport Auth ->> Backend: upsert /headless_authentication/<token><br/>{public key, user, ip, state=approved, mfaDevice}
and headless client request
Backend ->>- Teleport Auth: unblock on state change
Teleport Auth->>Teleport Proxy: user certificates<br/>(MFA-verified, 1 minute TTL)
Teleport Proxy->>Headless Machine: user certificates<br/>(MFA-verified, 1 minute TTL)
Note over Headless Machine: Connect to user@node01
end
The main changes I made to your design are:
Edit: since the request token is not a secret and does not have any security implications, we will instead just use a request UUID, as is the case in the current design. |
* Don't insert backend data without authenticaion * Remove view headless requests page * Update diagram
8e8bb86 to
566cc60
Compare
|
@Joerger Seems reasonable overall. More round-trips this way, but this doesn't strike me as the kind of operation where we really care about an extra round-trip or two. One concern tho:
If we go this route, I think we need a story for how to prevent "capture" of the request by another malicious client (e.g. if I'm providing technical support over zoom or session joining and see your terminal during a headless login, I shouldn't be able to just copy the token and registering my own competing request under the same token). In the initial proposal, malicious clients couldn't intercept an in-flight request in this manner because the request was written to the backend with the ephemeral public SSH key before the request ID was printed to the terminal. In the stream-based proposal, the server picked the random value associated with the stream, avoiding the issue that way. If the client generates the token and the request parameters (including public ssh key) aren't written to the backend until after the user clicks the link, there is a time window where other clients can "race" to register with the same token and potentially get their request through instead. The simplest solution that comes to mind is to just have the printed token/request-id be derived from the hash of another "secret" token that gets passed as an additional parameter to REQUEST_ID="$(echo "teleport-headless-login:${SECRET_TOKEN}" | openssl dgst -sha3-256 -r - | awk '{print $1}')"That should be sufficient to let auths distinguish which headless requests actually come from the client that generated the ID (and therefore correspond to the client controlled by the authenticating user). It is a bit hacky, so I'd run it by someone more security-minded tho. I'm sure there's a more dignified way to solve this problem. |
I was also thinking about this issue last week, my best idea was to derive the request ID from the user's public key with |
jentfoo
left a comment
There was a problem hiding this comment.
Most recent changes look good to me. I agree that this new design is worth the complexity to reduce the risk of unauthenticated garbage / denial of service risks.
I also agree incorporating the public key as a portion of the UUID is a good strategy.
* Draft UX section. * Complete draft. * Minor edits. * Address comments, polish. * Condense headless login request into a single HTTP endpoint. * Update security section for limited certificate permissions. * Address doyensec comments. * Update RFD. * Remove certificate limitation from RFD scope; Add RFD number; smaller edits. * Small fixes. * * Update auth flow to use auth.AuthenticateSSHUser endpoint instead of CreateHeadlessAuthRequest and GenerateUserCerts endpoints * Remove CreateHeadlessAuthRequest rpc * Remove token and other unneeded fields from headless authentication * * Add resource watcher section * Don't insert backend data without authenticaion * Remove view headless requests page * Update diagram * Use the client's public key to derive a request ID.
* RFD 105 - Headless Authentication (#21005) * Draft UX section. * Complete draft. * Minor edits. * Address comments, polish. * Condense headless login request into a single HTTP endpoint. * Update security section for limited certificate permissions. * Address doyensec comments. * Update RFD. * Remove certificate limitation from RFD scope; Add RFD number; smaller edits. * Small fixes. * * Update auth flow to use auth.AuthenticateSSHUser endpoint instead of CreateHeadlessAuthRequest and GenerateUserCerts endpoints * Remove CreateHeadlessAuthRequest rpc * Remove token and other unneeded fields from headless authentication * * Add resource watcher section * Don't insert backend data without authenticaion * Remove view headless requests page * Update diagram * Use the client's public key to derive a request ID. * Add HeadlessAuthentication protobuf type and Resource implementation. (#22350) * Add headless auth preference logic. (#22148) * Add Headless Authn backend service. (#22553) * Headless Login: add headless authentication resource watcher (#22699) * Add headless authentication resource watcher. * Handle OpInit event and Watcher errors. * Headless Login: proxy server changes (#22734) * * Add proper context handling to auth.AuthenticateUser. * Move PublicKey field to AuthenticateUserRequest where it can be used for actual authentication. * Use a simple switch statement in /webapi/ssh/certs logic to switch between password, otp, and eventually headless login. * Add Headless flow to /webapi/ssh/certs login enpdoint. * Add 3 minute callback timeout. * Headless Login: protobuf service (#22750) * Add Headless Authn proto server. * Add Headless Authn proto client. * Resolve comments. * Headless Login: tsh implementation (#22751) * * Implement tsh --headless * Implement tsh headless approve * Add better headless authn state handling. * Add godoc for new tsh field. * Headless login: Mlockall (#23159) * Use Mlockall for Headless login. * Skip memory lock on unsupported OSs. Resolve comments * Headless Login: auth server changes (#22726) * Add Headless Authn service. * Add/fix 3 minute headless login timeout. * * Prevent repeated updates to headless authentication state * Prevent user lock out from headless authentication failure * Delete headless authentication on failed attempts * Add auth_with_roles test. * Extend timeout in test to reduce flakiness. * Fix error typo. * Add context timeouts, remove initial GetHeadlessAuthentication call. * Resolve comments. * Move http client to it's own file; Add ability to clone HTTP client for per-request configuration changes. * Fix flaky test. * Remove shared state from test. * Update error handling and testing for auth_with_roles. * Fix rebase misshap. * Fix race condition in test. * update e ref * Fix ctx missing. * Extend test timeout to prevent flakiness. * Fix issue with roundtrip.ClientParams not being applied due to roundtripper wrapping. --------- Co-authored-by: Tim Ross <tim.ross@goteleport.com> * Extend context timeouts in TestHeadlessAuthenticationWatcher tests to reduce flakiness. (#23160) * Fix flaky test due to context deadline. (#23260) * Fix headless login with `second_factor: on | optional` (#23271) * Fix headless login with second_factor:on|optional. * Update ssh/certs endpoint to only configure necessary authentication fields; clarify comments; update test to cover headless authenticaiton preference. * Update test to cover user locking logic. * Change generic headless error. (#23331) * Headless SSO web endpoint and UI (#22914) * Update UI Update UI text Update the code to add headless request get Remove commented code Added simple UI and endpoints * Update UI Implement reject SSO handler and UI * Fix linter issues * Fix more linter issues * Fix UI tests * Use url.JoinPath. * center spinner on the page and animate it. * Address code review comments * Address code review comments * Renamed React component * Address PR comments --------- Co-authored-by: joerger <bjoerger@goteleport.com> Co-authored-by: Jeff Pihach <jeff.pihach@goteleport.com> * Fix flaky test `TestHeadlessAuthenticationWatcher` (#23417) * Fix race condition in test by using a helper function instead of complex channel mechanisms. * Avoid creating new methods solely for testing; resolve other comments. * Reuse more code; resolve other comments. * Fix race condition that could cause a new watcher to be marked as stale before the channel is consumed; Fix minor test issues. * Remove race condition on headless authentication expires field when (#23578) using memory storage. * Headless Authn: documentation (#23272) * Add docs. * Update docs/pages/access-controls/guides/headless-login.mdx Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com> * Fix lint error. * Ellaborate on how headless login differs from standard login. * Resolve comments; Fix capitalization. * Resolves comments. * Add cli reference docs. * Restructure guide; Remove scoped blocks; Update descriptions; resolve other comments. * Make configuration options/alternatives collapsible; Fix typos. * Fix file names, titles, and make new config details begin as closed. * Fix hidden merge conflict. * Add line breaks. * Fix dead link. --------- Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com> --------- Co-authored-by: Tim Ross <tim.ross@goteleport.com> Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> Co-authored-by: Jeff Pihach <jeff.pihach@goteleport.com> Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
No description provided.