Conversation
6ee069e to
793374f
Compare
zmb3
left a comment
There was a problem hiding this comment.
Just some general comments - I haven't thought enough about the client options to have a strong preference yet.
|
See also #7833, which should be closed as part of this work. |
| 1. Create an Auth API client using the user's existing valid certificates. | ||
| 1. Use the client to retrieve an MFA challenge for the user with | ||
| the existing `rpc CreateAuthenticateChallenge`. | ||
| 1. Prompt the user to solve the MFA challenge with their MFA device. |
There was a problem hiding this comment.
Has it been considered how to validate that the action the MFA was conducted for is the action it's being applied against?
There was a problem hiding this comment.
This could be done by adding additional parameters to rpc CreateAuthenticateChallenge, defining the action being permitted. Then during MFA verification, the Auth server would check that the corresponding MFA challenge request stored in the backend matches the action being requested.
However, I'm not sure I can see any security benefit in this. Unlike with MFA-verified certs, this MFA challenge can only be used to authenticate a single request. We don't have the same issue to solve here of preventing an MFA-verified cert from being reused for multiple / different requests.
If we are concerned about the client prompting MFA for one thing but using it for another, I'm not sure how we could avoid this. The MFA challenge/response cycle is ultimately in the purview of the client. The Auth server can only reliably verify that the client tapped their MFA key. It cannot verify that the user tapped their MFA key with the intention of permitting a specific request. To avoid this, I think we would need to have some system where the user verifies the request being made directly with the Proxy/Auth server (WebUI), similar to headless, but this seems unnecessary/overkill.
What do you think? Is there a different concern that I am missing?
There was a problem hiding this comment.
However, I'm not sure I can see any security benefit in this
It's admittedly a very limited value. The concern would be somehow an attacker gets access to an unused MFA challenge and uses it for another purpose. Though granted, in any case I can consider where that's possible it's probably equally easy to just lie to the user and convince them to MFA something they don't realize (just as you mentioned).
That said, unless it's technically prohibitive for any reason I still think we should make this improvement. Not just for the risk above, but also for potential auditing around how MFA was used.
What do you think?
There was a problem hiding this comment.
My 2c is that for the "one-shot" challenges we are OK without encoding the purpose. It's a nice follow up, but not a must-have/MVP requirement.
There was a problem hiding this comment.
@jentfoo Looking back on this, I do agree it's a good idea.
I'm actually running into an unforeseen issue in the implementation with "bulk" admin actions that perform several API requests. For example, adding a users calls CreateUser and CreateResetPasswordToken. In these cases, we have to prompt for MFA twice one after the other. Ideally, we would only need to prompt for MFA once for a single "action", even if it involves multiple requests.
I'd like to get some initial thoughts on the 2 potential solutions laid out below before updating the RFD.
Action Scoped MFA verified certs
One solution is to fall back to using MFA verified certs for these edge cases. However, as stated previously, this would require us to make the certificates scoped to the specific actions.
I think for an MVP this would not be hard to implement. We could just encode a list of API requests which are allowed into the cert. Then on the server, we could check that the request being authorized is in the list.
Action Scoped MFA challenge
Another similar solution is to encode the API requests into the MFA challenge request. Then we could pass the same MFA challenge response to each API request, and the server would check that the API request list in the challenge contains the API request being authorized.
However, the underlying change needed here is to let the MFA challenge exist beyond the first validation, instead expiring after a short duration (30s?) to allow subsequent requests of the same admin action to validate MFA. Since the MFA challenge is scoped, I think this may be a viable tradeoff. From a security perspective, I don't think this is any better/worse than MFA verified certificates that allow reuse.
I prefer this solution as it avoids the "two-shot" approach retrieving MFA verified certificates. @codingllama it looks like this is feasible from an implementation perspective, but IYO does it go against MFA principles too strongly, moreso than encoding MFA verification into certs for reuse?
C.C. @rosstimothy
There was a problem hiding this comment.
I don't think that letting the challenges live a bit longer is a problem, they could already live until timeout for certain scenarios. Also, we can still delete certain kinds of challenges on the spot (ie, "purposeless" or login/authn challenges).
For challenge persistence we have a few other alternatives, like passing a signed/timestamped JWT-like token to the client and using it as "currency", or using an altogether different key to store this info. (I'll have to think on this a bit more to have a stronger opinion.)
For the challenge purpose, I'd rather encode the purpose in the challenge than a list of endpoints, so we have a bit more future flexibility. I also think of the main questions here is how we can tell that the challenge's purpose is fulfilled.
There was a problem hiding this comment.
For the challenge purpose, I'd rather encode the purpose in the challenge than a list of endpoints, so we have a bit more future flexibility.
The way I am thinking about this, an admin action from the client's perspective one of:
- A single admin action endpoint request
- A string of admin action endpoints requests
So too clarify, I meant that the client will provide a list of endpoints that it wants to call. This way, the client has the flexibility to add/remove endpoints that it needs for an admin action.
I also think of the main questions here is how we can tell that the challenge's purpose is fulfilled.
I think we can avoid this complexity by just making them expire after 30 seconds. An admin action from the client perspective can be limited to a string of MFA-verified requests being performed back to back in quick succession.
If some action has a long pause for some reason, like if we added a tctl edit roles which loops through all roles for edits, then I think it is reasonable to require re-verification with MFA. I don't know of any admin actions like this currently.
For challenge persistence we have a few other alternatives, like passing a signed/timestamped JWT-like token to the client and using it as "currency", or using an altogether different key to store this info. (I'll have to think on this a bit more to have a stronger opinion.)
IMO just letting the client re-pass the MFA challenge response instead of having it exchange it for a token is simpler and avoids the extra round trip. I agree that this could be an interesting/useful route at some point though.
I also don't think we need to be concerned with the mfa challenge stored in /web/users/<user>/webauthnsessiondata/login being overwritten, since we can require a client to complete an admin action before continuing to another one. Or since the admin action is defined on the client side, the client probably just needs to combine the admin actions and make an MFA challenge that accommodates both.
There was a problem hiding this comment.
@Joerger I am happy to hear you're considering this further. I still think this would be a valuable improvement.
With regards to encoding the endpoints or a "action" / "scope", I feel either are acceptable. Endpoints seem precise initially, but the actual action entirely depends on the services implementation of the endpoint. Similar if we encode a "scope" into the cert / mfa request, it is just dependent on the services implementation of what is allowed for that scope.
For that reason I have historically used concise scopes to represent a series of endpoints to keep client side changes minimal and certificates small.
There was a problem hiding this comment.
I think a "scope" is better than a list of endpoints, so we can decide what that scope means server-side (without client changes and without giving that power to clients).
There was a problem hiding this comment.
I'm still not sure I fully see the benefit of server-side scopes for client constructed admin actions. If we add scopes, the power still ultimately lies with the client to decide which scope to use.
For example, how would we scope tctl create -f resources.yaml, where resources.yaml contains several different resource types? Would we just add a tctl create scope that covers update/create for basically all resources? If we add this, couldn't the client just always use the tctl create scope to cover almost any possible admin action?
If we don't add the tctl create scope, we would need to request new MFA for every resources in resources.yaml, which admittedly may be an acceptable limit.
However, if we want to make scopes server-side anyways, it would make more sense to add new endpoints instead of MFA scopes. For example, instead of adding a new-user scope to call CreateUser and CreateResetPasswordToken, we could just add a NewUser endpoint which does both. This moves defining "scope" to the responsibility of the gRPC service endpoints instead of a new not-flushed-out scope concept. Client and server side changes are necessary in both cases.
I don't dislike the idea of scopes in general, I just don't think it solves the issue I'm trying to solve. I think just adding a few scopes like "login" and "session-mfa" would be a nice upgrade from our current situation. We could also just add an unconstrained "admin-action" scope for the time being. This would prevent "login" mfa challenges from being used for "admin-actions" and vice versa, and we could continue discarding MFA challenges for the "login" and "session-mfa" scopes. And in the future we could constrain certain endpoints with new scopes.
FYI I'll formalize this into an RFD update this week, thanks for your initial thoughts and any more feedback you have.
| 1. Create an Auth API client using the user's existing valid certificates. | ||
| 1. Use the client to retrieve an MFA challenge for the user with | ||
| the existing `rpc CreateAuthenticateChallenge`. | ||
| 1. Prompt the user to solve the MFA challenge with their MFA device. |
There was a problem hiding this comment.
My 2c is that for the "one-shot" challenges we are OK without encoding the purpose. It's a nice follow up, but not a must-have/MVP requirement.
rosstimothy
left a comment
There was a problem hiding this comment.
Can you add a backward compatibility section? Are we planning to outright break anyone trying to use an older version of tctl on a newer version of teleport?
Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
* Update implementation option choice to option 1. * Replace retry-on-error idea with inferring requirement from ping.cluster_auth_preference.admin_action_mfa setting. * Add suggestions from reviews.
44f7c87 to
aa8c0da
Compare
…; explain MVP plan.
jentfoo
left a comment
There was a problem hiding this comment.
Generally looks good. My only question is if we can further limit how much MFA bypass is allowed.
|
@r0mant && (@klizhentas || @xinding33) Please review when you get the chance. |
r0mant
left a comment
There was a problem hiding this comment.
Overall lgtm, I think the last remaining bits are to figure out where to tuck in the bypass option and cover product metrics,
| ... | ||
| } | ||
| ``` | ||
|
|
There was a problem hiding this comment.
Can you also add section about product metrics and observability? I.e. how are we going to be able to tell whether people are using this feature? Can we emit a Posthog event in the Cloud every time someone tries to perform administrative action and has to tap the key?
bae1390 to
0b992fb
Compare
klizhentas
left a comment
There was a problem hiding this comment.
Let's explore if we can remove two extra knobs. See my comments.
| ### Server configuration | ||
|
|
||
| The requirement for MFA on admin actions will be made configurable through | ||
| `cluster_auth_preference.admin_action_mfa: off | on`. |
There was a problem hiding this comment.
Is it possible to avoid adding extra knob? For example, in v14 always make it optional without extra know for backwards-compatibility. In V15, make it a requirement.
| name: access-request-plugin | ||
| spec: | ||
| options: | ||
| robot: true |
There was a problem hiding this comment.
Can we remove this extra knob as well? Certs issued by MachineID's non-interactive users should not require per session MFA for administrative actions if allowed by roles. All interactive clients should either opportunistically require in v14 or mandatory require in v15.
There was a problem hiding this comment.
Certs issued by MachineID's non-interactive users should not require per session MFA for administrative actions if allowed by roles.
Great idea. We can skip MFA for impersonated certificates by checking the impersonator certificate extension for a built in role like Bot or Admin.
Bots also require create permissions for roles, users, and tokens, making this method a bit more secure than a basic role option. @jentfoo
We'll also need to update our plugin/API guides to use MachineID, which seems to be our plan regardless:
> tctl auth sign --user=dev
Generating credentials to allow a machine access to Teleport? We recommend Teleport's Machine ID! Find out more at https://goteleport.com/r/machineid-tip
...
* Remove server configuration options in favor of determining MFA requirement from second_factor setting + Server version. * Remove role configuratio options in favor of checking for an Admin/Bot impersonator. * Update backwards compatibility section. Each admin action will take a full major version from its addition before we can start enforcing MFA.
| - Old clients have no way to pass MFA until the `MFAAuthenticateResponse` | ||
| message is included in the proto specification. In order to maintain backwards | ||
| compatibility, each request that we change to an admin action will take a full | ||
| major version before we can actually require MFA for that action. |
There was a problem hiding this comment.
I missed this issue before now. I update the backwards compatibility section to reflect this issue.
Essentially, each admin action will take a full major version before we can start enforcing MFA. For the MVP endpoints, we are planning to have done over Teleport 14+15. But for any endpoints that we add later in Teleport 14 or 15, we won't be able to require MFA until Teleport 16/17.
If we want to fix this, we could use a hybrid of Option 1 and 2. When a request is changed to require MFA, up to date clients and servers will know to pass the MFA response directly in the request. Older clients will make the request without MFA and receive ErrAdminActionMFARequired. If the client is v14+, the client will retry by with MFA verification passed in the request metadata. As a result, as soon as an admin action is required on the server, v14+ clients will be able to meet the requirement.
We could also just switch to Option 2 instead of Option 1 altogether.
WDYT? @r0mant @codingllama @rosstimothy
There was a problem hiding this comment.
Regardless of the approach, I think "new" clients need to know beforehand whether they need additional MFA for a call, so we can avoid the needless roundtrip. Ofc this varies per cluster settings, but we can make an educated guess. (Outdated clients will likely fail and then retry, but that seems alright.)
As for a 1+2) hybrid or just 2), I suppose having a single way to do things is easier to reason about. I'm failing right now to see how outdated clients, under option 1), could deal with ErrAdminActionMFARequired if they have an outdated proto.
There was a problem hiding this comment.
We'll be rolling out automatic upgrades for clients (tsh/tctl) in Q3 so this may be less of a concern but in general I agree that option 2 (having a generic method to passing MFA metadata rather than via API for each individual request) sounds like gives us a more graceful migration path.
There was a problem hiding this comment.
Ok, I think we should start with Option 2 then. It is the easier and more extensible solution. It should be possible to include all or most of the admin actions in Teleport 14, rather than only the current MVP list.
Once it is complete, we can consider going forward with option 1. This would give us a cleaner way to pass MFA in the long run, and for newly added requests. This is especially important for external users of the API as the metadata solution is not forthcoming.
P.S. @codingllama this means we won't be splitting up existing gRPC requests into services that follow best practices, but we will still be making new gRPC services for affected HTTP endpoints.
Add an RFD to require MFA on administrative actions when a user has an MFA device registered.
For example:
Related: #9089, #7833