Generate user login state from access lists and integrate into certificates.#29364
@mdwn is there an RFD or parent issue I can read? I feel like I'm missing the context here.
fspmarshall
As I understood the original discussion about this feature, user login state was intended to be a complete representation of the "post-mapping" user-state that we could use as a drop-in replacement for the user resource as-appropriate. As currently implemented, the various cert-generation methods are still loading the user resource as the primary source of truth, and then using the login state to extend that. So currently we end up with a pattern like:
user, _ := a.GetUser(req.Username, false)
// ...
accessInfo := services.AccessInfoFromUser(user)
// ...
checker, _ := services.NewAccessChecker(accessInfo, ...)
// ...
traits, roles = addUserLoginStateToTraitsAndRoles(uls, req.traits, req.checker.RoleNames())
// ...etc
Which means that we're still mixing pre-login and post-login state (e.g. the access checker is initialized with your pre-mapping state).
I don't think this is how we should be going about it. I think we'd be better off migrating to the login state as the new source of truth, and tweaking things like AccessChecker to work with an interface that abstracts over the difference between user state and user. Then we could do something like this (oversimplified):
func getUserOrLoginState(name string) UserStateInterface {
if uls, _ := getUserLoginState(name); uls != nil {
return uls
}
return getUser(name)
}
// ...
accessInfo := services.AccessInfoFromUserStateInterface(getUserOrLoginState())
// remaining logic is totally identical to what we had before
Basically, I feel like it's going to cause issues if we sometimes work from the pre-mapping state and sometimes from the post-mapping state. Much better to work exclusively with the post-mapping state, and fall back to the pre-mapping state if the post-mapping state does not exist (e.g. due to login having been handled by an older auth server).
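For illustration, the fallback idea above can be written as a small self-contained Go program. This is only a sketch: the `UserState` interface, the `state` and `store` types, and the in-memory maps are hypothetical stand-ins, not Teleport's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// UserState is a hypothetical interface covering what cert generation needs
// from either a user resource or a user login state.
type UserState interface {
	GetName() string
	GetRoles() []string
	GetTraits() map[string][]string
}

// state is a toy implementation used for both kinds of resource.
type state struct {
	name   string
	roles  []string
	traits map[string][]string
}

func (s state) GetName() string                { return s.name }
func (s state) GetRoles() []string             { return s.roles }
func (s state) GetTraits() map[string][]string { return s.traits }

var errNotFound = errors.New("not found")

// store is an in-memory stand-in for the backend (assumption for this sketch).
type store struct {
	users       map[string]state // pre-mapping user resources
	loginStates map[string]state // post-mapping login states
}

// getUserOrLoginState prefers the post-mapping login state and falls back to
// the pre-mapping user resource only when no login state exists.
func (s *store) getUserOrLoginState(name string) (UserState, error) {
	if uls, ok := s.loginStates[name]; ok {
		return uls, nil
	}
	if u, ok := s.users[name]; ok {
		return u, nil
	}
	return nil, fmt.Errorf("user %q: %w", name, errNotFound)
}

func main() {
	s := &store{
		users: map[string]state{
			"alice": {name: "alice", roles: []string{"access"}},
			"bot":   {name: "bot", roles: []string{"bot-role"}},
		},
		loginStates: map[string]state{
			"alice": {name: "alice", roles: []string{"access", "auditor"}},
		},
	}
	for _, name := range []string{"alice", "bot"} {
		st, err := s.getUserOrLoginState(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// alice resolves to her login state; bot falls back to the user resource.
		fmt.Println(st.GetName(), st.GetRoles())
	}
}
```

Because every caller goes through the one interface, code downstream of the lookup never needs to know which of the two resources it received.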
WDYT? Does this sound reasonable/doable?
I think this sounds pretty reasonable. I don't think it'll be significantly more work than what I've already got here, it'll just involve starting with the "user" as the base user login state and then going from there. I'll update when I've made these changes.
b26e9b8 to 4636754
…icates. On login, the user login state will be generated, using access lists to register additional roles and traits that will be inserted into the user's certificate. Tests have been added to exercise this as well.
…state comprises the whole state as opposed to a mix.
94fe0cf to 682e920
@Tener @espadolini @fspmarshall This should be good to review now, I've worked through the kinks here.
// getUserOrLoginState will return the given user or the login state associated with the user.
Is there a legitimate case for the user not to have a user login state?
If not, we could force the user to re-login by deleting the user login state.
That's a question for @fspmarshall, I think -- possibly remote users?
Certificates issued directly with tctl auth sign (e.g. certs for a bot) won't have an associated login state. There may be other cases as well.
…icates. (#29364) * Generate user login state from access lists and integrate into certificates. On login, the user login state will be generated, using access lists to register additional roles and traits that will be inserted into the user's certificate. Tests have been added to exercise this as well. * Cache user login states, filter roles that aren't in the backend. * Small refactor. * Optimize RPC calls, test merge login in auth.go more thoroughly. * Warn when role is missing. * Update so access info uses the user login state directly, user login state comprises the whole state as opposed to a mix. * Logic tweaks to restore tests. * Integrate user login state cache. * Swap out get user for get user state where applicable. * Revert unrelated debug change. * Add in missing err check. * Further replacing with user state. * Revert changes to helpers to try to get integration tests working. * Revert "Revert changes to helpers to try to get integration tests working." This reverts commit 682e920. * Add in user type to generator. * Use supplied user for generating SSH certs.
…icates. (#29364) (#30628) * Generate user login state from access lists and integrate into certificates. On login, the user login state will be generated, using access lists to register additional roles and traits that will be inserted into the user's certificate. Tests have been added to exercise this as well. * Cache user login states, filter roles that aren't in the backend. * Small refactor. * Optimize RPC calls, test merge login in auth.go more thoroughly. * Warn when role is missing. * Update so access info uses the user login state directly, user login state comprises the whole state as opposed to a mix. * Logic tweaks to restore tests. * Integrate user login state cache. * Swap out get user for get user state where applicable. * Revert unrelated debug change. * Add in missing err check. * Further replacing with user state. * Revert changes to helpers to try to get integration tests working. * Revert "Revert changes to helpers to try to get integration tests working." This reverts commit 682e920. * Add in user type to generator. * Use supplied user for generating SSH certs.
git bisect tells me that this introduced a regression in Connect My Computer that was not caught by existing tests and I don't understand why.
In general, when you start Connect My Computer setup (instructions in the description of #30905), Connect creates a new role in the cluster (CreateConnectMyComputerRole RPC) and assigns it to the current user. At this point, we have to refresh the role list in local certs. To achieve this, we do this:
teleport/lib/teleterm/services/connectmycomputer/connectmycomputer.go
Lines 181 to 199 in 0ed045a
This used to update the certs on disk with the current role list from the backend. It's the same operation that tsh request drop does when you tell it to drop a certain access request.
After this is done, the Electron app calls tshd's GetCluster RPC which includes the role list read from the cert on disk. I noticed that it now doesn't get updated and git bisect led me to this PR. tsh status run from inside Connect also doesn't return the new role.
What's weird is that there's an integration test which calls CreateConnectMyComputerRole. It specifically checks if the certs on disk include the new role:
teleport/integration/teleterm_test.go
Lines 546 to 555 in 1294516
The test passes just fine. It's the "role does not exist" test from those table tests above.
I also added a test for GenerateUserCerts which also passes:
teleport/lib/auth/access_request_test.go
Lines 633 to 638 in 1294516
Given the changes in this PR, do you have any clue as to why this would update the cert in the integration test but not in the app?
I suppose this is related to the change in lib/auth/auth_with_roles.go.
We have reason to believe this may also have caused a regression for Machine ID. Looking into it.
#30978 doesn't fix the regression in Connect My Computer, unfortunately.
Ok, so I see the regression happened because Connect My Computer depended on the following interactions:
1. Log in with roles foo and bar.
2. Create a new role baz and assign it to the current user.
3. Drop a bogus access request (== refresh the role list based on the current backend state).
Now it doesn't work like this, because at point 3 the role list is not updated based on the backend state, but rather on the "cached" login state from 1.
We used the hack with dropping a bogus access request because we wanted to have a way to refresh the role list without asking the user for credentials again. This is the same behavior that tsh request drop needed.
However, with the login state in place, it's in fact better (more secure) for tsh request drop to use the login state rather than backend state. This doesn't work for Connect My Computer because we need fresh roles.
One thing I still don't understand is how it slipped through the integration test, as I mentioned in my original comment. I suppose it's an issue with some kind of cache. ;)
Hrrrm, it definitely wasn't intentional to change this behavior. Where exactly did I change that here? IMO we should revert that line; I'd rather keep the existing UX and improve it later than have a workaround.
It's the change in lib/auth/auth_with_roles.go. We used to build accessInfo based on the current backend state of the user, this PR made it so that it's based on the login state instead.
Reverting it back to accessInfo = services.AccessInfoFromUser(user) fixes the regression.
Do you have any clue as to why this wasn't caught in the integration test? Since the integration test doesn't go through the normal login procedure, I suppose login state was empty and thus accessInfo was calculated from the user resource, right?
Yep, that would make sense. I think you're right, the user state was probably just empty.
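The behavior discussed in this thread can be reproduced in a toy model. The sketch below assumes a login state that is a snapshot of the user's roles taken at login and a cert reissue path that prefers that snapshot when it exists. The `backend` type and all of its methods are hypothetical and only mirror the description above, not the real code in lib/auth/auth_with_roles.go.

```go
package main

import "fmt"

// backend is a toy model; none of these names are Teleport's real API.
type backend struct {
	userRoles       map[string][]string // current roles on the user resource
	loginStateRoles map[string][]string // snapshot taken at login
}

func newBackend() *backend {
	return &backend{
		userRoles:       map[string][]string{"alice": {"foo", "bar"}},
		loginStateRoles: map[string][]string{},
	}
}

// login snapshots the user's current roles into a login state.
func (b *backend) login(user string) {
	b.loginStateRoles[user] = append([]string(nil), b.userRoles[user]...)
}

// assignRole updates the user resource only; the login state is untouched.
func (b *backend) assignRole(user, role string) {
	b.userRoles[user] = append(b.userRoles[user], role)
}

// rolesForCert returns the roles a reissued certificate would carry.
// When preferLoginState is true and a login state exists, the snapshot wins;
// otherwise the current user resource is used.
func (b *backend) rolesForCert(user string, preferLoginState bool) []string {
	if preferLoginState {
		if roles, ok := b.loginStateRoles[user]; ok {
			return roles
		}
	}
	return b.userRoles[user]
}

func main() {
	// App flow: log in, then gain a role, then reissue certs.
	app := newBackend()
	app.login("alice")
	app.assignRole("alice", "baz")
	fmt.Println("login state preferred:", app.rolesForCert("alice", true))  // baz is missing
	fmt.Println("user resource used:   ", app.rolesForCert("alice", false)) // baz is present

	// Test flow: no regular login, so no login state exists and the lookup
	// falls back to the user resource, which hides the regression.
	test := newBackend()
	test.assignRole("alice", "baz")
	fmt.Println("no login state:       ", test.rolesForCert("alice", true)) // baz is present
}
```

In this model, a test only exercises the stale-snapshot path if it goes through `login` before changing the user's roles.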
I made a PR which reverts this change in lib/auth/auth_with_roles.go. If you feel that we should revert it, then I say let's do it.
It'd be good to get this merged before the test plan starts, but this might be quite tight.
On login, the user login state will be generated, using access lists to register additional roles and traits that will be inserted into the user's certificate.
Tests have been added to exercise this as well.
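As a rough illustration of the description above, the sketch below merges roles and traits granted by access lists into a user's base roles and traits, skipping granted roles that do not exist in the backend (the commit list mentions filtering and warning on missing roles). The `accessList` type, its fields, and `generateLoginState` are assumptions made for this example and do not reflect the actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// accessList is a hypothetical, simplified access list.
type accessList struct {
	members     []string
	grantRoles  []string
	grantTraits map[string][]string
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// generateLoginState starts from the user's own roles and traits and adds the
// grants of every access list the user is a member of. Granted roles that are
// not in knownRoles are skipped with a warning.
func generateLoginState(user string, baseRoles []string, baseTraits map[string][]string,
	lists []accessList, knownRoles map[string]bool) ([]string, map[string][]string) {

	roleSet := map[string]bool{}
	for _, r := range baseRoles {
		roleSet[r] = true
	}
	traits := map[string][]string{}
	for k, v := range baseTraits {
		traits[k] = append([]string(nil), v...) // copy so the input is not mutated
	}

	for _, al := range lists {
		if !contains(al.members, user) {
			continue
		}
		for _, r := range al.grantRoles {
			if !knownRoles[r] {
				fmt.Printf("warning: role %q granted by an access list is missing, skipping\n", r)
				continue
			}
			roleSet[r] = true
		}
		for k, vals := range al.grantTraits {
			for _, v := range vals {
				if !contains(traits[k], v) {
					traits[k] = append(traits[k], v)
				}
			}
		}
	}

	roles := make([]string, 0, len(roleSet))
	for r := range roleSet {
		roles = append(roles, r)
	}
	sort.Strings(roles) // deterministic order for the sketch
	return roles, traits
}

func main() {
	lists := []accessList{{
		members:     []string{"alice"},
		grantRoles:  []string{"auditor", "deleted-role"},
		grantTraits: map[string][]string{"logins": {"root"}},
	}}
	known := map[string]bool{"access": true, "auditor": true}
	roles, traits := generateLoginState("alice", []string{"access"},
		map[string][]string{"logins": {"alice"}}, lists, known)
	fmt.Println(roles, traits)
}
```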
Relevant section in the RFD.
Small low key benchmark: