Skip to content

Generate user login state from access lists and integrate into certificates.#29364

Merged
mdwn merged 16 commits intomasterfrom
mike.wilson/user-login-state-generator
Aug 17, 2023
Merged

Generate user login state from access lists and integrate into certificates.#29364
mdwn merged 16 commits intomasterfrom
mike.wilson/user-login-state-generator

Conversation

@mdwn
Copy link
Copy Markdown
Contributor

@mdwn mdwn commented Jul 20, 2023

On login, the user login state will be generated, using access lists to register additional roles and traits that will be inserted into the user's certificate.

Tests have been added to exercise this as well.

Relevant section in the RFD.

Small low key benchmark:

With login state

AuthenticateUser begin: 2023-07-20 13:35:22.553049 -0400 EDT m=+23.786587251
AuthenticateUser end: 2023-07-20 13:35:22.66828 -0400 EDT m=+23.901817918
AuthenticateUser delta: 115.230667ms
GenerateCert begin: 2023-07-20 13:35:22.668675 -0400 EDT m=+23.902212501
GenerateCert end: 2023-07-20 13:35:22.673191 -0400 EDT m=+23.906728626
GenerateCert delta: 4.516125ms

Without login state

AuthenticateUser begin: 2023-07-20 13:40:13.515064 -0400 EDT m=+12.754441292
AuthenticateUser end: 2023-07-20 13:40:13.636613 -0400 EDT m=+12.875989501
AuthenticateUser delta: 121.548209ms
GenerateCert begin: 2023-07-20 13:40:13.63706 -0400 EDT m=+12.876436501
GenerateCert end: 2023-07-20 13:40:13.641253 -0400 EDT m=+12.880629459
GenerateCert delta: 4.192958ms

With login state + 2 access lists (not a member)

AuthenticateUser begin: 2023-07-20 13:50:17.48219 -0400 EDT m=+126.730266376
AuthenticateUser end: 2023-07-20 13:50:17.595715 -0400 EDT m=+126.843790251
AuthenticateUser delta: 113.523875ms
GenerateCert begin: 2023-07-20 13:50:17.596153 -0400 EDT m=+126.844228043
GenerateCert end: 2023-07-20 13:50:17.60055 -0400 EDT m=+126.848625126
GenerateCert delta: 4.397083ms

With login state + 2 access lists (member)

AuthenticateUser begin: 2023-07-20 14:05:50.527176 -0400 EDT m=+141.383355001
AuthenticateUser end: 2023-07-20 14:05:50.633066 -0400 EDT m=+141.489245043
AuthenticateUser delta: 105.890042ms
GenerateCert begin: 2023-07-20 14:05:50.633598 -0400 EDT m=+141.489777001
GenerateCert end: 2023-07-20 14:05:50.637755 -0400 EDT m=+141.493933918
GenerateCert delta: 4.156917ms

Comment thread lib/auth/auth.go
@Tener
Copy link
Copy Markdown
Contributor

Tener commented Jul 21, 2023

@mdwn is there an RFD or parent issue I can read? I feel like I'm missing the context here.

Comment thread lib/auth/auth.go Outdated
Comment thread lib/auth/auth.go Outdated
Comment thread lib/auth/userloginstate/generator.go
Comment thread lib/auth/userloginstate/generator.go Outdated
Comment thread lib/auth/userloginstate/generator_test.go
Comment thread lib/auth/userloginstate/generator_test.go
Comment thread lib/auth/userloginstate/generator.go
Copy link
Copy Markdown
Contributor

@fspmarshall fspmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understood the original discussion about this feature, user login state was intended to be a complete representation of the "post-mapping" user-state that we could use as a drop-in replacement for the user resource as-appropriate. As currently implemented, the various cert-generation methods are still loading the user resource as the primary source of truth, and then using the login state to extend that. So currently we end up with a pattern like:

	user, _ := a.GetUser(req.Username, false)
	// ...
	accessInfo := services.AccessInfoFromUser(user)
	// ...
	checker, _ := services.NewAccessChecker(accessInfo, ...)
	// ...
	traits, roles = addUserLoginStateToTraitsAndRoles(uls, req.traits, req.checker.RoleNames())
	// ...etc

Which means that we're still mixing pre-login and post-login state (e.g. the access checker is initialized with your pre-mapping state).

I don't think this is how we should be going about it. I think we'd be better off migrating to the login state as the new source of truth, and tweaking things like AccessChecker to work with an interface that abstracts over the different between user state and user. Then we could do something like this (oversimplified):

func getUserOrLoginState(name string) UserStateInterface {
    if uls, _ := getUserLoginState(name); uls != nil {
        return uls
    }
    return getUser(name)
}

// ...

accessInfo := services.AccessInfoFromUserStateInterface(getUserOrLoginState())

// remaining logic is totally identical to what we had before

Basically, I feel like it's going to cause issues if we sometimes work from the pre-mapping state and sometimes from the post-mapping state. Much better to work exclusively with the post-mapping state, and fall back to the pre-mapping state of the post mapping state does not exist (e.g. due to login having been handled by an older auth server).

WDYT? Does this sound reasonable/doable?

@mdwn
Copy link
Copy Markdown
Contributor Author

mdwn commented Jul 21, 2023

As I understood the original discussion about this feature, user login state was intended to be a complete representation of the "post-mapping" user-state that we could use as a drop-in replacement for the user resource as-appropriate. As currently implemented, the various cert-generation methods are still loading the user resource as the primary source of truth, and then using the login state to extend that. So currently we end up with a pattern like:

	user, _ := a.GetUser(req.Username, false)
	// ...
	accessInfo := services.AccessInfoFromUser(user)
	// ...
	checker, _ := services.NewAccessChecker(accessInfo, ...)
	// ...
	traits, roles = addUserLoginStateToTraitsAndRoles(uls, req.traits, req.checker.RoleNames())
	// ...etc

Which means that we're still mixing pre-login and post-login state (e.g. the access checker is initialized with your pre-mapping state).

I don't think this is how we should be going about it. I think we'd be better off migrating to the login state as the new source of truth, and tweaking things like AccessChecker to work with an interface that abstracts over the different between user state and user. Then we could do something like this (oversimplified):

func getUserOrLoginState(name string) UserStateInterface {
    if uls, _ := getUserLoginState(name); uls != nil {
        return uls
    }
    return getUser(name)
}

// ...

accessInfo := services.AccessInfoFromUserStateInterface(getUserOrLoginState())

// remaining logic is totally identical to what we had before

Basically, I feel like it's going to cause issues if we sometimes work from the pre-mapping state and sometimes from the post-mapping state. Much better to work exclusively with the post-mapping state, and fall back to the pre-mapping state of the post mapping state does not exist (e.g. due to login having been handled by an older auth server).

WDYT? Does this sound reasonable/doable?

I think this sounds pretty reasonable. I don't think it'll be significantly more work than what I've already got here, it'll just involve starting with the "user" as the base user login state and then going from there. I'll update when I've made these changes.

Comment thread lib/auth/auth.go
Comment thread lib/auth/auth.go Outdated
Comment thread lib/auth/auth.go
@mdwn mdwn force-pushed the mike.wilson/user-login-state-generator branch from b26e9b8 to 4636754 Compare August 10, 2023 12:30
@mdwn mdwn force-pushed the mike.wilson/user-login-state-generator branch from 94fe0cf to 682e920 Compare August 11, 2023 16:01
@mdwn
Copy link
Copy Markdown
Contributor Author

mdwn commented Aug 11, 2023

@Tener @espadolini @fspmarshall This should be good to review now, I've worked through the kinks here.

Comment thread lib/auth/userloginstate/generator_test.go
Comment thread lib/auth/auth.go
}
}

// getUserOrLoginState will return the given user or the login state associated with the user.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a legitimate case for the user not to have user logins state?

If not, we could force the user to re-login by deleting the user login state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a question for @fspmarshall, I think -- possibly remote users?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certificates issued directly with tctl auth sign (e.g. certs for a bot) won't have an associated login state. There may be other cases as well.

@mdwn mdwn added this pull request to the merge queue Aug 17, 2023
Merged via the queue into master with commit c954d14 Aug 17, 2023
@mdwn mdwn deleted the mike.wilson/user-login-state-generator branch August 17, 2023 01:01
@public-teleport-github-review-bot
Copy link
Copy Markdown

@mdwn See the table below for backport results.

Branch Result
branch/v13 Failed

mdwn added a commit that referenced this pull request Aug 17, 2023
…icates. (#29364)

* Generate user login state from access lists and integrate into certificates.

On login, the user login state will be generated, using access lists to
register additional roles and traits that will be inserted into the user's
certificate.

Tests have been added to exercise this as well.

* Cache user login states, filter roles that aren't in the backend.

* Small refactor.

* Optimize RPC calls, test merge login in auth.go more thoroughly.

* Warn when role is missing.

* Update so access info uses the user login state directly, user login state comprises the whole state as opposed to a mix.

* Logic tweaks to restore tests.

* Integrate user login state cache.

* Swap out get user for get user state where applicable.

* Revert unrelated debug change.

* Add in missing err check.

* Further replacing with user state.

* Revert changes to helpers to try to get integration tests working.

* Revert "Revert changes to helpers to try to get integration tests working."

This reverts commit 682e920.

* Add in user type to generator.

* Use supplied user for generating SSH certs.
github-merge-queue Bot pushed a commit that referenced this pull request Aug 21, 2023
…icates. (#29364) (#30628)

* Generate user login state from access lists and integrate into certificates.

On login, the user login state will be generated, using access lists to
register additional roles and traits that will be inserted into the user's
certificate.

Tests have been added to exercise this as well.

* Cache user login states, filter roles that aren't in the backend.

* Small refactor.

* Optimize RPC calls, test merge login in auth.go more thoroughly.

* Warn when role is missing.

* Update so access info uses the user login state directly, user login state comprises the whole state as opposed to a mix.

* Logic tweaks to restore tests.

* Integrate user login state cache.

* Swap out get user for get user state where applicable.

* Revert unrelated debug change.

* Add in missing err check.

* Further replacing with user state.

* Revert changes to helpers to try to get integration tests working.

* Revert "Revert changes to helpers to try to get integration tests working."

This reverts commit 682e920.

* Add in user type to generator.

* Use supplied user for generating SSH certs.
Comment thread lib/auth/auth.go
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git bisect tells me that this introduced a regression in Connect My Computer that was not caught by existing tests and I don't understand why.

In general, when you start Connect My Computer setup (instructions in the description of #30905), Connect creates a new role in the cluster (CreateConnectMyComputerRole RPC) and assigns it to the current user. At this point, we have to refresh the role list in local certs. To achieve this, we do this:

// ReissueUserCerts called with CertCacheDrop and a bogus access request ID in DropAccessRequests
// allows us to refresh the role list in the certs without forcing the user to relogin.
//
// Sending bogus request IDs is not documented but it is covered by tests. Refreshing roles based
// on the server state is necessary for tsh request drop to work.
//
// If passing bogus request IDs ever needs to be removed, then there are two options here:
// * Pass a wildcard instead. This will break setups where people use access requests to make
// Connect My Computer work. Most users will probably not use access requests for that though.
// * Invalidate the stored certs somehow to force the user to relogin. If Connect makes a request
// after role setup and [client.IsErrorResolvableWithRelogin] returns true for the error from
// the response, Connect will ask the user to relogin.
//
// TODO(ravicious): Expand auth.ServerWithRoles.GenerateUserCerts to support refreshing role
// list without having to send a bogus request ID.
err = certManager.ReissueUserCerts(ctx, client.CertCacheDrop, client.ReissueParams{
RouteToCluster: cluster.Name,
DropAccessRequests: []string{fmt.Sprintf("bogus-request-id-%v", uuid.NewString())},
})

This used to update the certs on disk with the current role list from the backend. It's the same operation that tsh request drop does when you tell it to drop a certain access request.

After this is done, the Electron app calls tshd's GetCluster RPC which includes the role list read from the cert on disk. I noticed that it now doesn't get updated and git bisect led me to this PR. tsh status run from inside Connect also doesn't return the new role.

What's weird is that there's an integration test which calls CreateConnectMyComputerRole. It specifically checks if the certs on disk include the new role:

// Verify that the certs have been reloaded and that the user is assigned the role.
//
// GetCluster reads data from the cert. If the certs were not reloaded properly, GetCluster
// will not return the role that's just been assigned to the user.
clusterDetails, err := handler.GetCluster(ctx, &api.GetClusterRequest{
ClusterUri: rootClusterURI,
})
require.NoError(t, err)
require.Contains(t, clusterDetails.LoggedInUser.Roles, roleName,
"the user certs don't include the freshly added role; the certs might have not been reloaded properly")

The test passes just fine. It's the "role does not exist" test from those table tests above.

I also added a test for GenerateUserCerts which also passes:

// testRoleRefreshWithBogusRequestID verifies that GenerateUserCerts refreshes the role list based
// on the server state, even when supplied an ID for a nonexistent access request.
//
// Teleport Connect depends on this behavior when setting up roles for Connect My Computer.
// See [teleterm.connectmycomputer.RoleSetup].
func testRoleRefreshWithBogusRequestID(t *testing.T, testPack *accessRequestTestPack) {

Given the changes in this PR, do you have any clue as to why this would update the cert in the integration test but not in the app?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is related to the change in lib/auth/auth_with_roles.go.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have reason to believe this may also have caused a regression for Machine ID. Looking into it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#30978 doesn't fix the regression in Connect My Computer, unfortunately.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I see the regression happened because Connect My Computer depended on the following interactions:

  1. Log in with roles foo and bar.
  2. Create a new role baz and assign it to the current user.
  3. Drop a bogus access request (== refresh the role list based on the current backend state).

Now it doesn't work like this, because at point 3 the role list is not updated based on the backend state, but rather on the "cached" login state from 1.

We used the hack with dropping a bogus access request because we wanted to have a way to refresh the role list without asking the user for credentials again. This is the same behavior that tsh request drop needed.

However, with the login state in place, it's in fact better (more secure) for tsh request drop to use the login state rather than backend state. This doesn't work for Connect My Computer because we need fresh roles.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I still don't understand is how it slipped through the integration test, as I mentioned in my original comment. I suppose it's an issue with some kind of cache. ;)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrrrm, it definitely wasn't intentional to change this behavior. Where exactly did I change that here? IMO we should revert that line, because IMO I'd rather keep the existing UX and improve it later than have a workaround.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the change in lib/auth/auth_with_roles.go. We used to build accessInfo based on the current backend state of the user, this PR made it so that it's based on the login state instead.

Reverting it back to accessInfo = services.AccessInfoFromUser(user) fixes the regression.

Do you have any clue as to why this wasn't caught in the integration test? Since the integration test doesn't go through the normal login procedure, I suppose login state was empty and thus accessInfo was calculated from the user resource, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would make sense. I think you're right, the user state was probably just empty.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a PR which reverts this change in lib/auth/auth_with_roles.go. If you feel that we should revert it, then I say let's do it.

It'd be good to get this merged before the test plan starts, but this might be quite tight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants