In web sessions' cache we also compare remote site:
Line 1024 in 2baa7e7
@rosstimothy @espadolini I added you here, because I saw you were pushing some improvements to the remote client cache in |
|
I would recommend converting from
|
    // Check if we already have a cached client for this cluster and a valid cert.
    // Theoretically, we don't have to check the validity
    // because auth server will disconnect expired clients.
    // However, by default, it isn't done immediately
    // (until it is enabled with disconnect_expired_cert on cluster) and to keep
    // the previous behavior, we need to check the cert.
    //
    // If the cert is expired, we will remove the client and make an attempt
    // to connect to proxy which will fail with an appropriate error.
    proxyClient := c.getFromCache(clusterURI)
    if proxyClient != nil {
        if cluster.Connected() {
            return proxyClient, nil
        }
        err := c.InvalidateForRootCluster(clusterURI.GetRootClusterURI())
        if err != nil {
            c.log.WithError(err).Errorf("Failed to invalidate expired remote client for %q.", clusterURI)
        }
    }
What's the benefit of invalidating the client on Get?
As it is, each call to get a client will incur the overhead of c.ResolveCluster (which reads from tsh home dir) in order to check if the cert is valid. But couldn't we make Get fairly simple and move the invalidation logic to places where we're certain that invalidation needs to happen?
Off the top of my head, Logout would need to remove the client from the cache while Login and AssumeRole would need to replace any old client with a new one.
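A rough sketch of what moving invalidation to the call sites could look like. All names here (`clientCache`, `closer`, `noopClient`, `Logout`, `removeCreds`) are hypothetical stand-ins rather than the actual code in this PR, and the prefix match assumes that leaf cluster URIs extend the root URI with a `/`-separated suffix.

```go
package main

import (
	"errors"
	"strings"
	"sync"
)

// closer is a stand-in for a proxy client; only Close is needed here.
type closer interface{ Close() error }

// noopClient is a fake client used only to illustrate the flow.
type noopClient struct{ closed bool }

func (n *noopClient) Close() error { n.closed = true; return nil }

// clientCache is a hypothetical, trimmed-down cache keyed by cluster URI.
type clientCache struct {
	mu      sync.Mutex
	clients map[string]closer
}

// ClearForRoot closes and forgets the clients of a root cluster and its leaves.
// Assumption: a leaf URI starts with the root URI followed by "/".
func (c *clientCache) ClearForRoot(rootURI string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	var errs []error
	for uri, clt := range c.clients {
		if uri == rootURI || strings.HasPrefix(uri, rootURI+"/") {
			errs = append(errs, clt.Close())
			delete(c.clients, uri)
		}
	}
	// errors.Join drops nil values and returns nil if nothing failed.
	return errors.Join(errs...)
}

// Logout shows the call site: drop cached clients, then remove credentials.
// Both steps run even if the first one fails.
func Logout(c *clientCache, rootURI string, removeCreds func() error) error {
	return errors.Join(c.ClearForRoot(rootURI), removeCreds())
}
```

With this shape, `Get` would not need to read anything from the tsh home dir; Login and AssumeRole could call the same `ClearForRoot` before storing a fresh client.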
Actually, in the first version I didn't have that client invalidation here. It was something like this:

    if proxyClient := c.getFromCache(clusterURI); proxyClient != nil && cluster.Connected() {
        return proxyClient, nil
    }
    newProxyClient, err := clusterClient.ConnectToCluster(ctx)
    if err != nil {
    ...

The previous client would be closed when adding a new client or by the remote server, but then I thought that it would be good to close the connection immediately, so I added that InvalidateForRootCluster call.
But in general, do we need to check whether the cert is valid? Probably not; the auth server will terminate the session anyway.
I agree, I don't think there's a lot of value in client-side checking of certificate expiration; only the auth server's point of view matters, after all.
@rosstimothy thanks, I didn't know about this difference. I think it is a good idea to convert it, but I have two questions:
|
Why do we need this behavior? Shouldn't we be preventing the cluster client from ever being closed until the user either explicitly removes the cluster, or the user's credentials are refreshed and a new cluster replaces the old one?
Hrmm, there used to be an unexported equivalent implemented on the ClusterClient. We can definitely improve the MFA and credential refresh experience for the ClusterClient.
That sounds reasonable if porting it to |
I was afraid that if the cluster becomes unavailable or the connection goes down, the client would be stuck in a "broken" state, unable to make new calls. |
In the end I decided to keep caching In v16, we should have methods to reissue user certs (and some other issues fixed) in |
ravicious left a comment:
Looks fine to me, I'll give it a try once I get around to reviewing the other PR.
* Cache remote clients in Connect (#38201)
* Add remote client cache
* Add an integration test
* Close all clients when stopping the service
* Move RemoteClientCache to the place where it is used
* Do not check client cert in `Get`
* Fix code style issues
* Prevent potential race condition when removing a cached client
* Test concurrent calls to `Get`
* Add TODO
* `remoteclientcache` -> `clientcache`
* Reduce the `err` scope
* Move `Config` closer to `New` and docs
* Fix lint
* Improve logging and error handling
* Add missing comments
* `Close` -> `Clear`
* Improve the test
* Remove mentions about "remote" client
* Pass `cfg` directly to `Cache`
* `InvalidateForRootCluster` -> `ClearForRootCluster`
* Add docs for the interface
* `ClearForRootCluster` -> `ClearForRoot`
* Add config validation
* Log multiple fields at once
* Improve setting logger
* Use cached remote clients in Connect (#38202)
* Replace all simple `c.clusterClient.ConnectToProxy()` calls
* Use cached proxy client to create gateways
* Use cached proxy client to assume roles
* Invalidate clients when logging in and out
* Gracefully handle expired cert error returned by the server
* Drop `GetRootClusterURI` in headless auth watcher since URIs are already root URIs
* Simplify error check
* Make `auth.ClientI` parameter naming more consistent, use `root` prefix when needed
* Reduce error scope where possible
* Clear cached clients before passwordless login
* Use `fakeClientCache` without pointers
* Move separate `proxyClient` parameter to `CreateGatewayParams` in the gateways code
* Replace checking error string with `client.ErrClientCredentialsHaveExpired`
(cherry picked from commit 39f9951)
* Temporarily disable flaky part of `TestClientCache` (#38798)
(cherry picked from commit 7067a88)
* Make calls to the auth server concurrently (#38955)
* Make calls to the auth server concurrently
* Enhance the comment about preferences upsert
* Summarize how `userpreferences.Update` works
(cherry picked from commit ddc45a4)
1/2 of #15603
Currently, to make a remote API call, we need to connect to a proxy every time, which is time-consuming. To improve performance, we should create clients once and keep them in memory.
This task is done by `RemoteClientCache`, which maintains a map of cluster URIs to proxy clients. When a call site needs a proxy client, it asks the cache, which returns an existing client (if there is one) or connects to the proxy, stores the new client in the cache, and returns it.

We only add a client to the cache in `RemoteClientCache.Get()`. It would be ideal to store the client during login; unfortunately, we don't have a `*client.ProxyClient` there, but a `*proxyclient.Client`, which is a different thing. We could probably get a `*client.ProxyClient` somehow; I will look at it separately.

The cache keeps clients for both root and leaf clusters. To get the auth client, we should call `proxyClient.CurrentCluster()`. It will return the stored auth client (it is set during the `tc.ConnectToProxy()` call).

Please let me know if the description above is unclear.
Testing
I wanted to write unit tests for `RemoteClientCache`, but I didn't find a good way to mock `tc.ConnectToProxy` (probably because of my lack of experience with the tsh code), which wants to make a remote call. In the end, I decided to add an integration test, which has the advantage of testing the flow end-to-end.

Part 2/2 (actual use of the cache): #38202.
Changelog: Improve performance of remote calls in Teleport Connect.