Conversation
smallinsky
left a comment
There was a problem hiding this comment.
First pass. Seems to be reasonable. I left some nit comments.
| if err := c.reissueDBCerts(ctx, routeToDatabase); err != nil { | ||
| return tls.Certificate{}, trace.Wrap(err) | ||
| } | ||
| tlsCert, err := c.loadDBCert(routeToDatabase) |
There was a problem hiding this comment.
the LoadDBCert is always called after reissueDBCerts since the reissueDBCerts uses the same information to issue a new cert at first glance the routeToDatabase.CheckCertSubject validation inside loadDBCert function is not needed.
nit: Also instead of loadingCert from disc directly we can use the introduce a helper function for our local agent c.clusterClient.LocalAgent() that will be responsible for loading the cert.
There was a problem hiding this comment.
Checking the cert subject after calling reissueDBCerts was originally added because reissueDBCerts saves the cert to disk without returning it from the function. As such, we need to load the cert then and there's a small chance that between saving and loading, the cert on disk might be modified by e.g. tsh db login being ran from a separate shell session. See #18740.
After we switch from using ReissueUserCerts to IssueUserCertsWithMFA (which returns the cert), checking the cert subject will become unnecessary.
|
|
||
| // GatewayCertReissuer defines an interface of a helper that manages the | ||
| // process of reissuing certificates. | ||
| type GatewayCertReissuer interface { |
There was a problem hiding this comment.
seems that is used in scope of this package so we can make it enexported:
| type GatewayCertReissuer interface { | |
| type gatewayCertReissuer interface { |
| CLICommandProvider gateway.CLICommandProvider | ||
| TCPPortAllocator gateway.TCPPortAllocator | ||
| OnExpiredCert gateway.OnExpiredCertFunc | ||
| CertReissuer GatewayCertReissuer |
| if err := c.reissueDBCerts(ctx, routeToDatabase); err != nil { | ||
| return tls.Certificate{}, trace.Wrap(err) | ||
| } | ||
| tlsCert, err := c.loadDBCert(routeToDatabase) |
There was a problem hiding this comment.
Checking the cert subject after calling reissueDBCerts was originally added because reissueDBCerts saves the cert to disk without returning it from the function. As such, we need to load the cert then and there's a small chance that between saving and loading, the cert on disk might be modified by e.g. tsh db login being ran from a separate shell session. See #18740.
After we switch from using ReissueUserCerts to IssueUserCertsWithMFA (which returns the cert), checking the cert subject will become unnecessary.
| } | ||
|
|
||
| func (d DbcmdCLICommandProvider) GetCommand(gateway *gateway.Gateway) (*exec.Cmd, error) { | ||
| cluster, err := d.storage.GetByResourceURI(gateway.TargetURI()) |
There was a problem hiding this comment.
I think I originally made it so that DbcmdCLICommandProvider holds storage instead of cluster because I wanted to guarantee that the cluster.clusterClient passed to dbcmd.NewCmdBuilder will be a fresh client constructed from the state of the profile on disk, which is what storage.GetByResourceURI returns.
Do you foresee any problems with essentially "caching" it in a long-lived object such as the gateway? I'm not totally sure how TeleportClient handles scenarios where the cert expires after TeleportClient is instantiated. It might be that there are no problems with that, I've just never checked if that's the case.
Same goes for cluster.status which holds things such as Roles and Username. If I start a gateway, the user cert expires, Connect asks me to relogin and I log in again, will DbcmdCLICommandProvider have access to fresh profile status?
There was a problem hiding this comment.
When connecting to the local proxy (tunnel mode), DbcmdCLICommandProvider only needs tc.SiteName (for things like psql profile name, which is probably not required either) and maybe tc.InsecureSkipVerify. TeleportClient and cluster.status are used more when NOT connecting to the local proxy (it has to know how to load the cert and where to send to, which both not required for local proxy), or connecting to local proxy with TLS cert.
One exception to above is Oracle where profile.OracleWalletDir is called by DbcmdCLICommandProvider. But I don't think Oracle works anyway in Connect since custom logic is needed to create that OracleWalletDir in the first place.
So maybe I can refactor DbcmdCLICommandProvider to decouple it from TeleportClient and ProfileStatus. What do guys think?
There was a problem hiding this comment.
dbcmd depending on TeleportClient instead of on a more specific data structure feels like an oversight which just naturally happened over time. I think it might be another side effect of tsh not having to care about long-lived TeleportClient instances – why make dbcmd depend on something else than TeleportClient when you can just read stuff from tc and you can be sure that it's fresh on each tsh call.
This might be another argument for returning a fresh TeleportClient in dbcmd command builder, so that the semantics of using TeleportClient in Connect resemble those of using TeleportClient in tsh. I don't have a horse in this race though so feel free to pick what works best.
| TargetSubresourceName: params.TargetSubresourceName, | ||
| LocalPort: params.LocalPort, | ||
| CLICommandProvider: cliCommandProvider, | ||
| TCPPortAllocator: s.cfg.TCPPortAllocator, |
There was a problem hiding this comment.
Could you fill me in on how you got rid of passing TCPPortAllocator here?
I think the only place which actually set TCPPortAllocator on daemon's config was daemon_test.go, but that same place mocks the gateway creator so I see you were able to "inject" the mocked TCP port allocator exactly where it's needed vs passing it through the daemon.
I know it was all quite convoluted, so I appreciate the effort to clean this up!
There was a problem hiding this comment.
I believe I passed gateway.TCPPortAllocator to mockGatewayCreator now. We probably can even completely get rid of it from gateway.Config if gateway is an interface.
| // CreateGateway creates a gateway | ||
| func (c *Cluster) CreateGateway(ctx context.Context, params CreateGatewayParams) (*gateway.Gateway, error) { | ||
| if params.CLICommandProvider == nil { | ||
| params.CLICommandProvider = NewDbcmdCLICommandProvider(c, dbcmd.SystemExecer{}) |
There was a problem hiding this comment.
How does moving DbcmdCLICommandProvider initialization from the daemon to the cluster help with adding kube gateways?
My original comment pointed that the creation of the CLI command provider should be moved up to outer layers, not further down.
Instead of creating a new struct for every gateway, we'd have only two CLI command provider structs, say DbcmdCLICommandProvider and KubeCLICommandProvider, both initialized before calling daemon.New and passed to that function as a field on Config. Then when CreateGateway on the daemon is called, we inspect the target URI of the gateway and pass a reference to an appropriate provider.
This goes back to how dependencies are typically injected in Go and how we should approach mocking them out in tests. I'm no expert on this, so by all means please push back on this if you disagree!
I'm not sure if any of this matters to be honest. Because of how clusters.Cluster is written, it is usually mocked out wholesale anyway. So with how things are right now, there won't be a situation where I want to use clusters.Cluster or daemon.Service but I want the CLICommandProvider to be mocked out – CLICommandProvider always passes through clusters.Cluster. As I mentiode, clusters.Cluster has to be mocked out itself due to how it's designed.
This is clearly demonstrated by daemon_test.go where you were able to shift passing a mocked out TCP port allocator to the mock gateway creator.
There was a problem hiding this comment.
Now that I think about it, DbcmdCLICommandProvider and KubeCLICommandProvider could be initialized within daemon.New. If DbcmdCLICommandProvider needs clusters.Storage for initialization, then daemon.New would simply pass the Config.Storage.
There was a problem hiding this comment.
I asked for some opinions on Slack regarding passing the deps up. https://gravitational.slack.com/archives/C08HF8E9F/p1686740351896489
There was a problem hiding this comment.
In my opinion, daemon.Service has to own CLICommandProvider only if the db command provider has to depend on storage so a fresh cluster is required. That may not be the case. Let me see what I can do on the DbcmdCLICommandProvider side to decouple the need for TeleportClient and ProfileStatus. If that's possible, cluster can be the one creating it and there is really no need for daemon.Service to know anything about the command provider.
There was a problem hiding this comment.
But I think the same problem applies to the way I am doing the reissue cert callback. Do we need the storage to get a fresh Cluster for reissuing the database cert?
Seems so right? so the current refactor on issue cert won't work. Let me rethink through this.
There was a problem hiding this comment.
wonder if we can just have a Cluster.Reload() function after relogin, or recreate the gateways after relogin.
There was a problem hiding this comment.
I'm not sure if a fresh cluster is needed for the db cert, but if we're to have Cluster.Reload, I wonder if it wouldn't be better to just pass the storage there. 🤔
AFAIK, TeleportClient was originally made for tsh which doesn't really have the concept of long-running processes, maybe other than proxies. Actually, I'd check how this those TeleportClient instances are managed in the current local proxy middlewares (or if TeleportClient is used there in the first place).
There was a problem hiding this comment.
I'm not sure if a fresh cluster is needed for the db cert, but if we're to have Cluster.Reload, I wonder if it wouldn't be better to just pass the storage there. 🤔
I think it's safer to get a fresh cluster for reissuing db cert the same way a fresh cluster is used for DBCliCommandProvider. If so, I would say the current implemented way is better so refactoring is probably not needed at all =D.
Thinking through this, I found a problem with this change I made earlier:
teleport/lib/teleterm/gateway/config.go
Lines 77 to 82 in 0bd2517
So the gateway is caching TLSRoutingConnUpgradeRequired and RootClusterCACertPoolFunc. It's rare they change but when they do change after login, gateway has no way to reload these values, at least not supported in the current local proxy implementation. Might be cleaner to recreate the gateway by the new cluster after relogin (which will break active connections though).
Prep for:
Changes:
DbcmdCLICommandProviderfromlib/teleterm/daemontolib/teleterm/clustersclusters.Clusterhides gateway types fromdaemon.GatewayCertReissuergateway.Gatewaytakestls.Certficateinstead of cert paths, and gets atls.Certificatedirectly on reissue callback