Skip to content

Refactor lib/teleterm gateway#27685

Closed
greedy52 wants to merge 1 commit intomasterfrom
STeve/28636_lib_teleterm_refactor
Closed

Refactor lib/teleterm gateway#27685
greedy52 wants to merge 1 commit intomasterfrom
STeve/28636_lib_teleterm_refactor

Conversation

@greedy52
Copy link
Copy Markdown
Contributor

@greedy52 greedy52 commented Jun 9, 2023

Prep for:

Changes:

  • Move construction of DbcmdCLICommandProvider from lib/teleterm/daemon to lib/teleterm/clusters
  • Refactor cert reissue logic
    • clusters.Cluster hides gateway types from daemon.GatewayCertReissuer
    • gateway.Gateway takes tls.Certficate instead of cert paths, and gets a tls.Certificate directly on reissue callback

@greedy52 greedy52 added database-access Database access related issues and PRs teleport-connect Issues related to Teleport Connect. labels Jun 9, 2023
@greedy52 greedy52 self-assigned this Jun 9, 2023
@ravicious ravicious self-requested a review June 12, 2023 09:51
@smallinsky smallinsky self-requested a review June 12, 2023 13:12
Copy link
Copy Markdown
Contributor

@smallinsky smallinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. Seems to be reasonable. I left some nit comments.

Comment on lines +225 to +228
if err := c.reissueDBCerts(ctx, routeToDatabase); err != nil {
return tls.Certificate{}, trace.Wrap(err)
}
tlsCert, err := c.loadDBCert(routeToDatabase)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the LoadDBCert is always called after reissueDBCerts since the reissueDBCerts uses the same information to issue a new cert at first glance the routeToDatabase.CheckCertSubject validation inside loadDBCert function is not needed.

nit: Also instead of loadingCert from disc directly we can use the introduce a helper function for our local agent c.clusterClient.LocalAgent() that will be responsible for loading the cert.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the cert subject after calling reissueDBCerts was originally added because reissueDBCerts saves the cert to disk without returning it from the function. As such, we need to load the cert then and there's a small chance that between saving and loading, the cert on disk might be modified by e.g. tsh db login being ran from a separate shell session. See #18740.

After we switch from using ReissueUserCerts to IssueUserCertsWithMFA (which returns the cert), checking the cert subject will become unnecessary.


// GatewayCertReissuer defines an interface of a helper that manages the
// process of reissuing certificates.
type GatewayCertReissuer interface {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems that is used in scope of this package so we can make it enexported:

Suggested change
type GatewayCertReissuer interface {
type gatewayCertReissuer interface {

CLICommandProvider gateway.CLICommandProvider
TCPPortAllocator gateway.TCPPortAllocator
OnExpiredCert gateway.OnExpiredCertFunc
CertReissuer GatewayCertReissuer
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: go docs.

Comment on lines +225 to +228
if err := c.reissueDBCerts(ctx, routeToDatabase); err != nil {
return tls.Certificate{}, trace.Wrap(err)
}
tlsCert, err := c.loadDBCert(routeToDatabase)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the cert subject after calling reissueDBCerts was originally added because reissueDBCerts saves the cert to disk without returning it from the function. As such, we need to load the cert then and there's a small chance that between saving and loading, the cert on disk might be modified by e.g. tsh db login being ran from a separate shell session. See #18740.

After we switch from using ReissueUserCerts to IssueUserCertsWithMFA (which returns the cert), checking the cert subject will become unnecessary.

}

func (d DbcmdCLICommandProvider) GetCommand(gateway *gateway.Gateway) (*exec.Cmd, error) {
cluster, err := d.storage.GetByResourceURI(gateway.TargetURI())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I originally made it so that DbcmdCLICommandProvider holds storage instead of cluster because I wanted to guarantee that the cluster.clusterClient passed to dbcmd.NewCmdBuilder will be a fresh client constructed from the state of the profile on disk, which is what storage.GetByResourceURI returns.

Do you foresee any problems with essentially "caching" it in a long-lived object such as the gateway? I'm not totally sure how TeleportClient handles scenarios where the cert expires after TeleportClient is instantiated. It might be that there are no problems with that, I've just never checked if that's the case.

Same goes for cluster.status which holds things such as Roles and Username. If I start a gateway, the user cert expires, Connect asks me to relogin and I log in again, will DbcmdCLICommandProvider have access to fresh profile status?

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When connecting to the local proxy (tunnel mode), DbcmdCLICommandProvider only needs tc.SiteName (for things like psql profile name, which is probably not required either) and maybe tc.InsecureSkipVerify. TeleportClient and cluster.status are used more when NOT connecting to the local proxy (it has to know how to load the cert and where to send to, which both not required for local proxy), or connecting to local proxy with TLS cert.

One exception to above is Oracle where profile.OracleWalletDir is called by DbcmdCLICommandProvider. But I don't think Oracle works anyway in Connect since custom logic is needed to create that OracleWalletDir in the first place.

So maybe I can refactor DbcmdCLICommandProvider to decouple it from TeleportClient and ProfileStatus. What do guys think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbcmd depending on TeleportClient instead of on a more specific data structure feels like an oversight which just naturally happened over time. I think it might be another side effect of tsh not having to care about long-lived TeleportClient instances – why make dbcmd depend on something else than TeleportClient when you can just read stuff from tc and you can be sure that it's fresh on each tsh call.

This might be another argument for returning a fresh TeleportClient in dbcmd command builder, so that the semantics of using TeleportClient in Connect resemble those of using TeleportClient in tsh. I don't have a horse in this race though so feel free to pick what works best.

TargetSubresourceName: params.TargetSubresourceName,
LocalPort: params.LocalPort,
CLICommandProvider: cliCommandProvider,
TCPPortAllocator: s.cfg.TCPPortAllocator,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fill me in on how you got rid of passing TCPPortAllocator here?

I think the only place which actually set TCPPortAllocator on daemon's config was daemon_test.go, but that same place mocks the gateway creator so I see you were able to "inject" the mocked TCP port allocator exactly where it's needed vs passing it through the daemon.

I know it was all quite convoluted, so I appreciate the effort to clean this up!

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I passed gateway.TCPPortAllocator to mockGatewayCreator now. We probably can even completely get rid of it from gateway.Config if gateway is an interface.

// CreateGateway creates a gateway
func (c *Cluster) CreateGateway(ctx context.Context, params CreateGatewayParams) (*gateway.Gateway, error) {
if params.CLICommandProvider == nil {
params.CLICommandProvider = NewDbcmdCLICommandProvider(c, dbcmd.SystemExecer{})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does moving DbcmdCLICommandProvider initialization from the daemon to the cluster help with adding kube gateways?

My original comment pointed that the creation of the CLI command provider should be moved up to outer layers, not further down.

Instead of creating a new struct for every gateway, we'd have only two CLI command provider structs, say DbcmdCLICommandProvider and KubeCLICommandProvider, both initialized before calling daemon.New and passed to that function as a field on Config. Then when CreateGateway on the daemon is called, we inspect the target URI of the gateway and pass a reference to an appropriate provider.

This goes back to how dependencies are typically injected in Go and how we should approach mocking them out in tests. I'm no expert on this, so by all means please push back on this if you disagree!


I'm not sure if any of this matters to be honest. Because of how clusters.Cluster is written, it is usually mocked out wholesale anyway. So with how things are right now, there won't be a situation where I want to use clusters.Cluster or daemon.Service but I want the CLICommandProvider to be mocked out – CLICommandProvider always passes through clusters.Cluster. As I mentiode, clusters.Cluster has to be mocked out itself due to how it's designed.

This is clearly demonstrated by daemon_test.go where you were able to shift passing a mocked out TCP port allocator to the mock gateway creator.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, DbcmdCLICommandProvider and KubeCLICommandProvider could be initialized within daemon.New. If DbcmdCLICommandProvider needs clusters.Storage for initialization, then daemon.New would simply pass the Config.Storage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked for some opinions on Slack regarding passing the deps up. https://gravitational.slack.com/archives/C08HF8E9F/p1686740351896489

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, daemon.Service has to own CLICommandProvider only if the db command provider has to depend on storage so a fresh cluster is required. That may not be the case. Let me see what I can do on the DbcmdCLICommandProvider side to decouple the need for TeleportClient and ProfileStatus. If that's possible, cluster can be the one creating it and there is really no need for daemon.Service to know anything about the command provider.

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think the same problem applies to the way I am doing the reissue cert callback. Do we need the storage to get a fresh Cluster for reissuing the database cert?

Seems so right? so the current refactor on issue cert won't work. Let me rethink through this.

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if we can just have a Cluster.Reload() function after relogin, or recreate the gateways after relogin.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if a fresh cluster is needed for the db cert, but if we're to have Cluster.Reload, I wonder if it wouldn't be better to just pass the storage there. 🤔

AFAIK, TeleportClient was originally made for tsh which doesn't really have the concept of long-running processes, maybe other than proxies. Actually, I'd check how this those TeleportClient instances are managed in the current local proxy middlewares (or if TeleportClient is used there in the first place).

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if a fresh cluster is needed for the db cert, but if we're to have Cluster.Reload, I wonder if it wouldn't be better to just pass the storage there. 🤔

I think it's safer to get a fresh cluster for reissuing db cert the same way a fresh cluster is used for DBCliCommandProvider. If so, I would say the current implemented way is better so refactoring is probably not needed at all =D.

Thinking through this, I found a problem with this change I made earlier:

// TLSRoutingConnUpgradeRequired indicates that ALPN connection upgrades
// are required for making TLS routing requests.
TLSRoutingConnUpgradeRequired bool
// RootClusterCACertPoolFunc is callback function to fetch Root cluster CAs
// when ALPN connection upgrade is required.
RootClusterCACertPoolFunc alpnproxy.GetClusterCACertPoolFunc

So the gateway is caching TLSRoutingConnUpgradeRequired and RootClusterCACertPoolFunc. It's rare they change but when they do change after login, gateway has no way to reload these values, at least not supported in the current local proxy implementation. Might be cleaner to recreate the gateway by the new cluster after relogin (which will break active connections though).

@smallinsky smallinsky self-requested a review June 14, 2023 14:58
@greedy52 greedy52 closed this Jun 20, 2023
@greedy52 greedy52 deleted the STeve/28636_lib_teleterm_refactor branch June 28, 2023 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

database-access Database access related issues and PRs teleport-connect Issues related to Teleport Connect.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants