Connect Kube gateway part 1: lib/teleterm/gateway#28312

Merged
greedy52 merged 7 commits into master from STeve/26836_connect_kube_gateway_go on Jul 10, 2023

Conversation

@greedy52
Contributor

Part of

Implementation is based on the PoC in #27287. I decided to scrap the refactor attempt #27685 and stick with the current design.

This is part 1 of the actual implementation. This part covers:

  • lib/teleterm/api/uri
  • lib/teleterm/gateway

The next part will cover the rest of the Go-side changes.

@greedy52 greedy52 added the tls-routing, kubernetes, and teleport-connect labels Jun 26, 2023
@greedy52 greedy52 self-assigned this Jun 26, 2023
@greedy52 greedy52 requested review from ravicious and smallinsky June 26, 2023 18:20
@greedy52 greedy52 marked this pull request as ready for review June 26, 2023 18:20
@github-actions github-actions bot added the size/lg and tsh labels Jun 26, 2023
Comment thread lib/teleterm/gateway/gateway.go
@rosstimothy rosstimothy self-requested a review June 27, 2023 13:04
Member

@ravicious ravicious left a comment

I've yet to read through the whole PR; submitting what I have so far.

Comment thread lib/teleterm/api/uri/uri.go Outdated
Comment thread lib/teleterm/gateway/gateway.go
Comment thread lib/teleterm/gateway/gateway_kube.go Outdated
Comment thread lib/teleterm/gateway/gateway_kube.go
Comment thread lib/teleterm/gateway/gateway_kube.go Outdated
Comment thread lib/teleterm/gateway/gateway_kube.go Outdated
Comment thread lib/teleterm/gateway/gateway_kube.go Outdated
}

config := kubeconfig.CreateLocalProxyConfig(clientcmdapi.NewConfig(), values)
return trace.Wrap(kubeconfig.Save(g.KubeconfigPath(), *config))
Member

// KubeconfigPath returns the kubeconfig path that can be used for clients to
// connect to the local proxy.
func (g *Gateway) KubeconfigPath() string {
// Assumes CertPath is unique per kube cluster.
return g.cfg.CertPath + ".kubeconfig"
}

Since this PR is only part 1, it's hard to judge this particular decision. Do you plan to adjust this path in part 2 or is it the final path? g.cfg.CertPath is now

c.status.DatabaseCertPathForCluster(c.clusterClient.SiteName, db.GetName())

so I assume at least this bit needs to change.

Potential issues

If we upgrade existing kube tabs in Connect so that they automatically start using this new kube proxy instead of tsh kube login, this change will break the workflow of some users.

Another issue I see is that this adds a new file in the tsh home dir. Is that the plan for regular tsh proxy kube as well? My concern here is that if lib/teleterm adds a new file in the tsh home dir, the rest of the codebase will not be aware of it which might lead to conflicts or other code making wrong assumptions about the layout of the home dir.

Preserving backwards compatibility

Assuming that tsh proxy kube will not do this and thus Connect will be the only place needing to store this specific kubeconfig somewhere, I think we can solve this by utilizing what Connect is already doing with kubeconfigs. This will guarantee that the workflow of existing users doesn't break with this change.

When you open a new kube tab in Connect, we generate a new relative path to kubeconfig with a unique identifier and store it on the document object:

// We prepend the name with `rootClusterId/` to create a kube config
// inside this directory. When the user logs out of the cluster,
// the entire directory is deleted.
kubeConfigRelativePath:
  options.kubeConfigRelativePath ||
  `${params.rootClusterId}/${params.kubeId}-${unique(5)}`,

The same relative path is then stored on the connection object (responsible for the item in the list in the top left in Connect). So as long as you don't log out or manually remove the connection, connecting to the same kube cluster will always use that predetermined kubeconfig location.

What we could do is send that relative path to tshd when a gateway is being created. The problem is, tshd doesn't know the full path for the folder with kubeconfigs. We don't want the Electron app to send the full path with the request to create the gateway because this could serve as an attack vector.

Instead, we could send the dir with kube config during the start of the daemon, similar to how we send other directories:

flags: [
'daemon',
'start',
// grpc-js requires us to pass localhost:port for TCP connections,
// for tshd we have to specify the protocol as well.
`--addr=${tshAddress}`,
`--certs-dir=${getCertsDir()}`,
`--prehog-addr=${staticConfig.prehogAddress}`,

Coincidentally, the dir with kube configs is available in runtimeSettings.ts as getKubeConfigsDir, we just have to make sure to cache the result and not call the function twice.

If tshd knew where this dir is, it could then accept the relative path from the Electron app when creating a gateway, along with verifying that it's indeed relative.
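The relative-path verification mentioned here could look something like this. This is a minimal sketch under stated assumptions; `resolveKubeconfigPath` and its signature are illustrative, not from the PR:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveKubeconfigPath joins a client-supplied relative path with the
// daemon's kube configs dir, rejecting absolute paths and ".." escapes.
// (Hypothetical helper, not part of the teleterm codebase.)
func resolveKubeconfigPath(kubeConfigsDir, relPath string) (string, error) {
	if filepath.IsAbs(relPath) {
		return "", fmt.Errorf("path must be relative: %q", relPath)
	}
	// filepath.Join cleans the result, so any ".." segments are resolved
	// before we check that the path is still inside the kube configs dir.
	full := filepath.Join(kubeConfigsDir, relPath)
	if !strings.HasPrefix(full, filepath.Clean(kubeConfigsDir)+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes kube configs dir: %q", relPath)
	}
	return full, nil
}

func main() {
	fmt.Println(resolveKubeconfigPath("/home/user/kube", "root1/cluster-ab12c"))
	fmt.Println(resolveKubeconfigPath("/home/user/kube", "../outside"))
}
```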

Contributor Author

@ravicious

If we upgrade existing kube tabs in Connect so that they automatically start using this new kube proxy instead of tsh kube login, this change will break the workflow of some users.

I haven't thought this through, but maybe we should. The document and terminal types will be different now. How would this migration work?

Another issue I see is that this adds a new file in the tsh home dir. Is that the plan for regular tsh proxy kube as well? My concern here is that if lib/teleterm adds a new file in the tsh home dir, the rest of the codebase will not be aware of it which might lead to conflicts or other code making wrong assumptions about the layout of the home dir.

I don't see an issue with putting it in the tsh home dir, as tsh proxy kube creates a temp kubeconfig there too. Though I would say it's definitely better to put it in a "Connect" kube dir, separate from tsh home. Curious what the reason behind the unique ID is? The resourceURI (or the Teleport cluster + kube cluster combo) should be unique. All terminals of the same kube cluster should share the config too.

Member

The document, terminal types will be different now. How would this migration work?

I mentioned it briefly in your proof of concept. #27287 (comment)

Because the new document type is so similar to the old one, we should be able to transform one into the other without requiring any input from the user. This would mean that when loading the state from disk, we'd need to inspect the documents first and then perform necessary changes.

We don't have a mechanism in place, so that's why in the comment I linked above I mentioned leaving the migration for Grzegorz and me. ;)

I don't see an issue putting in tsh home dir, as tsh proxy kube creates a temp kubeconfig there too. Though I would say it's definitely better to put it in a "Connect" kube dir separately from tsh home.

If tsh proxy kube already puts stuff there then I think it's fine for Connect to use the same path. Connect uses its own tsh home for now.

Curious what's the reason behind the unique ID? the resourceURI (or the teleport cluster + kube cluster combo) should be unique. All terminals of the same kube cluster should share the config too.

@gzdunek Do you remember why we add unique IDs to kubeconfigs? AFAIR we thought kube clusters might not be unique within a Teleport cluster.

Another reason that comes to my mind is that we put the kubeconfigs under <appPath>/kube/<root cluster>, so if two leaf clusters have a kube cluster with the same name, it'd cause problems.


In any case, I think it's fine if we want to change the location of kubeconfigs. We just need to make sure that the existing setups people have will not break.

By existing setups I mean two things:

  • People who have the old document type in app_state.json, meaning if they restart Connect and restore tabs, Connect is going to restore the old document type with tsh kube login.
  • People who share kubeconfig created by Connect with 3rd party apps. I'm not even sure if there are people who figured out they can do this as we don't explain this clearly anywhere.

One thing we could do is keep the existing kube login implementation of kube tabs, but make it so that clicking "Connect" from the resource table opens the new implementation. This way the existing tabs will keep working as expected.

All tsh kube login does is prepare special kubeconfig stanzas, right? But once you run it, theoretically you can use that kubeconfig forever, can't you? If there's a person who uses Teleport Connect and shares the kubeconfig generated by us with a 3rd party app, this setup should continue to work until they manually log out of the cluster (since logging out removes kubeconfigs created by Connect). So it's not a use case we have to worry about.

In the next major version after kube proxies get released in Connect, we could deprecate the old document implementation. Instead of having it work as usual, it could say something like "This tab uses a kube connection method that is deprecated. Please use the button below to lorem ipsum and migrate any 3rd party apps away from using <old kubeconfig location>".

We'd need to update docs to clearly express that running Connect is now required for the kubeconfig to work. We should also update the deprecated parts of kube tabs in the code to clearly mark them as deprecated.

This way we also don't have to write any migration code that I mentioned at the beginning of this comment. Doing so would be probably a bad choice, as the kube login document has totally different behavior than the kube proxy document. Even if we kept the kubeconfig locations intact, the new implementation would require Connect to be running, while the old one doesn't require that.

Contributor

Another reason that comes to my mind is that we put the kubeconfigs under <appPath>/kube/<root cluster>, so if two leafs have a kube cluster with the same name, it'd cause problems.

Yeah, that's the reason - we keep kubeconfigs for the root and leaves in the same "root" directory, so there could be a name collision.

Btw I like the idea of having a new type of kube document and deprecating the old one - it sounds much easier than a document migration.
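A tiny runnable sketch of why the unique suffix matters, under the layout described above (paths and names are illustrative, not Connect's actual code):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// kubeconfigPath mimics Connect's layout: all kubeconfigs for a root cluster
// and its leaves live under <appPath>/kube/<rootClusterId>/, so a unique
// suffix is what keeps same-named kube clusters from colliding.
func kubeconfigPath(appPath, rootClusterID, kubeID, suffix string) string {
	return filepath.Join(appPath, "kube", rootClusterID, kubeID+"-"+suffix)
}

func main() {
	// Same kube cluster name exposed by two leaf clusters of the same root:
	a := kubeconfigPath("/app", "root1", "payments", "ab12c")
	b := kubeconfigPath("/app", "root1", "payments", "x9y8z")
	fmt.Println(a != b) // distinct paths thanks to the unique suffix
}
```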

Contributor Author

@greedy52 greedy52 Jul 4, 2023

Thanks for the explanation!

I've decided to pass in a ConfigDir to the gateway and use keypaths to construct the kubeconfig path:

  • The path must persist through logins so I think it's better to use a separate path from tsh home.
  • ConfigDir so that it can be used for other things like database access specific local configs, app access local ca, etc.

I cheated by using keypaths. But we can certainly design a naming convention just for Connect instead of using keypaths, if wanted. In the next change, I can pass in a dir to the daemon as suggested above.

Let me know what you think about this!

Member

The path must persist through logins so I think it's better to use a separate path from tsh home.

Why does it need to persist through logins?

In the next change, I can pass in a dir to the daemon as suggested above.

If by "as suggested above" you mean the "Preserving backwards compatibility" section of my OG comment, then that's not necessary I think.

To sum up the thread:

  • We want to keep both kube tab implementations until the next major version where we'll deprecate the old implementation, as described in my previous comment.
  • Since the new kube tab implementation is separate from the old one, we don't need to preserve backwards compatibility of the kubeconfig location.
  • tsh already stores kubeconfig in tsh home dir, so it's fine for Connect to do the same.

Given all this, I don't see what passing ConfigDir would help with. Database access specific local configs etc. are all hypotheticals at the moment, as far as I understand.

We might as well use client.ProfileStatus.KubeConfigPath. The path would have to be passed to the gateway somehow since it doesn't have access to ProfileStatus AFAIR.

Contributor Author

Why does it need to persist through logins?

What I meant is that the file has to persist through re-login. During re-login, the Teleport client cleans the kube keys folder, so tsh proxy kube (CLI) has to rewrite the config:

// We recreate ephemeral kubeconfig to make sure it's there even after relogin.
k.kubeconfig.CurrentContext = currentContext
if err := k.WriteKubeConfig(); err != nil {
	return tls.Certificate{}, trace.Wrap(err)
}

We can either do the same for Teleport Connect or just use a separate directory.

"as suggested above"

What I meant by "as suggested above" is to pass on a directory through parameters when creating the daemon.

Database access specific local configs etc.

For example, oracle generates some local files for connection:

if dbInfo.Protocol == defaults.ProtocolOracle {
	if err := generateDBLocalProxyCert(key, profile); err != nil {
		return trace.Wrap(err)
	}
	err = oracle.GenerateClientConfiguration(key, dbInfo.RouteToDatabase, profile)
	if err != nil {
		return trace.Wrap(err)
	}
}

We might as well use client.ProfileStatus.KubeConfigPath. The path would have to be passed to the gateway somehow since it doesn't have access to ProfileStatus AFAIR.

I can certainly do this. As mentioned above, just have to rewrite the config after cert reissue. I missed this part in my previous implementation. Let me know if you prefer this way or use a separate directory.

Member

Thanks for the explanation, now I understand what the purpose behind ConfigDir is.

One thing we need to keep in mind is that in the future we'd probably want to make Connect use ~/.tsh, sharing it with tsh. I don't know if storing kubeconfigs outside of tsh home dir will make it easier or harder. ;)

Given your explanation with keeping kubeconfigs around on relogin, I don't have any strong preference for one way or another.

Member

Oh, I forgot to add, I think using keypaths makes sense.

// <baseDir>/keys/<proxy>/<username>-kube/<cluster>/<kubename>-kubeconfig
func KubeConfigPath(baseDir, proxy, username, cluster, kubename string) string {
	return filepath.Join(KubeCertDir(baseDir, proxy, username, cluster), kubename+kubeConfigSuffix)
}

The /keys/ part is going to look a bit weird in a standalone folder. But if we decided not to use it then we'd have to recreate pretty much half of the keypaths functionality anyway.

If we stick to keypaths, we also increase the chances of compatibility with other parts of our tooling. That is, if tsh uses keypaths and stores something in tsh home dir and we use keypaths but store something outside, sharing code between the two will be somewhat easier as only baseDir will change.

Though I suppose coming up with a name for Connect's baseDir is going to be challenging. I wasn't able to come up with anything good. 🤔

Contributor Author

One thing we need to keep in mind is that in the future we'd probably want to make Connect use ~/.tsh, sharing it with tsh.

I think that sharing will break other things too like tsh db logout will erase db certs used by the gateway.

Though I suppose coming up with a name for Connect's baseDir is going to be challenging. I wasn't able to come up with anything good. 🤔

Sorry for going back and forth on this. I ended up passing the tsh profile/home dir to the gateway and letting the gateway use keypaths, and I added the logic to rewrite the kubeconfig on relogin. Just too lazy to "manage" a baseDir. And this way it's more aligned with tsh.

@ravicious ravicious self-requested a review June 28, 2023 16:00
Comment thread lib/srv/alpnproxy/local_proxy.go
Comment thread lib/teleterm/gateway/gateway_kube.go Outdated
Comment thread lib/teleterm/gateway/kube_cert_reissuer.go
Comment thread lib/teleterm/gateway/gateway_kube_test.go Outdated
Member

@ravicious ravicious left a comment

The PR looks good overall, we just need to make a decision on ConfigDir.

Thanks for adding the comments, they're very helpful and detailed!

// Insecure
Insecure bool
// ClusterName is the Teleport cluster name
ClusterName string
Member

ClusterName doesn't seem to be set anywhere outside of the tests.

Comment thread lib/teleterm/gateway/gateway_kube.go Outdated
}

g.cleanupFuncs = append(g.cleanupFuncs, func() error {
return trace.Wrap(utils.RemoveFileIfExist(g.KubeconfigPath()))
Member

What's the reason behind removing the kubeconfig when gateway closes? I know that the kubeconfig will work only as long as the gateway is up. From the UX perspective though, I wonder if leaving the kubeconfig wouldn't be better.

I'm thinking of a situation where the user doesn't know that Connect needs to be open for the kubeconfig to work. So they close Connect and either:

  1. Connect removes the kubeconfig. The user tries to use a third-party tool by supplying a path to the kubeconfig, but they get a "Not found" error. At this point they probably wonder why the kubeconfig got removed.
  2. Connect keeps the kubeconfig. The user tries to use a third-party tool, but they get a "Can't establish connection to localhost:45678". At this point they might wonder what is localhost:45678.

(These are just my assumptions as to how those 3rd party apps might work, do you know perhaps how they behave in those situations?)

Now that I wrote it out like this, removing the kubeconfig might be a better idea. The kubeconfig path is the only part of the "API" the user can easily tie back to Connect, since they got the kubeconfig from Connect. The same cannot be said about a random port on localhost (which they won't even see in the app).

But I wonder what you think about it.

Contributor Author

kubectl example of the "not found" error:

$ KUBECONFIG=not_found kubectl get pod
W0705 15:54:05.352504   28156 loader.go:222] Config not found: not_found
E0705 15:54:05.355350   28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.355788   28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.356873   28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.358003   28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.359051   28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

kubectl example when the local proxy is down:

The connection to the server teleport.dev.aws.stevexin.me:443 was refused - did you specify the right host or port?

The reason I added the cleanup is to avoid keeping unused files around forever, since I added ConfigDir. If it's in tsh home, it will get cleaned up eventually.
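For context, the cleanup-funcs pattern the gateway uses can be sketched like this (an illustrative reduction, not the PR's actual code):

```go
package main

import "fmt"

// gateway is a stripped-down stand-in for the teleterm gateway: each resource
// it creates (listener, kubeconfig file, ...) registers an undo step.
type gateway struct {
	cleanupFuncs []func() error
}

// Close runs the cleanups in reverse registration order, like deferred calls,
// so resources are torn down opposite to how they were built.
func (g *gateway) Close() error {
	for i := len(g.cleanupFuncs) - 1; i >= 0; i-- {
		if err := g.cleanupFuncs[i](); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	g := &gateway{}
	g.cleanupFuncs = append(g.cleanupFuncs, func() error {
		fmt.Println("removing kubeconfig")
		return nil
	})
	g.Close()
}
```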

Member

I think the output you posted cements the idea that removing the kubeconfig is better than keeping it around. Though I still worry that we might run into 3rd party clients that completely blow up or force you back to the configuration screen when the kubeconfig goes missing vs just showing an error dialog when they can't connect to the cluster. But we can worry about this once we find such clients.

Contributor

@smallinsky smallinsky left a comment

I have left a few comments/suggestions, but the PR mostly LGTM.

Comment thread lib/srv/alpnproxy/kube.go Outdated
Comment thread lib/teleterm/gateway/config.go Outdated
Comment thread lib/teleterm/gateway/gateway_kube.go
Comment thread lib/teleterm/api/uri/uri.go Outdated
//
// The tlsca.RouteToDatabase.Database field is skipped, as it's an optional field and gateways can
// change their Config.TargetSubresourceName at any moment.
func (g *Gateway) RouteToDatabase() tlsca.RouteToDatabase {
Contributor

Out of scope for this PR, but what would you say about defining a Gateway interface, with BasicGateway implementing it, that can be extended by KubeGateway and DatabaseGateway:

type Gateway interface {
	Serve() error
	Close() error
	ReloadCert() error
	URI() uri.ResourceURI
	TargetURI() string
	TargetName() string
	Protocol() string
	TargetUser() string
	TargetSubresourceName() string
	SetTargetSubresourceName(value string)
	Log() *logrus.Entry
	LocalAddress() string
	LocalPort() string
	LocalPortInt() int
	Config() Config
	CLICommand() (*api.GatewayCLICommand, error)
}

type BasicGateway struct {
 // implements Gateway 
}

type DatabaseGateway interface {
	Gateway
	RouteToDatabase() tlsca.RouteToDatabase
}

type KubeGateway interface {
	Gateway
}

Contributor Author

I'll try to do this in my next PR. This interface would make it easier to mock the gateway.
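As an illustration of how an interface helps testing, a stub could satisfy a trimmed-down version of the proposed interface (a sketch; the names and the reduced method set are assumptions, not the PR's API):

```go
package main

import "fmt"

// Gateway is a trimmed-down, illustrative version of the proposed interface.
type Gateway interface {
	Serve() error
	Close() error
	LocalAddress() string
}

// fakeGateway is a test double: with an interface in place, tests can swap in
// a stub instead of starting a real local proxy.
type fakeGateway struct {
	addr   string
	closed bool
}

func (f *fakeGateway) Serve() error         { return nil }
func (f *fakeGateway) Close() error         { f.closed = true; return nil }
func (f *fakeGateway) LocalAddress() string { return f.addr }

func main() {
	var g Gateway = &fakeGateway{addr: "localhost:45678"}
	fmt.Println(g.LocalAddress())
	g.Close()
}
```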

Comment on lines +85 to +87
case targetURI.IsKube():
if err := gateway.makeLocalProxiesForKube(listener); err != nil {
return nil, trace.Wrap(err)
Contributor

Right now, during gateway creation from the Cluster object, we call c.GetDatabase(ctx, params.TargetURI) regardless of the targetURI type, so if possible the type check should be done in only one place.

Contributor Author

Cluster has to issue different certs and pass different parameters for kube vs db, so it's hard to avoid this check at separate levels.

An alternative that makes it slightly easier to read is to define a gateway type, but it doesn't eliminate the switch, so I didn't do it.
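A rough sketch of what such a gateway type could look like — it classifies the target once, but both call sites would still switch on the result (the URI shapes and names here are illustrative, not teleterm's actual URI format):

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayKind is an illustrative enum derived from the target URI.
type gatewayKind int

const (
	kindUnknown gatewayKind = iota
	kindDB
	kindKube
)

// kindOfTargetURI mimics uri.ResourceURI's IsDB/IsKube checks on a plain string.
func kindOfTargetURI(target string) gatewayKind {
	switch {
	case strings.Contains(target, "/dbs/"):
		return kindDB
	case strings.Contains(target, "/kubes/"):
		return kindKube
	default:
		return kindUnknown
	}
}

func main() {
	// The cert issuance in Cluster and the local proxy setup in the gateway
	// would each still branch on this kind.
	fmt.Println(kindOfTargetURI("/clusters/foo/dbs/postgres") == kindDB)
	fmt.Println(kindOfTargetURI("/clusters/foo/kubes/cookie") == kindKube)
}
```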

Comment on lines +29 to +34
type kubeCertReissuer struct {
cert atomic.Value
onExpiredCert func(context.Context) error
}

func newKubeCertReissuer(cert tls.Certificate, onExpiredCert func(context.Context) error) *kubeCertReissuer {
Contributor

It is a bit unclear to me what the difference is between kubeCertReissuer and (*GatewayCertReissuer).ReissueCert.

Contributor Author

gateway and cluster are injected along the way when the GatewayCertReissuer callback is called, so at the kubeCertReissuer level I'm just trying to hide those. I am open to any suggestions though; I do feel bad when tracing this round trip.

@greedy52 greedy52 requested a review from smallinsky July 6, 2023 17:50
@greedy52 greedy52 enabled auto-merge July 10, 2023 12:58
@greedy52 greedy52 added this pull request to the merge queue Jul 10, 2023
Merged via the queue into master with commit a86283e Jul 10, 2023
@greedy52 greedy52 deleted the STeve/26836_connect_kube_gateway_go branch July 10, 2023 13:16
ravicious pushed a commit that referenced this pull request Jul 11, 2023
* Connect Kube gateway part 1: lib/teleterm/gateway

* fix lint

* move IsDB/IsKube to resource URI

* address review comments

* config dir

* use ProfileDir instead of ConfigDir

* remove NewKubeForwardProxyWithListener