Connect Kube gateway part 1: lib/teleterm/gateway#28312
Conversation
ravicious
left a comment
I'm yet to read through the whole PR; submitting what I've got so far.
```go
config := kubeconfig.CreateLocalProxyConfig(clientcmdapi.NewConfig(), values)
return trace.Wrap(kubeconfig.Save(g.KubeconfigPath(), *config))
```
teleport/lib/teleterm/gateway/gateway_kube.go
Lines 34 to 39 in 77579c9
Since this PR is only part 1, it's hard to judge this particular decision. Do you plan to adjust this path in part 2 or is it the final path? g.cfg.CertPath is now
c.status.DatabaseCertPathForCluster(c.clusterClient.SiteName, db.GetName())
so I assume at least this bit needs to change.
Potential issues
If we upgrade existing kube tabs in Connect so that they automatically start using this new kube proxy instead of tsh kube login, this change will break the workflow of some users.
Another issue I see is that this adds a new file in the tsh home dir. Is that the plan for regular tsh proxy kube as well? My concern here is that if lib/teleterm adds a new file in the tsh home dir, the rest of the codebase will not be aware of it which might lead to conflicts or other code making wrong assumptions about the layout of the home dir.
Preserving backwards compatibility
Assuming that tsh proxy kube will not do this and thus Connect will be the only place needing to store this specific kubeconfig somewhere, I think we can solve this by utilizing what Connect is already doing with kubeconfigs. This will guarantee that the workflow of existing users doesn't break with this change.
When you open a new kube tab in Connect, we generate a new relative path to kubeconfig with a unique identifier and store it on the document object.
The same relative path is then stored on the connection object (responsible for the item in the list in the top left in Connect). So as long as you don't log out or manually remove the connection, connecting to the same kube cluster will always use that predetermined kubeconfig location.
What we could do is send that relative path to tshd when a gateway is being created. The problem is, tshd doesn't know the full path for the folder with kubeconfigs. We don't want the Electron app to send the full path with the request to create the gateway because this could serve as an attack vector.
Instead, we could send the dir with kube config during the start of the daemon, similar to how we send other directories:
teleport/web/packages/teleterm/src/mainProcess/runtimeSettings.ts
Lines 64 to 71 in 794e0d4
Coincidentally, the dir with kube configs is available in runtimeSettings.ts as getKubeConfigsDir; we just have to make sure to cache the result and not call the function twice.
If tshd knew where this dir is, it could then accept the relative path from the Electron app when creating a gateway, along with verifying that it's indeed relative.
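For illustration, here's a minimal sketch of that validation on the tshd side, assuming the daemon already received the kube configs dir at startup; resolveKubeconfigPath and its error handling are hypothetical, not code from this PR:

```go
package daemon

import (
	"path/filepath"
	"strings"

	"github.com/gravitational/trace"
)

// resolveKubeconfigPath joins a client-supplied relative path with the kube
// configs dir and rejects absolute paths or paths that escape the base dir.
func resolveKubeconfigPath(kubeConfigsDir, relativePath string) (string, error) {
	if filepath.IsAbs(relativePath) {
		return "", trace.BadParameter("kubeconfig path must be relative")
	}
	// filepath.Join cleans the result, so "../" segments collapse and can be
	// caught by checking that the joined path still sits under the base dir.
	joined := filepath.Join(kubeConfigsDir, relativePath)
	base := filepath.Clean(kubeConfigsDir)
	if !strings.HasPrefix(joined, base+string(filepath.Separator)) {
		return "", trace.BadParameter("kubeconfig path escapes the kube configs dir")
	}
	return joined, nil
}
```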
If we upgrade existing kube tabs in Connect so that they automatically start using this new kube proxy instead of tsh kube login, this change will break the workflow of some users.
I haven't thought through this, but maybe we should. The document and terminal types will be different now. How would this migration work?
Another issue I see is that this adds a new file in the tsh home dir. Is that the plan for regular tsh proxy kube as well? My concern here is that if lib/teleterm adds a new file in the tsh home dir, the rest of the codebase will not be aware of it which might lead to conflicts or other code making wrong assumptions about the layout of the home dir.
I don't see an issue putting it in the tsh home dir, as tsh proxy kube creates a temp kubeconfig there too. Though I would say it's definitely better to put it in a "Connect" kube dir separate from tsh home. Curious what's the reason behind the unique ID? The resourceURI (or the Teleport cluster + kube cluster combo) should be unique. All terminals of the same kube cluster should share the config too.
The document, terminal types will be different now. How would this migration work?
I mentioned it briefly in your proof of concept. #27287 (comment)
Because the new document type is so similar to the old one, we should be able to transform one into the other without requiring any input from the user. This would mean that when loading the state from disk, we'd need to inspect the documents first and then perform necessary changes.
We don't have a mechanism in place, so that's why in the comment I linked above I mentioned leaving the migration for Grzegorz and me. ;)
I don't see an issue putting it in the tsh home dir, as tsh proxy kube creates a temp kubeconfig there too. Though I would say it's definitely better to put it in a "Connect" kube dir separate from tsh home.
If tsh proxy kube already puts stuff there then I think it's fine for Connect to use the same path. Connect uses its own tsh home for now.
Curious what's the reason behind the unique ID? the resourceURI (or the teleport cluster + kube cluster combo) should be unique. All terminals of the same kube cluster should share the config too.
@gzdunek Do you remember why we add unique IDs to kubeconfigs? AFAIR we thought kube clusters might not be unique within a Teleport cluster.
Another reason that comes to my mind is that we put the kubeconfigs under <appPath>/kube/<root cluster>, so if two leaf clusters have a kube cluster with the same name, it'd cause problems.
In any case, I think it's fine if we want to change the location of kubeconfigs. We just need to make sure that the existing setups people have will not break.
By existing setups I mean two things:
- People who have the old document type in app_state.json, meaning if they restart Connect and restore tabs, Connect is going to restore the old document type with tsh kube login.
- People who share a kubeconfig created by Connect with 3rd party apps. I'm not even sure if there are people who figured out they can do this as we don't explain this clearly anywhere.
One thing we could do is keep the existing kube login implementation of kube tabs, but make it so that clicking "Connect" from the resource table opens the new implementation. This way the existing tabs will keep working as expected.
All tsh kube login does is prepare special kubeconfig stanzas, right? But once you run it, theoretically you can use that kubeconfig forever, can't you? If there's a person who uses Teleport Connect and shares a kubeconfig generated by us with a 3rd party app, this setup should continue to work until they manually log out of the cluster (since logging out removes kube configs created by Connect). So it's not a use case we have to worry about.
In the next major version after kube proxies get released in Connect, we could deprecate the old document implementation. Instead of having it work as usual, it could say something like "This tab uses a kube connection method that is deprecated. Please use the button below to lorem ipsum and migrate any 3rd party apps away from using <old kubeconfig location>".
We'd need to update docs to clearly express that running Connect is now required for the kubeconfig to work. We should also update the deprecated parts of kube tabs in the code to clearly mark them as deprecated.
This way we also don't have to write any migration code that I mentioned at the beginning of this comment. Doing so would be probably a bad choice, as the kube login document has totally different behavior than the kube proxy document. Even if we kept the kubeconfig locations intact, the new implementation would require Connect to be running, while the old one doesn't require that.
Another reason that comes to my mind is that we put the kubeconfigs under <appPath>/kube/<root cluster>, so if two leaf clusters have a kube cluster with the same name, it'd cause problems.
Yeah, that's the reason - we keep kubeconfigs for root and leaves in the same "root" directory, so there could be a name collision.
Btw I like the idea of having a new type of kube document and deprecating the old one - it sounds much easier than document migration.
Thanks for the explanation!
I've decided to pass in a ConfigDir to the gateway and use keypaths to construct the kubeconfig path:
- The path must persist through logins, so I think it's better to use a separate path from tsh home.
- I named it ConfigDir so that it can be used for other things like database-access-specific local configs, the app access local CA, etc.
- I cheated using keypaths, but we can certainly design a naming convention just for Connect instead of using keypaths if wanted. In the next change, I can pass in a dir to the daemon as suggested above.
Let me know what you think about this!
The path must persist through logins so I think it's better to use a separate path from tsh home.
Why does it need to persist through logins?
In the next change, I can pass in a dir to the daemon as suggested above.
If by "as suggested above" you mean the "Preserving backwards compatibility" section of my OG comment, then that's not necessary I think.
To sum up the thread:
- We want to keep both kube tab implementations until the next major version where we'll deprecate the old implementation, as described in my previous comment.
- Since the new kube tab implementation is separate from the old one, we don't need to preserve backwards compatibility of the kubeconfig location.
- tsh already stores kubeconfig in tsh home dir, so it's fine for Connect to do the same.
Given all this, I don't see what passing ConfigDir would help with. Database access specific local configs etc. are all hypotheticals at the moment, as far as I understand.
We might as well use client.ProfileStatus.KubeConfigPath. The path would have to be passed to the gateway somehow since it doesn't have access to ProfileStatus AFAIR.
Why does it need to persist through logins?
What I meant is that the file has to persist through re-login. During re-login, the teleport client cleans the kube keys folder, so tsh proxy kube (CLI) has to rewrite the config:
teleport/tool/tsh/common/kube_proxy.go
Lines 477 to 481 in 8abbea6
We can either do the same for Teleport Connect or just use a separate directory.
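For context, here's a minimal sketch of the "rewrite on relogin" option, reusing the kubeconfig write calls visible in this PR's diff; the onCertReissued hook and the localProxyKubeconfigValues helper are hypothetical:

```go
// onCertReissued is a hypothetical hook invoked after the teleport client
// reissues certs on relogin (which cleans the kube keys folder).
func (g *Gateway) onCertReissued() error {
	// localProxyKubeconfigValues is a hypothetical helper assembling the
	// values passed to kubeconfig.CreateLocalProxyConfig for this gateway.
	values := g.localProxyKubeconfigValues()
	config := kubeconfig.CreateLocalProxyConfig(clientcmdapi.NewConfig(), values)
	return trace.Wrap(kubeconfig.Save(g.KubeconfigPath(), *config))
}
```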
"as suggested above"
What I meant by "as suggested above" is to pass on a directory through parameters when creating the daemon.
Database access specific local configs etc.
For example, Oracle generates some local files for the connection:
teleport/tool/tsh/common/db.go
Lines 324 to 333 in 8abbea6
We might as well use client.ProfileStatus.KubeConfigPath. The path would have to be passed to the gateway somehow since it doesn't have access to ProfileStatus AFAIR.
I can certainly do this. As mentioned above, I just have to rewrite the config after cert reissue. I missed this part in my previous implementation. Let me know if you prefer this way or a separate directory.
Thanks for the explanation, now I understand what the purpose behind ConfigDir is.
One thing we need to keep in mind is that in the future we'd probably want to make Connect use ~/.tsh, sharing it with tsh. I don't know if storing kubeconfigs outside of tsh home dir will make it easier or harder. ;)
Given your explanation with keeping kubeconfigs around on relogin, I don't have any strong preference for one way or another.
Oh, I forgot to add, I think using keypaths makes sense.
teleport/api/utils/keypaths/keypaths.go
Lines 312 to 314 in 6140bdc
The /keys/ part is going to look a bit weird in a standalone folder. But if we decided not to use it then we'd have to recreate pretty much half of the keypaths functionality anyway.
If we stick to keypaths, we also increase the chances of compatibility with other parts of our tooling. That is, if tsh uses keypaths and stores something in tsh home dir and we use keypaths but store something outside, sharing code between the two will be somewhat easier as only baseDir will change.
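To illustrate the baseDir point, here's a toy sketch that mimics a keypaths-style layout; the helper and the exact layout are hypothetical, not the real keypaths API:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// kubeConfigPath mimics a keypaths-style layout (hypothetical helper, not the
// real keypaths API); note the "keys" segment mentioned above.
func kubeConfigPath(baseDir, proxy, user, cluster, kubeCluster string) string {
	return filepath.Join(baseDir, "keys", proxy, user+"-kube", cluster, kubeCluster+"-kubeconfig")
}

func main() {
	// Only baseDir differs between tsh and a hypothetical Connect dir.
	fmt.Println(kubeConfigPath("/home/alice/.tsh", "teleport.example.com", "alice", "root", "cookie"))
	fmt.Println(kubeConfigPath("/home/alice/connect-kube", "teleport.example.com", "alice", "root", "cookie"))
}
```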
Though I suppose coming up with a name for Connect's baseDir is going to be challenging. I wasn't able to come up with anything good. 🤔
One thing we need to keep in mind is that in the future we'd probably want to make Connect use ~/.tsh, sharing it with tsh.
I think that sharing will break other things too; e.g., tsh db logout will erase db certs used by the gateway.
Though I suppose coming up with a name for Connect's baseDir is going to be challenging. I wasn't able to come up with anything good. 🤔
Sorry for going back and forth on this. I ended up passing in the tsh profile/home dir to the gateway and letting the gateway use keypaths, and I added the logic to rewrite the kubeconfig on relogin. Just too lazy to "manage" a separate baseDir. And this way it's more aligned with tsh.
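For reference, a minimal sketch of the shape this converges on (see also "use ProfileDir instead of ConfigDir" in the commit list at the end); the field placement and doc comment are assumptions based on this thread, not the merged code:

```go
// Config sketch; the wording of the doc comment is an assumption.
type Config struct {
	// ...existing gateway config fields...

	// ProfileDir is the tsh profile (home) directory. The gateway combines it
	// with keypaths helpers to construct the kubeconfig path, keeping the
	// layout aligned with tsh.
	ProfileDir string
}
```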
ravicious
left a comment
The PR looks good overall, we just need to make a decision on ConfigDir.
Thanks for adding the comments, they're very helpful and detailed!
```go
// Insecure
Insecure bool
// ClusterName is the Teleport cluster name
ClusterName string
```
ClusterName doesn't seem to be set anywhere outside of the tests.
```go
g.cleanupFuncs = append(g.cleanupFuncs, func() error {
	return trace.Wrap(utils.RemoveFileIfExist(g.KubeconfigPath()))
})
```
What's the reason behind removing the kubeconfig when gateway closes? I know that the kubeconfig will work only as long as the gateway is up. From the UX perspective though, I wonder if leaving the kubeconfig wouldn't be better.
I'm thinking of a situation where the user doesn't know that Connect needs to be open for the kubeconfig to work. So they close Connect and either:
- Connect removes the kubeconfig. The user tries to use a third-party tool by supplying a path to the kubeconfig, but they get a "Not found" error. At this point they probably wonder why the kubeconfig got removed.
- Connect keeps the kubeconfig. The user tries to use a third-party tool, but they get a "Can't establish connection to localhost:45678". At this point they might wonder what is localhost:45678.
(These are just my assumptions as to how those 3rd party apps might work, do you know perhaps how they behave in those situations?)
Now that I wrote it out like this, removing the kubeconfig might be a better idea. The kubeconfig path is the only part of the "API" the user can easily tie back to Connect, since they got the kubeconfig from Connect. The same cannot be said about a random port on localhost (which they won't even see in the app).
But I wonder what you think about it.
not found error kubectl example:
```
$ KUBECONFIG=not_found kubectl get pod
W0705 15:54:05.352504 28156 loader.go:222] Config not found: not_found
E0705 15:54:05.355350 28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.355788 28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.356873 28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.358003 28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0705 15:54:05.359051 28156 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
local proxy down kubectl example:
```
The connection to the server teleport.dev.aws.stevexin.me:443 was refused - did you specify the right host or port?
```
The reason I added the cleanup is to avoid keeping unused files around forever, since I added ConfigDir. If it's in tsh home, it will get cleaned up eventually.
I think the output you posted cements the idea that removing the kubeconfig is better than keeping it around. Though I still worry that we might run into 3rd party clients that completely blow up or force you back to the configuration screen when the kubeconfig goes missing vs just showing an error dialog when they can't connect to the cluster. But we can worry about this once we find such clients.
smallinsky
left a comment
I have left a few comments/suggestions but the PR mostly LGTM.
```go
//
// The tlsca.RouteToDatabase.Database field is skipped, as it's an optional field and gateways can
// change their Config.TargetSubresourceName at any moment.
func (g *Gateway) RouteToDatabase() tlsca.RouteToDatabase {
```
Out of scope for this PR, but I wonder what you would say about defining a Gateway interface with a BasicGateway implementation that can be extended by KubeGateway and DatabaseGateway:
```go
type Gateway interface {
	Serve() error
	Close() error
	ReloadCert() error
	URI() uri.ResourceURI
	TargetURI() string
	TargetName() string
	Protocol() string
	TargetUser() string
	TargetSubresourceName() string
	SetTargetSubresourceName(value string)
	Log() *logrus.Entry
	LocalAddress() string
	LocalPort() string
	LocalPortInt() int
	Config() Config
	CLICommand() (*api.GatewayCLICommand, error)
}

type BasicGateway struct {
	// implements Gateway
}

// Note: Go interfaces can only embed other interfaces, so the specialized
// types embed the Gateway interface rather than the BasicGateway struct.
type DatabaseGateway interface {
	Gateway
	RouteToDatabase() tlsca.RouteToDatabase
}

type KubeGateway interface {
	Gateway
}
```
I'll try to do this in my next PR. This interface would make it easier to mock the gateway.
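As a quick illustration of the mocking benefit, a sketch under the assumption that the Gateway interface above exists; fakeGateway is hypothetical:

```go
// fakeGateway embeds the Gateway interface, so a test only implements the
// methods it cares about; calling an unimplemented method panics, which is
// usually acceptable in tests.
type fakeGateway struct {
	Gateway
	closed bool
}

func (f *fakeGateway) Close() error {
	f.closed = true
	return nil
}
```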
```go
case targetURI.IsKube():
	if err := gateway.makeLocalProxiesForKube(listener); err != nil {
		return nil, trace.Wrap(err)
```
Right now, during gateway creation from the Cluster object, we call c.GetDatabase(ctx, params.TargetURI) independent of the targetURI type, so if possible the type check should be done in only one place.
Cluster has to issue different certs and pass different parameters for kube vs db, so it's hard to avoid this check at separate levels. An alternative that makes it slightly easier to read is to define a gateway type, but it doesn't eliminate the switch, so I didn't do it; see the sketch below.
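A minimal sketch of that alternative, with hypothetical names; it gives the branch a name once, but callers still switch on the kind:

```go
// targetKind names the branch once, but each layer that issues certs or
// builds parameters still has to switch on it.
type targetKind int

const (
	targetKindDatabase targetKind = iota
	targetKindKube
)

func kindOfTarget(targetURI uri.ResourceURI) targetKind {
	if targetURI.IsKube() {
		return targetKindKube
	}
	return targetKindDatabase
}
```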
```go
type kubeCertReissuer struct {
	cert          atomic.Value
	onExpiredCert func(context.Context) error
}

func newKubeCertReissuer(cert tls.Certificate, onExpiredCert func(context.Context) error) *kubeCertReissuer {
```
It is a bit unclear to me what the difference is between kubeCertReissuer and (*GatewayCertReissuer).ReissueCert.
The gateway and cluster are injected along the way when the GatewayCertReissuer callback is called, so at the kubeCertReissuer level I'm just trying to hide those. I am open to any suggestions though. I do feel bad when tracing this roundtrip.
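To make the roundtrip concrete, here's a hedged sketch of how the pieces might fit together, based only on the kubeCertReissuer struct shown in the diff above; getCert and isExpired are hypothetical:

```go
// getCert returns the cached cert, asking onExpiredCert (a closure that
// already has the gateway and cluster captured) to refresh it on expiry.
func (r *kubeCertReissuer) getCert(ctx context.Context) (tls.Certificate, error) {
	cert := r.cert.Load().(tls.Certificate)
	if isExpired(cert) { // hypothetical expiry check
		if err := r.onExpiredCert(ctx); err != nil {
			return tls.Certificate{}, trace.Wrap(err)
		}
		// onExpiredCert is expected to store a fresh cert via r.cert.Store.
		cert = r.cert.Load().(tls.Certificate)
	}
	return cert, nil
}
```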
* Connect Kube gateway part 1: lib/teleterm/gateway
* fix lint
* move IsDB/IsKube to resource URI
* address review comments
* config dir
* use ProfileDir instead of ConfigDir
* remove NewKubeForwardProxyWithListener
Part of
Implementation based on poc #27287. Decided to scratch the refactor attempt #27685 and stick with the current design.
This is part 1 of the actual implementation. This part covers:
- lib/teleterm/api/uri
- lib/teleterm/gateway

The next part will cover the rest of the Golang side of changes.