Skip to content

Teleport Connect Kube gateway POC#27287

Closed
greedy52 wants to merge 2 commits intomasterfrom
STeve/26836_connect_kube_local_proxy
Closed

Teleport Connect Kube gateway POC#27287
greedy52 wants to merge 2 commits intomasterfrom
STeve/26836_connect_kube_local_proxy

Conversation

@greedy52
Copy link
Copy Markdown
Contributor

@greedy52 greedy52 commented Jun 2, 2023

Related PR #26836

Note this is a POC only. Code will be better written, tested, and broken to proper PRs once POC concept is approved.

POC overview:

  1. UI flow is almost the same as the current kube flow, with a new DocumentGatewayKube (mostly copied from DocumentGatewayCLICommand):
    • Calls daemon to create gateway on mount
    • Wait screen (copied from DocumentGatewayCLICommand) while the gateway is creating
    • Then opens a terminal, with KUBECONFIG set
    • Can have multiple kube terminals that SHARES the gateway/local proxy.
  2. On daemon side
    • No change to CreateGateway API, daemon creates a local proxy based on targetURI.
    • Kube Gateway returns a KUBECONFIG=<path-to-kubeconfig-for-local-proxy> env var for pty terminal to use. The path is based on the kube user cert path so uniq per kube resource and maintained by the gateway, so it can be shared between kube terminals.
    • Some changes to the gateway object to support forward proxy, different ways to reload cert etc.

@greedy52 greedy52 added kubernetes-access tls-routing Issues related to TLS routing teleport-connect Issues related to Teleport Connect. labels Jun 2, 2023
@greedy52 greedy52 self-assigned this Jun 2, 2023
@greedy52 greedy52 force-pushed the STeve/26836_connect_kube_local_proxy branch 2 times, most recently from 22dd0b5 to a64faca Compare June 2, 2023 18:58
@greedy52 greedy52 force-pushed the STeve/26836_connect_kube_local_proxy branch from a64faca to 37d5420 Compare June 2, 2023 19:25
@greedy52 greedy52 force-pushed the STeve/26836_connect_kube_local_proxy branch from 37d5420 to 5f65a84 Compare June 2, 2023 20:45
@greedy52 greedy52 requested review from gzdunek and ravicious June 5, 2023 14:35
@ravicious
Copy link
Copy Markdown
Member

Can have multiple kube terminals that SHARES the gateway/local proxy.

@gzdunek, to further explain this, the overarching idea was to try to keep the UX of using kubes the same but start a proxy underneath when a new kube tab is opened.

One of the issues with this is that if you restart the app and attempt to reopen multiple kube tabs pointing at the same kube cluster, they will all create a new proxy pointing at the same kube cluster.

To deal with this, we had the idea of making CreateGateway return an existing proxy pointing at the target if one exists. Since there's a mutex, when two tabs race to create the proxy, one will create it and the second call will simply return the existing gateway.

Comment thread lib/teleterm/clusters/cluster_gateways.go
Copy link
Copy Markdown
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried that the React part would cause you much more problems but it looks like it was smooth sailing. 😏

I'll continue the review tomorrow, for now I left some comments mostly about that new doc component.

import { retryWithRelogin } from 'teleterm/ui/utils';
import { routing } from 'teleterm/ui/uri';

export const DocumentGatewayKube = (props: {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach with DocumentGatewayKube is fine, however this component doesn't need to follow exactly what DocumentGatewayCliClient does.

In general, DocumentGatewayCliClient expects that another document is going to create the gateway and thus it doesn't attempt to do so itself. For DocumentGatewayKube, you don't need WaitingForGateway with all that timeout logic and so on.

What I think we could do here is take the const [connectAttempt, createGateway] = useAsync part from WaitingForGateway and move it up to DocumentGatewayKube. Then we could also take useEffect and remove that timeout-related stuff from there.

Then when it comes to what this component is actually supposed to render, we could make a switch on connectAttempt. If it's '' or 'processing', I think we could simply return <Document visible={visible} px={2}> with nothing inside – the doc status already handles showing a progress bar.

If it's 'error', we need to show that error somewhere and a retry button which executes createGateway on click. I'd take the Flex from WaitingForGatewayContent and replace its content with this part from DocumentTerminal/Reconnect

<Text typography="h5" color="text.main">
{message}
</Text>
<Flex flexDirection="column" alignItems="center" mx="auto">
<Danger mb={3}>
<Text
textAlign="center"
// pre-wrap to make sure that newlines from any errors are preserved.
css={`
white-space: pre-wrap;
`}
>
{props.attempt.statusText}
</Text>
</Danger>
<ButtonPrimary width="100px" onClick={props.reconnect}>
{buttonText}
</ButtonPrimary>
</Flex>

message could be replaced with, idk, "Could not create a gateway.". The button could say just "Retry".

If the attempt status is 'success', then all that's left is to return DocumentTerminal.

Comment on lines +92 to +94
if (connectAttempt.status === '') {
createGateway();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike DocumentGatewayCliClient, DocumentGatewayKube is responsible for creating the gateway, so if an error happens, we have to remember about updating the status of the doc.

Suggested change
if (connectAttempt.status === '') {
createGateway();
}
(async () => {
const [, error] = await createGateway();
if (!(error instanceof CanceledError)) {
documentsService.update(doc.uri, { status: 'error' });
}
})();

CanceledError is from shared/hooks/useAsync.

s.mu.Lock()
defer s.mu.Unlock()

if gateway, ok := s.findGatewayByTargetURI(params.TargetURI); ok {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for kube gateways, but when it comes to dbs, the user can open multiple gateways for a single TargetURI, they're actually scoped by the TargetURI + db username combo.

For now I'd think we could just make this behavior special for kube gateways and explain why it's done this way.

Comment thread lib/srv/alpnproxy/kube.go
func NewKubeListener(casByTeleportCluster map[string]tls.Certificate) (net.Listener, error) {
// NewKubeListenerWithRandomPort creates a listener for kube local proxy using a
// random localhost port.
func NewKubeListenerWithRandomPort(casByTeleportCluster map[string]tls.Certificate) (net.Listener, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have time to look into this closely, but why do kube proxies need a separate *WithRandomPort function whereas db proxies do not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db has a single ALPN local proxy, which should serve the port user specified.

kube (and other HTTP proxies like aws/azure/gcp) has two local proxies, one HTTPS_PROXY (in code it's called forward proxy) and the ALPN local proxy. Client connects through HTTPS_PROXY so it should use the port user specified. The HTTPS_PROXY forwards to the ALPN local proxy where it can use a random port.

The preivous NewKubeListener creates the tcp listener within itself. But gateway code needs to pass in a tcp listener.

@ravicious ravicious self-requested a review June 6, 2023 15:51
throw new Error(`No gateway found for ${doc.targetUri}`);
}

const env = tshdGateway.getCliCommandEnv(gateway.gatewayCliCommand);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to leave a comment that this line is why this stuff actually works – the env here returns KUBECONFIG which is used in the terminal and the KUBECONFIG points at the gateway running underneath.

@ravicious ravicious self-requested a review June 6, 2023 16:02
Copy link
Copy Markdown
Contributor

@gzdunek gzdunek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach makes sense to me, I don't see any major issues.

leafClusterId: string | undefined;
gatewayUri?: uri.GatewayUri;
targetUri: uri.KubeUri;
port?: string;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to store port? If I am correct, it is not need to create a kube gateway?

Copy link
Copy Markdown
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the Go code today. I'll try to review the rest of the JS code early next week, but from skimming through it I didn't see any big issues, other than some light cleanup of values that are not necessary for the new document (that Grzegorz already mentioned).

Comment on lines +104 to +107
kubeCluster := uri.New(params.TargetURI).GetKubeName()
if kubeCluster == "" {
return nil, trace.BadParameter("invalid TargetURI %v for Kube gateway", params.TargetURI)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side note, I have an issue open (#15953) to cast strings to URIs on the gRPC handler level instead of "poisoning" the rest of the app with parsing URIs, what I don't have is the time to do this as it'd require some bigger changes throughout lib/teleterm. 😶‍🌫️

case uri.IsDB(targetURI):
return clusters.NewDbcmdCLICommandProvider(s.cfg.Storage, dbcmd.SystemExecer{})
case uri.IsKube(targetURI):
return new(clusters.KubeCLICommandProvider)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about new, I think I usually did something like clusters.KubeCLICommandProvider{}.

func (s *Service) createCLICommandProvider(targetURI string) gateway.CLICommandProvider {
switch {
case uri.IsDB(targetURI):
return clusters.NewDbcmdCLICommandProvider(s.cfg.Storage, dbcmd.SystemExecer{})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I made a mistake here, as far as specifying dependencies goes, the CLI command provider should have been initialized outside of daemon.Service, it should be received as a field on Config in New.

Which would mean that the code here would merely return an existing struct instead of making a new one.

Could you leave a todo comment here?

// TODO(ravicious): Construct command providers outside of daemon.Service and pass them as fields of Config in New.

Copy link
Copy Markdown
Contributor Author

@greedy52 greedy52 Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravicious @smallinsky here is my first refactor attempt to decouple dependency a bit:
#27685

Let me know your thoughts on it. Thanks!

return nil, trace.NotFound("gateway is not found: %v", gatewayURI)
}

// findGatewayByTargetURI assumes that mu is already help by a public method
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// findGatewayByTargetURI assumes that mu is already help by a public method
// findGatewayByTargetURI assumes that mu is already held by a public method

// and saves it to disk.
ReissueDBCerts(context.Context, tlsca.RouteToDatabase) error
// TODO
ReissueKubeCerts(context.Context, string) error
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure what do to do about this one.

CertReissuer was added mostly to mock out in tests parts of the code which deal with actual certs.

From the looks of it, a better design might be to have a single method such as ReissueCerts(context.Context, *gateway.Gateway) and pass a different CertReissuer to GatewayCertReissuer when the gateway is created.

The problem is that there's a single GatewayCertReissuer created for the whole daemon.Service, so this would have to be refactored first.

If you don't see a clear way to address this, I think it'd be fine to leave it as is and reconsider the refactor when we'll be adding a third gateway type.

I'm still not 100% sure how to best models this kind of relationship in Go, I don't know if it wants to follow OOP or FP more in this regard. With how interfaces work, I'd have assumed that it should be more similar to OOP, which is what I described earlier in the comment.

}

default:
return trace.NotImplemented("gateway not supported for %v", gateway.URI())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd mention reloading certs in the error message so that it can be differentiated from the error message returned from clusters.Cluster.CreateGateway.

KeyPath string
// Insecure
Insecure bool
// ClusterName
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments on which this one was based one were left there due to oversight. I think we should just remove them, similar to comments that follow the format of "Foo is foo" because they don't bring any value.

@greedy52
Copy link
Copy Markdown
Contributor Author

greedy52 commented Jun 7, 2023

Hmm, I'm not sure what do to do about this one.

CertReissuer was added mostly to mock out in tests parts of the code which deal with actual certs.

From the looks of it, a better design might be to have a single method such as ReissueCerts(context.Context, *gateway.Gateway) and pass a different CertReissuer to GatewayCertReissuer when the gateway is created.

ya. i don't like doing kube vs db all the places too. I can experiment with some refactoring ideas and share those separately.

I opened the poc mainly I am not very confident on the JS side of changes. And thanks for the review and all the good comments. likely i won't update code here but I will surely address them in the actual PRs.

I'll try to review the rest of the JS code early next week

thanks!

@smallinsky smallinsky self-requested a review June 7, 2023 14:01
@ravicious ravicious self-requested a review June 12, 2023 09:49
Copy link
Copy Markdown
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple of things we need to do a bit differently but nothing showstopping.

env: Record<string, string>;
};

export type GatewayKubeCommand = PtyCommandBase & {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GatewayKubeCommand and ShellCommand are in practice almost identical, they both open a new shell. The only difference between them is that GatewayKubeCommand sets KUBECONFIG.

There's a couple of places which should run the same logic when running a terminal, no matter if it's GatewayKubeCommand or ShellCommand, which currently match only on cmd.kind === 'pty.shell'.

Instead of adding checks for cmd.kind === 'pty.gateway-kube', I think a better option might be to extend ShellCommand with env?: Record<string, string>.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall the new type was avoiding refreshTitile from useDocumentTerminal.ts. what do you think

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to avoid refreshing the title for the kube tabs, but they already don't support showing proper cwd today and I don't want to pile too many refactors on you.

Still, I think implementing GatewayKubeCommand as ShellCommand instead is long-term a good idea. We could keep refreshTitle working only for regular terminal tabs by changing the conditional from cmd.kind !== 'pty.shell' to doc.kind !== 'doc.terminal_shell' and adding a TODO comment:

// TODO(ravicious): Enable updating cwd in doc.gateway_kube titles by moving title-updating logic to DocumentsService.

// useDocumentTerminal expects these fields to be set on the doc.
rootClusterId: string;
leafClusterId: string | undefined;
gatewayUri?: uri.GatewayUri;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need gatewayUri either. In DocumentGateway, it is used to perform operations on the gateway (such as changing the port). AFAIK we won't have to perform such operations on the kube gateway.

}

createGatewayKubeDocument(
opts: CreateGatewayKubeDocumentOpts
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say let's just inline CreateGatewayKubeDocumentOpts here. Defining them in web/packages/teleterm/src/ui/services/workspacesService/documentsService/types.ts is in my opinion an unnecessary pattern which I'm trying to get rid of. There's no use in moving the function signature outside of the definition of the function, especially if said signature is not reused in any way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expand ShellCommand with env to handle kube gateways, openNewTerminalof DocumentsService should be updated to handle cwd properly.

cwd?: string needs to be added to DocumentGatewayKube first.

  openNewTerminal({
    rootClusterId,
    leafClusterId,
  }: {
    rootClusterId: string;
    leafClusterId: string;
  }) {
    const activeDocument = this.getActive();
    const doc: DocumentPtySession = {
      kind: 'doc.terminal_shell',
      uri: routing.getDocUri({ docId: unique() }),
      title: 'Terminal',
      rootClusterId,
      leafClusterId,
    };

    if (activeDocument && 'cwd' in activeDocument) {
      doc.cwd = activeDocument.cwd;
    }

    this.add(doc);
    this.setLocation(doc.uri);
  }

| DocumentGatewayKube
| DocumentGatewayCliClient
| DocumentTshNode
| DocumentTshKube;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documents are stored on disk in app_state.json – that's how we know what types you had open after you restart the app. After adding DocumentGatewayKube, we should be able to write a migration which rewrites existing DocumentTshKube to DocumentGatewayKube. Leave that migration to Grzegorz and I though, we should just remember to write it before backporting the PR with the kube gateway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Speaking of backward compatibility, Connect should ship with the latest tsh right? Though I recall there is an option to use a different tsh?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connect should ship with the latest tsh right?

Yes, Connect always ships with a tsh built from the same commit.

Though I recall there is an option to use a different tsh?

Not really, there's CONNECT_TSH_BIN_PATH but it's for dev purposes only. The final packaged app always uses the tsh that is bundled with it.

export type KubeUri = RootClusterKubeUri | LeafClusterKubeUri;
export type DatabaseUri = RootClusterDatabaseUri | LeafClusterDatabaseUri;
export type ClusterOrResourceUri = ResourceUri | ClusterUri;
export type GatewayTargetUri = KubeUri | DatabaseUri;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move that definition to web/packages/teleterm/src/services/tshd/types.ts, just above the definition of the Gateway type. GatewayTargetUri is not really the same as the other URI types defined here.

I'd also keep it private without the export, at least for now.

gwConn.gatewayUri = doc.gatewayUri;
gwConn.port = doc.port;
gwConn.connected = !!this._clusterService.findGateway(
doc.gatewayUri
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing gatewayUri from the new doc type, we can use ClustersService.findGatewayByConnectionParams instead, similar to how we do this in useDocumentTerminal.

targetSubresourceName?: string;
}

export interface TrackedGatewayKubeConnection extends TrackedConnectionBase {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of different bits of connectionTracker seems correct. However, I don't think we want to add a new type here, instead we should opt to reuse TrackedKubeConnection and make it work with the new doc type.

I assume that after adding kube proxies, we'll want to make it so that when the user updates the app, all of their existing kube tabs will now use the kube gateway.

// targetUri and targetUser are also needed to find a gateway providing the connection to the
// target.
targetUri: tsh.Gateway['targetUri'];
targetUri: uri.DatabaseUri;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the end we should be able to keep it as it was but I'm not 100% sure.

@greedy52 greedy52 deleted the STeve/26836_connect_kube_local_proxy branch August 10, 2023 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge kubernetes-access teleport-connect Issues related to Teleport Connect. tls-routing Issues related to TLS routing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants