RFD 207: VNet SSH by nklaassen · Pull Request #53370 · gravitational/teleport

nklaassen · 2025-03-24T23:28:22Z

No description provided.

espadolini · 2025-03-25T15:12:25Z

+In this callback is where it will attempt to dial the target server with the
+requested username, and complete the per-session MFA ceremony if necessary.
+By doing this in the callback we can return an authentication error to the user
+if they fail to authenticate to the target server.


This is more of an implementation detail that doesn't necessarily need to be mentioned in the RFD, but this actually supports dropping into keyboard-interactive if the remote server is configured for publickey,keyboard-interactive in a much better way than our current forwarding (some customers have asked about better support for PAM authentication modules, and x/crypto/ssh doesn't quite have APIs flexible enough for that if the connecting client is trying to use publickey first).

ravicious · 2025-03-26T16:34:41Z

+Next to the usual "Connect" button on the SSH server card, she sees a new VNet
+button.


I don't know if we have enough space to put another button there. 😅

We could use this style of button that Sakshyam recently added for SAML apps. We'd have a three dot menu with "Connect with VNet" as the only extra option.

Yeah I was actually thinking of something like that, i think it's a good idea. would be nice to have some type of button on the server card for discoverability

Perhaps we should run this by the design team? cc @roraback

ravicious · 2025-03-26T16:53:43Z

+   1. VNet will apply any matching proxy templates found in the user's
+      `TELEPORT_HOME`.


This might be a bit of a pain for some users, as I'm sure that people usually configure tsh through ~/.tsh/config/config.yaml which Teleport Connect does not use. But it is something we might tackle in Q2, so it's not that big of a problem.

Yeah I don't think proxy templates are going to be that common to use for this, but I know there are some specific customers that use them heavily, and I wanted to make it at least possible to use them from connect, but idk using TELEPORT_HOME seemed better than the default tsh home

If our end goal is to unify the Connect and tsh config location this is probably fine for the short term. If someone really wants to make proxy templates work they can do so by at least for the short term copying the file.

nklaassen · 2025-04-03T01:11:40Z

+1. Because Alice has not started VNet before, VNet starts for the first time.
+1. The VNet panel in the top left opens and displays:
+   > 🟢 Proxying TCP connections to teleport.example.com<br>
+   > 🔴 VNet SSH is not configured<br>
+   > Open Status Page
+1. A toast pops up in the bottom right and displays:
+   "VNet SSH is not configured, click here (link) to enable VNet SSH"
+
+Clicking either the "Open Status Page" button in the VNet panel or the link
+in the toast both open a new VNet Status tab in Connect.
+This tab will include the existing VNet diagnostic report, and a new section at
+the top on the status of VNet SSH.
+
+The SSH section in the VNet Status page will display:
+
+> VNet SSH lets any SSH client connect to Teleport SSH servers.<br>
+> When you enable VNet SSH, Connect will automatically add configuration options
+> to `~/.ssh/config`.
+>
+> Enable VNet SSH (button)


@ravicious I'm just winging this let me know if you have thoughts 🙏

Why not have it enabled by default and let the user disable individual components if not needed? Do we even need to let them disable individual components at all?

If disabling individual components is something we want to have available, I think it might be nicer to expand the VNet panel with two sections rather than adding extra stuff to the diag report tab. I think it'd feel nicer to be able to control everything from that panel.

The idea was to avoid editing the user's ~/.ssh/config until they explicitly opt-in, I don't want to just do that by default. Do you think we have room to warn them about that before turning on VNet SSH in the VNet panel?

We should be able to add a confirmation dialog for that.

+1 for not touching ~/.ssh/config without explicit permission.

A nice effect of using Include is that it's also pretty easy to detect if the line is still there, and to offer to add it again. It should also not be a requirement to do so to enable the server IMO, ssh -F ~/path/to/vnet_ssh_config will work just as well.

The idea is everything in VNet will actually be enabled by default, ssh -F ~/path/to/vnet_ssh_config should work as Edoarda mentioned, Connect will just give you the option to automatically add the Include to ~/.ssh/config

One could think of presence of the Include line in ~/.ssh/config as a sort of diagnostic, with an option to remedy that problem with a button on the diagnostic page. For the confirmation dialog you mentioned @ravicious I'm not sure when we'd open that

Ah sorry, my messages were not clear at all, I just noticed that. 🤦

If we have to make the SSH feature opt-in, then instead of putting it in the diag report page, we could have it just in the VNet panel. I imagine the toggle could look somewhat like this:

I didn't want to call it "VNet SSH" because that would be the third time this panel says "VNet". If VNet TCP should also be configurable, then we could replace that status indicator with a toggle maybe.

Now, I imagine that when Alice clicks on "Connect with VNet" next to an SSH node, we could show a notification as you mentioned. The notification would say "VNet SSH needs to be explicitly enabled" or something along those lines. Instead of a link, it'd have a button that says "Enable VNet SSH". When Alice clicks on that button or when she clicks on the VNet SSH toggle, we could then show the confirmation dialog. The dialog would explain that enabling VNet SSH is going to modify ~/.ssh/config. After clicking "Proceed", VNet SSH would be enabled.

One minor problem is the very first start of VNet. For TCP apps, when you click on "Connect" for the first time next to a TCP app, we don't turn on VNet immediately. Instead, we first show the info tab about VNet. From there, you can learn what VNet is and then turn it on. After the first launch, clicking on "Connect" starts VNet immediately without showing the info tab.

For SSH, I think it'd be a bit too much to go from "Connect with VNet" -> the info tab -> turn on VNet -> show notification about VNet SSH -> enable VNet SSH -> finally connect through SSH. But IMHO for SSH we can skip the info tab completely. It makes sense to show it for TCP apps I guess where the flow is a little bit more complex and it might not be obvious what to do next. But for SSH nodes, once you click "Connect with VNet" and see a notification that says "Connect with any SSH client to <address> (copied to clipboard)", I think it's pretty clear what the user should do next. Besides, the user might click "Connect" next to a TCP app without knowing what VNet is. For SSH, it's unlikely that someone is going to open the three dot menu and then click "Connect with VNet" without having an idea what VNet is.

ravicious · 2025-04-08T16:39:22Z

+1. Because Alice has not started VNet before, VNet starts for the first time.
+1. The VNet panel in the top left opens and displays:
+   > 🟢 Proxying TCP connections to teleport.example.com<br>
+   > 🔴 VNet SSH is not configured<br>
+   > Open Status Page
+1. A toast pops up in the bottom right and displays:
+   "VNet SSH is not configured, click here (link) to enable VNet SSH"
+
+Clicking either the "Open Status Page" button in the VNet panel or the link
+in the toast both open a new VNet Status tab in Connect.
+This tab will include the existing VNet diagnostic report, and a new section at
+the top on the status of VNet SSH.
+
+The SSH section in the VNet Status page will display:
+
+> VNet SSH lets any SSH client connect to Teleport SSH servers.<br>
+> When you enable VNet SSH, Connect will automatically add configuration options
+> to `~/.ssh/config`.
+>
+> Enable VNet SSH (button)


Why not have it enabled by default and let the user disable individual components if not needed? Do we even need to let them disable individual components at all?

If disabling individual components is something we want to have available, I think it might be nicer to expand the VNet panel with two sections rather than adding extra stuff to the diag report tab. I think it'd feel nicer to be able to control everything from that panel.

ravicious · 2025-04-08T16:40:25Z

+
+Clicking either the "Open Status Page" button in the VNet panel or the link
+in the toast both open a new VNet Status tab in Connect.
+This tab will include the existing VNet diagnostic report, and a new section at


Are there any diagnostics we can add for VNet SSH that could be included in the report?

I think "whether or not the Include directive is in ~/.ssh/config" is kind of a diagnostic, not sure what else we could really do though

ravicious · 2025-04-08T16:43:46Z

+If per-session MFA is required, Connect will prompt the user to tap their
+hardware key.
+
+#### User story - VSCode remote development


Now that there's a marketing tab for VNet in Connect, it'd be good to have something about SSH in there too. This use case in particular could be nice – it's shows what is possible rather than just describing how it works (which might be better suited for docs). We'd just need to have this use case described somewhere so that we can direct the user there if they're interested in it. A section in the docs would be enough. A blog post would be good too, but that requires much more effort.

espadolini · 2025-04-09T10:18:58Z

+1. VNet will apply any matching proxy templates found in the user's
+   `TELEPORT_HOME`.
+1. VNet will query the matching cluster to see if any SSH servers have a
+   matching SSH server using the `ResolveSSHTarget` rpc.
+1. If a match is found a free IP will be assigned to that SSH server and it
+   will be returned in an authoritative DNS answer.
+1. If no matching server is found a `NXDOMAIN` response will be returned with
+   no cache TTL.


How quickly can this react to configuration changes, both in the proxy templates and in the cluster? It'd be nice if we were in control of how long a hostname to hostID mapping will last, but I don't think it's possible if we want to respond NXDOMAIN to unknown hostnames (and the alternative is to reject connections if there's no longer a pairing, which is confusing to the user although it's probably the same thing that happens if a host goes down?)

In the latest revision, we never query for host IDs (unless a proxy template requires it) we always dial by hostname so we can react ~instantly

espadolini · 2025-04-18T10:14:26Z

+She gets an error message from `ssh` on the CLI reading
+`ERROR: access denied to root connecting to devbox`.


Can we return such a detailed error message to openssh during ssh auth with x/crypto/ssh?

(please don't say "we act like the authentication succeeded and the session was opened and then we write on stderr in the session")

After some silly checks, for openssh ssh we can either display a banner, which is likely not going to be surfaced well in GUI clients but should be always displayed by ssh in a terminal, or we can fork x/crypto to allow using a custom disconnect message which is probably going to be more visible in all clients but is formatted kind of weird:

Received disconnect from [<ipv6 addr>] port 22:<integer>: <error message> Disconnected from <ipv6 addr> port 22

I know @zmb3 would very much like it if we could return more actionable error messages when the SSH handshake fails. Any errors stemming from device trust issues exposed to users are very opaque and unhelpful message.

yeah unfortunately this was just an example error message i stole from tsh ssh and forgot to double check, the real error message displayed by openssh clients is more likely to be root@devbox: Permission denied (publickey). unless we hack an error message into the banner or fork x/crypto/ssh (the ideas you mentioned in the other thread)

these are really just 2 different ways of displaying the same error though, the tsh ssh error doesn't have any additional information, we just format it "nicer" on any ssh auth error

teleport/lib/client/client.go

Lines 339 to 350 in e0030e5

if utils.IsHandshakeFailedError(err) {

conn.Close()

// TODO(codingllama): Improve error message below for device trust.

// An alternative we have here is querying the cluster to check if device

// trust is required, a check similar to `IsMFARequired`.

log.InfoContext(ctx, "Access denied connecting to host",

"login", sshConfig.User,

"target_host", nodeName,

"error", err,

)

return nil, trace.AccessDenied("access denied to %v connecting to %v", sshConfig.User, nodeName)

}

we can't really do much better than this without some api changes i guess

espadolini · 2025-04-18T10:23:46Z

+1. The existing TCP app lookup will run first, if the address matches a TCP app
+   then the IP will be permanently assigned to that app and regular TCP app
+   forwarding will take over. This matches the current VNet behavior where TCP
+   app matches are permanent.


Don't we also know the destination port of the TCP connection? Can't we take that into account for our decision?

We could take it into account for sure, although it would be a breaking change for anyone currently connecting to a TCP app on port 22

espadolini · 2025-04-18T10:33:05Z

+1. VNet will set `ssh.ServerConfig.PublicKeyCallback` to a function that:
+   1. Validates that the user public key matches `id_vnet.pub` mentioned in [SSH client configuration](#ssh-client-configuration)
+   1. Attempts an SSH connection to the target with and without session MFA. If
+      both fail, an authentication error is returned. This is done in
+      `PublicKeyCallback` so that authentication failures to the target are returned
+      as authentication failures to the client. Optimistically we can  send
+      more informative errors to Connect.


This doesn't actually work that well; the PublicKeyCallback can be called an arbitrary amount of times and can end up being called with a key that the client is not actually in control of, and you're only going to know that after finishing the authentication. 😢 golang/go#70795 would fix it but it's sort of stuck while the x/crypto/ssh maintainers figure out the v2 api.

Thankfully it's not a very big deal to optimistically open the connection (once, obviously) to the remote server and just close it if the client doesn't prove itself to be in control of the id_vnet private key.

That being said, we can also fork x/crypto/ssh since this is something that runs in our binary rather than being in the api module (maybe to also add a way to return a custom authn error message?)

Are there any alternatives to leveraging the PublicKeyCallback like this? Is this purely to provide more useful error messaging to the end user?

No, this is about the fact that PublicKeyCallback gets called even when the client presents just a public key without any proof of possession.

It's not a particularly big deal to open the connection immediately tho, this is something that could kinda be exploited by an attacker on the local machine to open connections to SSH servers just to have them closed immediately - and at no big cost to anything in the cluster even, since the connection will just get stuck at version negotiation until the local client proves it's in possession of the vnet private key, so the only cost is a bit of network and some logging, not even any cpu usage due to encryption.

And it's not like the public key is going to be broadcast to the world, it's not treated as sensitive data but it's also something that you'd need a bit of shenanigans to obtain in the first place.

@rosstimothy yeah we don't have to do this in PublicKeyCallback it was just an idea so that we could return authentication errors to openssh clients when the user actually fails to auth to the teleport node

we could dial the target node in BannerCallback instead to display the error as a banner, it feels sketchy but i guess it's no less sketchy than doing it in PublicKeyCallback where the private key has not been verified yet

although at this point i'm thinking we just shouldn't shove teleport errors into the SSH protocol, and shouldn't dial the node before the client authenticates with the private key, instead we can send whatever error message we want to Connect in a "notification" the same way we currently notify users when they connect to an invalid port of a multi-port TCP app

We don't need BannerCallback, the Cool™ way is to get a ServerPreAuthConn through PreAuthConnCallback and then we can send an error message in a banner at any time before the authentication completes.

I think that the best UX would be to display the error in both a banner (so it shows up in the terminal that the user is interacting with) and in a connect notification (to not swallow errors if the connection is happening somewhere other than in a terminal).

I agree in principal that showing better and more informative SSH errors is something we should do. However, it's not a problem limited to VNet SSH. If we can tackle error handling while working on VNet SSH that's great, but I don't think the internals of how that happens need to be ironed out in this RFD.

viktor-doyensec-teleport · 2025-05-08T10:03:39Z

Hey team! I just finished my review of the RFD.

Taking into consideration the fact that the new functionality leverages the existing SSH tunneling implementation, the changes shouldn't introduce any new threat surface.

The RDF looks good to me!

github-actions Bot requested review from Joerger and strideynet March 24, 2025 23:28

github-actions Bot added rfd Request for Discussion size/md labels Mar 24, 2025

RFD 207: VNet SSH

26d49ea

nklaassen force-pushed the rfd/0207-vnet-ssh branch from 88dfaa0 to 26d49ea Compare March 25, 2025 02:42

rosstimothy requested review from espadolini, ravicious, rosstimothy and zmb3 March 25, 2025 13:34

zmb3 reviewed Mar 25, 2025

View reviewed changes

Comment thread rfd/0207-vnet-ssh.md Outdated

Comment thread rfd/0207-vnet-ssh.md Outdated

espadolini reviewed Mar 25, 2025

View reviewed changes

ravicious reviewed Mar 26, 2025

View reviewed changes

edits

a318992

rosstimothy mentioned this pull request Mar 31, 2025

Ansible Integration with Teleport's Per-Session MFA #30197

Open

Joerger reviewed Mar 31, 2025

View reviewed changes

Comment thread rfd/0207-vnet-ssh.md

add ~/.ssh/config configuration

36e3570

nklaassen requested a review from espadolini April 3, 2025 01:10

nklaassen commented Apr 3, 2025

View reviewed changes

ravicious reviewed Apr 8, 2025

View reviewed changes

espadolini reviewed Apr 9, 2025

View reviewed changes

nklaassen added 4 commits April 11, 2025 17:49

use include directive and GlobalKnownHostsFile

93ecc15

wip

cb7979e

ssh config edits

1936020

update connection sequence

8f22902

nklaassen added the no-changelog Indicates that a PR does not require a changelog entry label Apr 17, 2025

fix lint

20a8448

nklaassen requested a review from espadolini April 18, 2025 01:07

espadolini approved these changes Apr 18, 2025

View reviewed changes

rosstimothy approved these changes Apr 22, 2025

View reviewed changes

add considerations for web apps

c40182a

nklaassen mentioned this pull request May 5, 2025

[vnet] feat: add transparent SSH proxy #54525

Merged

simplify handling for non-matching addrs

faea3b6

nklaassen enabled auto-merge May 15, 2025 00:56

nklaassen added this pull request to the merge queue May 15, 2025

Merged via the queue into master with commit d06a5a7 May 15, 2025
39 of 40 checks passed

nklaassen deleted the rfd/0207-vnet-ssh branch May 15, 2025 01:09

ravicious mentioned this pull request Jun 12, 2025

[vnet] feat: add SSH configuration diagnostic #55594

Merged

rosstimothy mentioned this pull request Sep 29, 2025

Provide error message when host user creation is not allowed in roles #40370

Closed

		Next to the usual "Connect" button on the SSH server card, she sees a new VNet
		button.

		1. VNet will apply any matching proxy templates found in the user's
		`TELEPORT_HOME`.

		She gets an error message from `ssh` on the CLI reading
		`ERROR: access denied to root connecting to devbox`.

	if utils.IsHandshakeFailedError(err) {
	conn.Close()
	// TODO(codingllama): Improve error message below for device trust.
	// An alternative we have here is querying the cluster to check if device
	// trust is required, a check similar to `IsMFARequired`.
	log.InfoContext(ctx, "Access denied connecting to host",
	"login", sshConfig.User,
	"target_host", nodeName,
	"error", err,
	)
	return nil, trace.AccessDenied("access denied to %v connecting to %v", sshConfig.User, nodeName)
	}

Conversation

nklaassen commented Mar 24, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ravicious Apr 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rosstimothy Apr 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ravicious Apr 22, 2025 •

edited

Loading

rosstimothy Apr 18, 2025 •

edited

Loading