Skip to content

RFD 207: VNet SSH#53370

Merged
nklaassen merged 10 commits intomasterfrom
rfd/0207-vnet-ssh
May 15, 2025
Merged

RFD 207: VNet SSH#53370
nklaassen merged 10 commits intomasterfrom
rfd/0207-vnet-ssh

Conversation

@nklaassen
Copy link
Copy Markdown
Contributor

No description provided.

@github-actions github-actions Bot requested review from Joerger and strideynet March 24, 2025 23:28
@github-actions github-actions Bot added rfd Request for Discussion size/md labels Mar 24, 2025
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +344 to +347
In this callback is where it will attempt to dial the target server with the
requested username, and complete the per-session MFA ceremony if necessary.
By doing this in the callback we can return an authentication error to the user
if they fail to authenticate to the target server.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of an implementation detail that doesn't necessarily need to be mentioned in the RFD, but this actually supports dropping into keyboard-interactive if the remote server is configured for publickey,keyboard-interactive in a much better way than our current forwarding (some customers have asked about better support for PAM authentication modules, and x/crypto/ssh doesn't quite have APIs flexible enough for that if the connecting client is trying to use publickey first).

Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +94 to +95
Next to the usual "Connect" button on the SSH server card, she sees a new VNet
button.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we have enough space to put another button there. 😅

We could use this style of button that Sakshyam recently added for SAML apps. We'd have a three dot menu with "Connect with VNet" as the only extra option.
menu-login-with-action-menu

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was actually thinking of something like that, i think it's a good idea. would be nice to have some type of button on the server card for discoverability

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should run this by the design team? cc @roraback

Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +276 to +277
1. VNet will apply any matching proxy templates found in the user's
`TELEPORT_HOME`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit of a pain for some users, as I'm sure that people usually configure tsh through ~/.tsh/config/config.yaml which Teleport Connect does not use. But it is something we might tackle in Q2, so it's not that big of a problem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think proxy templates are going to be that common to use for this, but I know there are some specific customers that use them heavily, and I wanted to make it at least possible to use them from connect, but idk using TELEPORT_HOME seemed better than the default tsh home

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our end goal is to unify the Connect and tsh config location this is probably fine for the short term. If someone really wants to make proxy templates work they can do so by at least for the short term copying the file.

Comment thread rfd/0207-vnet-ssh.md
@nklaassen nklaassen requested a review from espadolini April 3, 2025 01:10
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +84 to +103
1. Because Alice has not started VNet before, VNet starts for the first time.
1. The VNet panel in the top left opens and displays:
> 🟢 Proxying TCP connections to teleport.example.com<br>
> 🔴 VNet SSH is not configured<br>
> Open Status Page
1. A toast pops up in the bottom right and displays:
"VNet SSH is not configured, click here (link) to enable VNet SSH"

Clicking either the "Open Status Page" button in the VNet panel or the link
in the toast both open a new VNet Status tab in Connect.
This tab will include the existing VNet diagnostic report, and a new section at
the top on the status of VNet SSH.

The SSH section in the VNet Status page will display:

> VNet SSH lets any SSH client connect to Teleport SSH servers.<br>
> When you enable VNet SSH, Connect will automatically add configuration options
> to `~/.ssh/config`.
>
> Enable VNet SSH (button)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravicious I'm just winging this let me know if you have thoughts 🙏

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have it enabled by default and let the user disable individual components if not needed? Do we even need to let them disable individual components at all?

If disabling individual components is something we want to have available, I think it might be nicer to expand the VNet panel with two sections rather than adding extra stuff to the diag report tab. I think it'd feel nicer to be able to control everything from that panel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to avoid editing the user's ~/.ssh/config until they explicitly opt-in, I don't want to just do that by default. Do you think we have room to warn them about that before turning on VNet SSH in the VNet panel?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to add a confirmation dialog for that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not touching ~/.ssh/config without explicit permission.

A nice effect of using Include is that it's also pretty easy to detect if the line is still there, and to offer to add it again. It should also not be a requirement to do so to enable the server IMO, ssh -F ~/path/to/vnet_ssh_config will work just as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is everything in VNet will actually be enabled by default, ssh -F ~/path/to/vnet_ssh_config should work as Edoarda mentioned, Connect will just give you the option to automatically add the Include to ~/.ssh/config

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could think of presence of the Include line in ~/.ssh/config as a sort of diagnostic, with an option to remedy that problem with a button on the diagnostic page. For the confirmation dialog you mentioned @ravicious I'm not sure when we'd open that

Copy link
Copy Markdown
Member

@ravicious ravicious Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, my messages were not clear at all, I just noticed that. 🤦

If we have to make the SSH feature opt-in, then instead of putting it in the diag report page, we could have it just in the VNet panel. I imagine the toggle could look somewhat like this:

vnet-panel-ssh-toggle

I didn't want to call it "VNet SSH" because that would be the third time this panel says "VNet". If VNet TCP should also be configurable, then we could replace that status indicator with a toggle maybe.

Now, I imagine that when Alice clicks on "Connect with VNet" next to an SSH node, we could show a notification as you mentioned. The notification would say "VNet SSH needs to be explicitly enabled" or something along those lines. Instead of a link, it'd have a button that says "Enable VNet SSH". When Alice clicks on that button or when she clicks on the VNet SSH toggle, we could then show the confirmation dialog. The dialog would explain that enabling VNet SSH is going to modify ~/.ssh/config. After clicking "Proceed", VNet SSH would be enabled.


One minor problem is the very first start of VNet. For TCP apps, when you click on "Connect" for the first time next to a TCP app, we don't turn on VNet immediately. Instead, we first show the info tab about VNet. From there, you can learn what VNet is and then turn it on. After the first launch, clicking on "Connect" starts VNet immediately without showing the info tab.

For SSH, I think it'd be a bit too much to go from "Connect with VNet" -> the info tab -> turn on VNet -> show notification about VNet SSH -> enable VNet SSH -> finally connect through SSH. But IMHO for SSH we can skip the info tab completely. It makes sense to show it for TCP apps I guess where the flow is a little bit more complex and it might not be obvious what to do next. But for SSH nodes, once you click "Connect with VNet" and see a notification that says "Connect with any SSH client to <address> (copied to clipboard)", I think it's pretty clear what the user should do next. Besides, the user might click "Connect" next to a TCP app without knowing what VNet is. For SSH, it's unlikely that someone is going to open the three dot menu and then click "Connect with VNet" without having an idea what VNet is.

Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +84 to +103
1. Because Alice has not started VNet before, VNet starts for the first time.
1. The VNet panel in the top left opens and displays:
> 🟢 Proxying TCP connections to teleport.example.com<br>
> 🔴 VNet SSH is not configured<br>
> Open Status Page
1. A toast pops up in the bottom right and displays:
"VNet SSH is not configured, click here (link) to enable VNet SSH"

Clicking either the "Open Status Page" button in the VNet panel or the link
in the toast both open a new VNet Status tab in Connect.
This tab will include the existing VNet diagnostic report, and a new section at
the top on the status of VNet SSH.

The SSH section in the VNet Status page will display:

> VNet SSH lets any SSH client connect to Teleport SSH servers.<br>
> When you enable VNet SSH, Connect will automatically add configuration options
> to `~/.ssh/config`.
>
> Enable VNet SSH (button)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have it enabled by default and let the user disable individual components if not needed? Do we even need to let them disable individual components at all?

If disabling individual components is something we want to have available, I think it might be nicer to expand the VNet panel with two sections rather than adding extra stuff to the diag report tab. I think it'd feel nicer to be able to control everything from that panel.

Comment thread rfd/0207-vnet-ssh.md

Clicking either the "Open Status Page" button in the VNet panel or the link
in the toast both open a new VNet Status tab in Connect.
This tab will include the existing VNet diagnostic report, and a new section at
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any diagnostics we can add for VNet SSH that could be included in the report?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "whether or not the Include directive is in ~/.ssh/config" is kind of a diagnostic, not sure what else we could really do though

Comment thread rfd/0207-vnet-ssh.md
If per-session MFA is required, Connect will prompt the user to tap their
hardware key.

#### User story - VSCode remote development
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that there's a marketing tab for VNet in Connect, it'd be good to have something about SSH in there too. This use case in particular could be nice – it's shows what is possible rather than just describing how it works (which might be better suited for docs). We'd just need to have this use case described somewhere so that we can direct the user there if they're interested in it. A section in the docs would be enough. A blog post would be good too, but that requires much more effort.

Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +283 to +290
1. VNet will apply any matching proxy templates found in the user's
`TELEPORT_HOME`.
1. VNet will query the matching cluster to see if any SSH servers have a
matching SSH server using the `ResolveSSHTarget` rpc.
1. If a match is found a free IP will be assigned to that SSH server and it
will be returned in an authoritative DNS answer.
1. If no matching server is found a `NXDOMAIN` response will be returned with
no cache TTL.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How quickly can this react to configuration changes, both in the proxy templates and in the cluster? It'd be nice if we were in control of how long a hostname to hostID mapping will last, but I don't think it's possible if we want to respond NXDOMAIN to unknown hostnames (and the alternative is to reject connections if there's no longer a pairing, which is confusing to the user although it's probably the same thing that happens if a host goes down?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest revision, we never query for host IDs (unless a proxy template requires it) we always dial by hostname so we can react ~instantly

@nklaassen nklaassen added the no-changelog Indicates that a PR does not require a changelog entry label Apr 17, 2025
@nklaassen nklaassen requested a review from espadolini April 18, 2025 01:07
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +150 to +151
She gets an error message from `ssh` on the CLI reading
`ERROR: access denied to root connecting to devbox`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return such a detailed error message to openssh during ssh auth with x/crypto/ssh?

(please don't say "we act like the authentication succeeded and the session was opened and then we write on stderr in the session")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some silly checks, for openssh ssh we can either display a banner, which is likely not going to be surfaced well in GUI clients but should be always displayed by ssh in a terminal, or we can fork x/crypto to allow using a custom disconnect message which is probably going to be more visible in all clients but is formatted kind of weird:

Received disconnect from [<ipv6 addr>] port 22:<integer>: <error message>
Disconnected from <ipv6 addr> port 22

Copy link
Copy Markdown
Contributor

@rosstimothy rosstimothy Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know @zmb3 would very much like it if we could return more actionable error messages when the SSH handshake fails. Any errors stemming from device trust issues exposed to users are very opaque and unhelpful message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah unfortunately this was just an example error message i stole from tsh ssh and forgot to double check, the real error message displayed by openssh clients is more likely to be root@devbox: Permission denied (publickey). unless we hack an error message into the banner or fork x/crypto/ssh (the ideas you mentioned in the other thread)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are really just 2 different ways of displaying the same error though, the tsh ssh error doesn't have any additional information, we just format it "nicer" on any ssh auth error

if utils.IsHandshakeFailedError(err) {
conn.Close()
// TODO(codingllama): Improve error message below for device trust.
// An alternative we have here is querying the cluster to check if device
// trust is required, a check similar to `IsMFARequired`.
log.InfoContext(ctx, "Access denied connecting to host",
"login", sshConfig.User,
"target_host", nodeName,
"error", err,
)
return nil, trace.AccessDenied("access denied to %v connecting to %v", sshConfig.User, nodeName)
}

we can't really do much better than this without some api changes i guess

Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +301 to +304
1. The existing TCP app lookup will run first, if the address matches a TCP app
then the IP will be permanently assigned to that app and regular TCP app
forwarding will take over. This matches the current VNet behavior where TCP
app matches are permanent.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also know the destination port of the TCP connection? Can't we take that into account for our decision?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could take it into account for sure, although it would be a breaking change for anyone currently connecting to a TCP app on port 22

Comment thread rfd/0207-vnet-ssh.md
Comment thread rfd/0207-vnet-ssh.md Outdated
Comment on lines +322 to +328
1. VNet will set `ssh.ServerConfig.PublicKeyCallback` to a function that:
1. Validates that the user public key matches `id_vnet.pub` mentioned in [SSH client configuration](#ssh-client-configuration)
1. Attempts an SSH connection to the target with and without session MFA. If
both fail, an authentication error is returned. This is done in
`PublicKeyCallback` so that authentication failures to the target are returned
as authentication failures to the client. Optimistically we can send
more informative errors to Connect.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually work that well; the PublicKeyCallback can be called an arbitrary amount of times and can end up being called with a key that the client is not actually in control of, and you're only going to know that after finishing the authentication. 😢 golang/go#70795 would fix it but it's sort of stuck while the x/crypto/ssh maintainers figure out the v2 api.

Thankfully it's not a very big deal to optimistically open the connection (once, obviously) to the remote server and just close it if the client doesn't prove itself to be in control of the id_vnet private key.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, we can also fork x/crypto/ssh since this is something that runs in our binary rather than being in the api module (maybe to also add a way to return a custom authn error message?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any alternatives to leveraging the PublicKeyCallback like this? Is this purely to provide more useful error messaging to the end user?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is about the fact that PublicKeyCallback gets called even when the client presents just a public key without any proof of possession.

It's not a particularly big deal to open the connection immediately tho, this is something that could kinda be exploited by an attacker on the local machine to open connections to SSH servers just to have them closed immediately - and at no big cost to anything in the cluster even, since the connection will just get stuck at version negotiation until the local client proves it's in possession of the vnet private key, so the only cost is a bit of network and some logging, not even any cpu usage due to encryption.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's not like the public key is going to be broadcast to the world, it's not treated as sensitive data but it's also something that you'd need a bit of shenanigans to obtain in the first place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosstimothy yeah we don't have to do this in PublicKeyCallback it was just an idea so that we could return authentication errors to openssh clients when the user actually fails to auth to the teleport node

we could dial the target node in BannerCallback instead to display the error as a banner, it feels sketchy but i guess it's no less sketchy than doing it in PublicKeyCallback where the private key has not been verified yet

although at this point i'm thinking we just shouldn't shove teleport errors into the SSH protocol, and shouldn't dial the node before the client authenticates with the private key, instead we can send whatever error message we want to Connect in a "notification" the same way we currently notify users when they connect to an invalid port of a multi-port TCP app

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need BannerCallback, the Cool™ way is to get a ServerPreAuthConn through PreAuthConnCallback and then we can send an error message in a banner at any time before the authentication completes.

I think that the best UX would be to display the error in both a banner (so it shows up in the terminal that the user is interacting with) and in a connect notification (to not swallow errors if the connection is happening somewhere other than in a terminal).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principal that showing better and more informative SSH errors is something we should do. However, it's not a problem limited to VNet SSH. If we can tackle error handling while working on VNet SSH that's great, but I don't think the internals of how that happens need to be ironed out in this RFD.

@viktor-doyensec-teleport
Copy link
Copy Markdown

Hey team! I just finished my review of the RFD.

Taking into consideration the fact that the new functionality leverages the existing SSH tunneling implementation, the changes shouldn't introduce any new threat surface.

The RDF looks good to me!

@nklaassen nklaassen enabled auto-merge May 15, 2025 00:56
@nklaassen nklaassen added this pull request to the merge queue May 15, 2025
Merged via the queue into master with commit d06a5a7 May 15, 2025
39 of 40 checks passed
@nklaassen nklaassen deleted the rfd/0207-vnet-ssh branch May 15, 2025 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog Indicates that a PR does not require a changelog entry rfd Request for Discussion size/md

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants