Conversation
2b9d1ba to
42df7a2
Compare
7fec826 to
273b629
Compare
nklaassen
left a comment
There was a problem hiding this comment.
is there a plan for distributing the required polkit and systemd unit files?
|
@tangyatsu - this PR will require admin approval to merge due to its size. Consider breaking it up into a series smaller changes. |
367cbf9 to
4ba7247
Compare
Working on it in PR #64362 I think it would be better to ship these files via package, but there is a problem both teleport and teleport-connect packages would need these files |
nklaassen
left a comment
There was a problem hiding this comment.
The code looks good to me, but I would like to see a manual testplan in the PR description before approving. You have instructions for how to run it it would be nice to see some test steps you have executed. E.g.
- can connect to TCP app over VNet
- HTTP app access continues to work with VNet on or off
- public URIs under a custom DNS zone are still reachable (try adding google.com as a custom DNS zone and make sure you can still connect to mail.google.com).
- can start and stop VNet from Connect (if that works at this point)
- can run
tsh vnet - can view logs from the root/daemon process
There was a problem hiding this comment.
💡 Codex Review
This build-tag change excludes the only fallback definitions of (*NetInterfaces).interfaceApp and (*RouteConflictDiag).commands from linux, and there is no routeconflict_linux.go in lib/vnet/diag to provide replacements. That leaves routeconflict.go with unresolved method calls (n.interfaceApp / c.commands) when building linux targets (including via lib/vnet imports like opensshconfig.go), so the commit makes linux builds fail at compile time.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 706f21dda5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a1753c571
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| for _, cidrRange := range cfg.cidrRanges { | ||
| if slices.Contains(state.configuredCidrRanges, cidrRange) { | ||
| continue | ||
| } | ||
| log.InfoContext(ctx, "Setting an IPv4 route", "netmask", cidrRange) |
There was a problem hiding this comment.
Remove stale CIDR routes when config shrinks
platformConfigureOS only appends newly seen CIDRs to state.configuredCidrRanges and never removes routes that disappeared from cfg.cidrRanges, so long-lived sessions keep stale ip route entries after clusters/logins change. In practice, traffic for removed Teleport ranges will still be routed into the VNet TUN and can be blackholed (or misrouted) until the whole VNet process is restarted.
Useful? React with 👍 / 👎.
ravicious
left a comment
There was a problem hiding this comment.
Good job, let's ship it!
I tested a couple of edge cases, e.g. missing D-Bus service when running through Connect and the app behaves correctly, that is it reports the error in the UI. I was also able to add a systemd override with TELEPORT_DEBUG=1 and get debug logs in the systemd service.
| if cf.Debug { | ||
| level = slog.LevelDebug | ||
| } | ||
| if _, err := utils.InitLogger(utils.LoggingForDaemon, level); err != nil { |
There was a problem hiding this comment.
I was going to mention that this duplicates timestamps already added logged by systemd, e.g.
Mar 13 18:00:12 ubuntu-linux-22-04-02-desktop tsh[14844]: 2026-03-13T18:00:12.346+01:00 INFO [VNET] Running VNet admin process vnet/admin_process_linux.go:41
…but I quickly checked if we do this any different for the teleport binary and it does the same thing so I guess it can stay this way for now.
| select { | ||
| case err := <-done: | ||
| // network stack exited cleanly within timeout | ||
| return trace.Wrap(err, "running VNet admin process") |
There was a problem hiding this comment.
Are we intentionally returning an error for a clean exit?
There was a problem hiding this comment.
this code was only in darwin version previously. my understanding is there was a bug where the network stack could hang on shutdown #58298, so “clean exit” here means it exited within the timeout window, regardless of whether it returned an error
| return dbus.MakeFailedError(trace.Wrap(err, "authorization failed")) | ||
| } | ||
|
|
||
| if d.closing.Load() { |
There was a problem hiding this comment.
What happens if d.closing is false when we read it but turns true just after that?
There was a problem hiding this comment.
I looked at it, in the worst case:
Close() runs after Start() readed d.closing but before Start() sets started=true.
Close() sees started==false, sends nil to done, Wait() returns immediately.
Then Start() launches startAdminProcess goroutine while the daemon is already shutting down
I need to somehow check both of these flags atomically, I didn’t come up with a better idea than adding a mutex.
| } | ||
| log.InfoContext(stopCtx, "Successfully stopped systemd service") | ||
| return nil | ||
| case <-ticker.C: |
There was a problem hiding this comment.
What exactly are we polling here?
There was a problem hiding this comment.
vnet daemon lifecycle is managed by a systemd unit, If the daemon exits, the unit’s ActiveState transitions to inactive or failed, so we periodically poll the unit state to detect that and exit the user process
https://www.freedesktop.org/software/systemd/man/latest/systemd.html#Units
builds on #55542.
D-bus daemon
changelog: Added Linux support for VNet
D-bus is basically a hybrid IPC/RPC mechanism used in Unix-based distros. Services can register and expose methods on the bus, and authorization is handled by polkit, your service must call the polkit API over D-bus.
Good article describing D-bus and polkit: https://u1f383.github.io/linux/2025/05/25/dbus-and-polkit-introduction.html.
The VNet daemon is managed by systemd. The systemd unit runs
tsh vnet-daemon. To allow D-bus to activate the unit when it isn’t running, we add a D-Bus service file that points to this systemd unit.The daemon registers on the system bus as
org.teleport.vnet1and exports interfaceorg.teleport.vnet1.Daemon. It exposes two methods:Start(addr, credPath)Stop()For authorization we use a single polkit action:
org.teleport.vnet1.manage-daemon. There’s no automatic mapping from polkit action IDs to D-Bus methods, the mapping is enforced in code, and we use the same action for both Start and Stop.The daemon exits completely when
Stop()is called or when the admin process exits on its own. It gets brought back on the nextStart()call.No D-Bus daemon
If the D-Bus service is not available:
tsh vnet-admin-setup.Manual Test Plan
Test Environment
it relies on the presence of systemd, D-Bus and polkit, most modern desktop Linux distros include them.
To make it work, you need to install the required polkit and D-Bus configuration files.
Polkit action:
D‑Bus config
By default, d-bus will not allow any user (including root) to own the
org.teleport.vnet1name:D‑Bus service file (activates systemd unit)
Systemd unit
I tested this on a cluster with:
I also set
google.comas a custom DNS zone for VNet and checked access toworkspace.google.comwith VNet turned on.It is actually convenient to inspect DNS behavior with
resolvectl.resolvectl statusshows all active links.resolvectl domainshows domains attached to each link, for example:Link 37 (TeleportVNet): ~internal.example.com ~google.com ~teleport.tangyatsu.com ~teleport.devYou can query and verify which link handled DNS:
Example output:
Test Cases
tsh vnetjournalctl -u teleport-vnet.servicesudo tsh vnetwithout presence of config files