RFD 0242: Improve Windows installation experience by gzdunek · Pull Request #62545 · gravitational/teleport

gzdunek · 2025-12-31T14:33:59Z

adrian-doyensec

I have several concerns regarding the current shape of the Update Process:

1. The sc start TeleportUpdateService install-connect-update --path=... --cluster=... command can be executed by any authenticated user and could directly result in a Local Privilege Elevation (LPE). Therefore, the service must be extremely careful and must not implicitly trust any input data.
2. As mentioned in the inline comment: C:\Windows\Temp should not be used. Instead, create a %ProgramData%\Teleport\Updates\ (or similar) directory with a properly configured ACL. Lock it down to Administrators and LocalSystem access only. %ProgramData% is user-writable by default, so ensure the directory wasn't pre-created and doesn't contain planted DLLs suitable for DLL hijacking.
3. The service must not move the file. A move operation could be exploited to remove important system files from their original location. This Arbitrary File Delete primitive could result in DoS for other applications (e.g. EDR) or even LPE (see https://www.zerodayinitiative.com/blog/2022/3/16/abusing-arbitrary-file-deletes-to-escalate-privilege-and-other-great-tricks). Instead, the file should be copied to a secure location.
4. Unfortunately, a simple copy operation is also risky. Consider what would happen if a local user asked the service to copy sensitive files (e.g. content from C:\Windows\repair) and later read their content from the new location. This could result in another LPE vector or information disclosure.
5. The service needs to verify the copied files. However, it also needs to detect any unexpected scenarios and reject: network locations, UNC paths, device paths, alternate data streams, symbolic links and potentially others. It must ensure no TOCTOU window exists in which an attacker can still manipulate the file.
6. The points 3, 4, and 5 can be mitigated by a design change in which the service impersonates the calling user. It might still include extra flags when opening the file, e.g. to avoid following reparse points, and use FILE_SHARE_READ to block file modifications, but the initial file open should occur under the caller’s security context. If this succeeds, the user already has legitimate access to the file. Instead of using CopyFile or a similar filesystem operation, the service should read the file via the existing handle and then, using a privileged context, write the bytes into a newly created destination file.
7. There's one last risk. An attacker could use the service to simply copy arbitrary files into our secure %ProgramData% location. For example, they could first copy a malicious version.dll (loaded by almost every executable out there), then copy a properly signed executable and have it executed. The malicious DLL could then be loaded from the same directory, resulting in another LPE vector. This can be mitigated by creating a unique subdirectory for each service invocation, e.g. %ProgramData%\Teleport\Updates\<GUID>.

Alternatively, the destination file could always be removed by the service at the end of the update process. However, that is likely to be insufficient to prevent file content disclosure and too racy to stop injection attacks. The dedicated directory design seems much more hardened.

ravicious · 2026-01-07T16:09:28Z

+An attacker could trick a user into adding a malicious cluster to the app and setting it as the one managing updates,
+effectively granting that remote cluster control over the local per-machine service version.


How complex is this attack to execute? What does it take to set a malicious cluster to dictate updates?

I might have already asked about this and you might have already answered, but it'll be good to have this under the RFD too.

I feel like we go to great lengths to avoid this vector, making the implementation and admin/user experience much more complex to avoid an attack that is in practice quite hard to execute.

The following needs to happen:

Teleport Connect is installed in per-machine mode with automatic updates enabled (default configuration).

An attacker tricks the user into adding a malicious cluster through one of two methods:

Via a deep link, like teleport://malicious-cluster.com/connect_my_computer (the user only has to click Next).

By typing the cluster address manually through Add Cluster... > Next.

A malicious cluster must manage updates. An attacker-controlled cluster could manage updates if:

It is the only cluster present, in which case it is selected by default.

It is chosen automatically by Connect's internal selection logic.

The user is tricked into manually selecting this cluster to manage updates.

The attack requires a vulnerable version of Teleport Connect (specifically, a vulnerable tsh.exe) to be hosted either on the official CDN or at a custom host defined by TELEPORT_CDN_BASE_URL.

The user must trigger a system service to execute the vulnerable tsh.exe binary with LocalSystem privileges. This can be achieved by either launching VNet or triggering the update service (e.g., by initiating another update installation).

So in short, this attack requires two simultaneous conditions: the user must add a malicious cluster that manages updates, and a vulnerable Teleport Connect binary must be available for download.
The scenario is indeed quite complex, but it's hard for me to say whether we should accept the risk or not.

If we do choose to accept it, I am concerned about our response strategy for potential future vulnerabilities in tsh (and therefore in Connect). Even if admins update their client_tools and verify users are on an up-to-date version, we cannot guarantee that an insecure version won't be reinstalled later.
Customers would either have to disable client updates indefinitely, or we would have to remove the compromised versions from the CDN and customers would have to remove them from their local mirrors (if any).
I know the same problem applies to per-user installs, but the risk is greater when tsh runs with system-level privileges rather than standard user permissions (I think it's a difference in terms of CVSS).

EDIT: Another thing is that if we remove this check, the update service will install any signed Connect binary the user provides. This means we can't effectively 'kill' a vulnerable version by pulling it from a CDN, as a user could manually pass a compromised build to the update service which would install it.

EDIT: Another thing is that if we remove this check, the update service will install any signed Connect binary the user provides. This means we can't effectively 'kill' a vulnerable version by pulling it from a CDN, as a user could manually pass a compromised build to the update service which would install it.

By "this check" you mean looking up if a cluster is in the allowlist? How does it help against manually passing a compromised build?

The build would be passed using sc start TeleportUpdateService install-connect-update --path=... --cluster=...
The cluster must be on the allowlist, otherwise the service won't install the update.
Then the service compares the version from the passed build with the one from the ping to the cluster. The installation will happen only if they are equal.

Another workaround would be to mark a self-hosted cluster as "safe to use as a candidate during auto-updates" only after the user logs in to it at least once but:

It could be countered with "What if the attacker deploys a cluster that accepts any login credentials or creates a random SSO user?".

There's a bunch of edge cases to deal with now that we support tsh dir sharing.

I'm sure it can lead to a bunch of weird auto-update behaviors when adding a new cluster, but fortunately users typically do not add new clusters very often.

Still, it seems better than requiring self-hosted admins to set up MDM policies or update the registry just to make auto-updates work. It seems like a ridiculous amount of effort just to have working auto-updates!

Long-term we could implement some kind of an allowlist for cluster addresses that an admin could manage for customers who are super security-oriented. But with the current proposal I cannot imagine that any self-hosted customer would enjoy jumping through this many hoops for auto-updates.

Yes, allowing the cluster to provide updates only after the user has logged in seems to introduce several edge cases. More importantly, it's unclear to me whether this approach aligns well with general system security principles. On one hand, the privileged update process must be careful not to trust any user-writable data; on the other hand, this model would require trusting the contents of the ~/.tsh directory.

But with the current proposal I cannot imagine that any self-hosted customer would enjoy jumping through this many hoops for auto-updates.

I agree :(
After discussing this with Zac and Stephen, it seems there is a middle-ground approach: the updater should only allow upgrades. Currently, downgrades are also permitted in order to match how client updates for tsh work.
This change would prevent installing very old versions. However, it would still be possible to upgrade to an older release from a newer major version, for example, updating from the latest v17 to the initial v18 release.
But it doesn't seem there's a way to fully address this without fundamentally changing our client update model, such as restricting updates to the latest available version only.
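A small Go sketch of the upgrade-only rule may help illustrate the gap described above. The parseVersion and isUpgrade helpers and the version numbers are hypothetical, and the parser only handles plain MAJOR.MINOR.PATCH strings (no pre-release or build suffixes); it is not the updater's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified MAJOR.MINOR.PATCH triple.
type version struct{ major, minor, patch int }

// parseVersion accepts strings such as "18.0.0" or "v18.0.0".
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return version{}, fmt.Errorf("invalid version %q", s)
		}
		nums[i] = n
	}
	return version{nums[0], nums[1], nums[2]}, nil
}

// isUpgrade reports whether candidate is strictly newer than current.
// Equal versions and downgrades are both rejected.
func isUpgrade(current, candidate string) (bool, error) {
	cur, err := parseVersion(current)
	if err != nil {
		return false, err
	}
	cand, err := parseVersion(candidate)
	if err != nil {
		return false, err
	}
	if cand.major != cur.major {
		return cand.major > cur.major, nil
	}
	if cand.minor != cur.minor {
		return cand.minor > cur.minor, nil
	}
	return cand.patch > cur.patch, nil
}

func main() {
	// A late v17 release to the first v18 release counts as an upgrade,
	// even if the v18 build was published earlier.
	ok, _ := isUpgrade("17.7.10", "18.0.0")
	fmt.Println(ok) // true

	// Going back within the same major version is rejected.
	ok, _ = isUpgrade("18.2.0", "18.1.5")
	fmt.Println(ok) // false
}
```

The comparison looks only at version numbers, not release dates, which is why the v17 to initial v18 case cannot be excluded by this rule alone.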

gzdunek · 2026-01-08T17:12:13Z

The points 3, 4, and 5 can be mitigated by a design change in which the service impersonates the calling user. It might still include extra flags when opening the file, e.g. to avoid following reparse points, and use FILE_SHARE_READ to block file modifications, but the initial file open should occur under the caller’s security context. If this succeeds, the user already has legitimate access to the file. Instead of using CopyFile or a similar filesystem operation, the service should read the file via the existing handle and then, using a privileged context, write the bytes into a newly created destination file.

Thanks for the review, I will look into impersonating the calling user and how to implement it.

gzdunek · 2026-01-14T16:40:52Z

@ravicious, @zmb3 do you have any other thoughts on this RFD?

I believe one open point is the extra protection against updates from malicious clusters: #62545 (comment).
I'm not happy about adding complexity to the flow, but I don't see a viable alternative that would protect us from downgrade attacks.

gzdunek · 2026-01-22T11:05:37Z

I've made some final edits to the RFD after working more on the implementation.

The service is no longer started with command-line arguments. Instead, the client connects to it via a named pipe. While investigating client impersonation, I found that the most common approach is to use ImpersonateNamedPipeClient. Unfortunately, this function is not available in golang.org/x/sys/windows.
But since we already establish a named pipe connection, we can use it to transfer the update binary instead: the client reads the file using its own permissions and sends it over the pipe, and the service then stores it in ProgramData.
Added checksum validation to the service. This ensures that only Teleport Connect updates can be installed, even when the service itself is unsigned. The checksum is fetched from our CDN or from the per-machine CdnBaseUrl. This is particularly important for OSS builds, which may be unsigned; without this validation, the service could install arbitrary binaries.
The app determines whether it is installed per-machine by comparing its actual install path with the path stored in the system registry, rather than relying on a hardcoded Program Files check.

@adrian-doyensec would you mind taking another look at the update process?

gzdunek · 2026-01-29T15:16:47Z

Although the required approvals are in place, I'll keep this RFD open a little longer to get the final feedback from Doyensec as well. In the meantime, I'll proceed with the implementation and merge it into master.

cthach

This is a very well-written RFD. Great work!

adrian-doyensec

Great work, and thank you for addressing all my concerns.

* RFD 0242: Improve Windows installation experience
* Review comments
* Fix spelling
* Review fixes
* Update reviewers
* Replace clusters allowlist with upgrade-only update policy
* Resolve install location from system registry
* Improve update process
* Lint
* Clarify downgrade behavior for versions set via env var or system registry
* Clarify creating secure directory
* Fix typo
* Make improvements to update process based on implementation findings
* Typos
* Add exact security descriptors
* Lint

RFD 0242: Improve Windows installation experience

11e96c3

gzdunek added rfd Request for Discussion teleport-connect Issues related to Teleport Connect. labels Dec 31, 2025

github-actions Bot requested review from bl-nero and fspmarshall December 31, 2025 14:34

github-actions Bot added the size/md label Dec 31, 2025

gzdunek requested review from avatus, ravicious and zmb3 and removed request for bl-nero and fspmarshall December 31, 2025 14:36

gzdunek added the no-changelog Indicates that a PR does not require a changelog entry label Dec 31, 2025

zmb3 reviewed Dec 31, 2025

Review comments

e140b1f

adrian-doyensec reviewed Jan 2, 2026

ravicious reviewed Jan 7, 2026

gzdunek added 2 commits January 8, 2026 15:53

Fix spelling

c8a61cf

Review fixes

c0a1e39

gzdunek requested a review from ravicious January 8, 2026 17:12

gzdunek added 2 commits January 14, 2026 17:15

Update reviewers

51cb2e1

Merge branch 'master' into rfd/0242-improve-windows-installation-experience

70e935d

gzdunek requested a review from zmb3 January 14, 2026 16:40

gzdunek mentioned this pull request Jan 16, 2026

Connect: switch Windows installer to dual mode #62910

Merged

Replace clusters allowlist with upgrade-only update policy

9c0e0ef

avatus approved these changes Jan 21, 2026

gzdunek added 3 commits January 22, 2026 11:31

Resolve install location from system registry

0cb7e0c

Improve update process

77fffb7

Lint

becf948

gzdunek requested a review from adrian-doyensec January 22, 2026 11:06

gzdunek mentioned this pull request Jan 26, 2026

Make Windows Service setup reusable #63132

Merged

gzdunek added 2 commits January 26, 2026 19:59

Clarify downgrade behavior for versions set via env var or system registry

dde02c6

Clarify creating secure directory

deabde4

ravicious approved these changes Jan 27, 2026

gzdunek mentioned this pull request Jan 27, 2026

Connect: allow only version upgrades in automatic updates #63187

Merged

Fix typo

169f492

gzdunek mentioned this pull request Jan 29, 2026

Connect: read automatic updates configuration from Windows registry #63281

Merged

Make improvements to update process based on implementation findings

47b0e16

This was referenced Feb 6, 2026

Connect: add privileged updater service #63572

Merged

Connect: install updates via privileged service, add signature verification #63573

Merged

cthach reviewed Feb 12, 2026

gzdunek added 2 commits February 13, 2026 13:21

Typos

bb5be86

Add exact security descriptors

50eb747

adrian-doyensec approved these changes Feb 18, 2026

Lint

f551fa4

gzdunek enabled auto-merge February 18, 2026 12:12

gzdunek added this pull request to the merge queue Feb 18, 2026

Merged via the queue into master with commit 9ebf6c7 Feb 18, 2026
42 checks passed

gzdunek deleted the rfd/0242-improve-windows-installation-experience branch February 18, 2026 12:27

This was referenced Mar 18, 2026

Connect: verify signature in privileged updater against hardcoded Windows signing cert #64754

Merged

Connect: update managed updates docs for Windows dual-mode install and registry policy config #64905

Merged
