
RFD 0242: Improve Windows installation experience #62545

Merged
gzdunek merged 17 commits into master from rfd/0242-improve-windows-installation-experience
Feb 18, 2026

Conversation

Contributor

@gzdunek gzdunek commented Dec 31, 2025

@gzdunek gzdunek added rfd Request for Discussion teleport-connect Issues related to Teleport Connect. labels Dec 31, 2025
@gzdunek gzdunek requested review from avatus, ravicious and zmb3 and removed request for bl-nero and fspmarshall December 31, 2025 14:36
@gzdunek gzdunek added the no-changelog Indicates that a PR does not require a changelog entry label Dec 31, 2025

@adrian-doyensec adrian-doyensec left a comment

I have several concerns regarding the current shape of the Update Process:

  1. The sc start TeleportUpdateService install-connect-update --path=... --cluster=... command can be executed by any authenticated user and could directly result in a Local Privilege Elevation (LPE). Therefore, the service must be extremely careful and must not implicitly trust any input data.
  2. As mentioned in the inline comment, C:\Windows\Temp should not be used. Instead, create a %ProgramData%\Teleport\Updates\ (or similar) directory with a properly configured ACL, locked down to Administrators and LocalSystem only. %ProgramData% is user-writable by default, so ensure the directory was not pre-created by another user and does not contain planted DLLs suitable for DLL hijacking.
  3. The service must not move the file. A move operation could be exploited to remove important system files from their original location. This Arbitrary File Delete primitive could result in DoS for other applications (e.g. EDR) or even LPE (see https://www.zerodayinitiative.com/blog/2022/3/16/abusing-arbitrary-file-deletes-to-escalate-privilege-and-other-great-tricks). Instead, the file should be copied to a secure location.
  4. Unfortunately, a simple copy operation is also risky. Consider what would happen if a local user asked the service to copy sensitive files (e.g. content from C:\Windows\repair) and later read their content from the new location. This could result in another LPE vector or information disclosure.
  5. The service needs to verify the copied files. However, it also needs to detect any unexpected scenarios and reject network locations, UNC paths, device paths, alternate data streams, symbolic links, and potentially others. It must ensure that no TOCTOU window exists in which an attacker can still manipulate the file.
  6. Points 3, 4, and 5 can be mitigated by a design change in which the service impersonates the calling user. It might still include extra flags when opening the file, e.g. to prevent following reparse points and to use FILE_SHARE_READ to block file modifications, but the initial file open should occur under the caller’s security context. If this succeeds, the user already has legitimate access to the file. Instead of using CopyFile or a similar filesystem operation, the service should read the file via the existing handle and then, using a privileged context, write the bytes into a newly created destination file.
  7. There's one last risk. An attacker could use the service to simply copy arbitrary files into our secure %ProgramData% location. For example, they could first copy a malicious version.dll (loaded by almost every executable out there), then copy a properly signed executable and have it executed. The malicious DLL could then be loaded from the same directory, resulting in another LPE vector. This can be mitigated by creating a unique subdirectory for each service invocation, e.g. %ProgramData%\Teleport\Updates\<GUID>.

Alternatively, the destination file could always be removed by the service at the end of update process. However, it is likely to be insufficient in preventing file content disclosure and very racy in stopping injection attacks. The dedicated directory design seems much more hardened.
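
The path rejection described in point 5 could start with a purely lexical pre-check, sketched below. This is a minimal illustration, not the RFD's implementation: `validateUpdatePath` and `isDriveLetter` are hypothetical names. String checks alone cannot detect symbolic links, reparse points, or mapped network drives, and they do not close the TOCTOU window, so they would only complement the handle-based checks from point 6.

```go
package main

import (
	"errors"
	"strings"
)

// validateUpdatePath is a hypothetical helper that rejects path forms a
// privileged service should never accept. It inspects the string only.
func validateUpdatePath(p string) error {
	// Treat forward slashes like backslashes, as Windows does.
	p = strings.ReplaceAll(p, "/", `\`)

	switch {
	case strings.HasPrefix(p, `\\.\`), strings.HasPrefix(p, `\\?\`), strings.HasPrefix(p, `\??\`):
		return errors.New("device and NT namespace paths are not allowed")
	case strings.HasPrefix(p, `\\`):
		return errors.New("UNC and network paths are not allowed")
	}

	// Require an absolute path of the form C:\...
	if len(p) < 3 || !isDriveLetter(p[0]) || p[1] != ':' || p[2] != '\\' {
		return errors.New("path must be absolute and start with a drive letter")
	}

	// Any colon after the drive specifier denotes an alternate data stream.
	if strings.Contains(p[2:], ":") {
		return errors.New("alternate data streams are not allowed")
	}
	return nil
}

// isDriveLetter reports whether c is an ASCII letter.
func isDriveLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}
```

With this sketch, `C:\Users\alice\Downloads\update.exe` is accepted, while `\\server\share\update.exe`, `\\.\PhysicalDrive0`, and `C:\tmp\update.exe:hidden` are rejected.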

Comment on lines +90 to +91
An attacker could trick a user into adding a malicious cluster to the app and setting it as the one managing updates,
effectively granting that remote cluster control over the local per-machine service version.
Member

How complex is this attack to execute? What does it take to set a malicious cluster to dictate updates?

I might have already asked about this and you might have already answered, but it'll be good to have this under the RFD too.

I feel like we go to great lengths to avoid this vector, making the implementation and admin/user experience much more complex to avoid an attack that is in practice quite hard to execute.

Contributor Author

@gzdunek gzdunek Jan 8, 2026

The following needs to happen:

  1. Teleport Connect is installed in per-machine mode with automatic updates enabled (default configuration).
  2. An attacker tricks the user into adding a malicious cluster through one of two methods:
    • Via a deep link, like teleport://malicious-cluster.com/connect_my_computer (the user only has to click Next).
    • By typing the cluster address manually through Add Cluster... > Next.
  3. The malicious cluster must manage updates. An attacker-controlled cluster could manage updates if:
    • It is the only cluster present, in which case it is selected by default.
    • It is chosen automatically by Connect's internal selection logic.
    • The user is tricked into manually selecting this cluster to manage updates.
  4. A vulnerable version of Teleport Connect (specifically, a vulnerable tsh.exe) must be hosted either on the official CDN or at a custom location defined by TELEPORT_CDN_BASE_URL.
  5. The user must trigger a system service to execute the vulnerable tsh.exe binary with LocalSystem privileges. This can be achieved by either launching VNet or triggering the update service (e.g., by initiating another update installation).

So in short, this attack requires two simultaneous conditions: the user must add a malicious cluster that manages updates, and a vulnerable Teleport Connect binary must be available for download.
The scenario is indeed quite complex, but it's hard for me to say whether we should accept the risk or not.

If we do choose to accept it, I am concerned about our response strategy for potential future vulnerabilities in tsh (and therefore in Connect). Even if admins update their client_tools and verify that users are on an up-to-date version, we cannot guarantee that an insecure version won't be reinstalled later.
Customers would either have to disable client updates indefinitely, or we would have to remove the compromised versions from the CDN and customers from their local mirrors (if any).
I know the same problem applies to per-user installs, but the risk is greater when tsh runs with system-level privileges rather than standard user permissions (I think it's a difference in terms of CVSS).

EDIT: Another thing is that if we remove this check, the update service will install any signed Connect binary the user provides. This means we can't effectively 'kill' a vulnerable version by pulling it from a CDN, as a user could manually pass a compromised build to the update service which would install it.

Member

EDIT: Another thing is that if we remove this check, the update service will install any signed Connect binary the user provides. This means we can't effectively 'kill' a vulnerable version by pulling it from a CDN, as a user could manually pass a compromised build to the update service which would install it.

By "this check" you mean looking up if a cluster is in the allowlist? How does it help against manually passing a compromised build?

Contributor Author

The build would be passed using sc start TeleportUpdateService install-connect-update --path=... --cluster=...
The cluster must be on the allowlist, otherwise the service won't install the update.
Then the service compares the version from the passed build with the one from the ping to the cluster. The installation will happen only if they are equal.
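
The two checks described in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the service's code: `checkUpdateRequest` is a hypothetical name, the allowlist is modeled as a plain map, and the cluster's version (which would come from a ping to the cluster) is passed in as a string.

```go
package main

import "fmt"

// checkUpdateRequest is a hypothetical sketch of the allowlist and version
// checks. clusterVersion stands in for the version a ping to the cluster
// would report; buildVersion is the version read from the passed build.
func checkUpdateRequest(allowlist map[string]bool, cluster, buildVersion, clusterVersion string) error {
	// Refuse clusters that an administrator has not explicitly allowed.
	if !allowlist[cluster] {
		return fmt.Errorf("cluster %q is not on the allowlist", cluster)
	}
	// Install only the exact version the allowed cluster asks for, so a
	// user cannot hand the service an arbitrary older signed build.
	if buildVersion != clusterVersion {
		return fmt.Errorf("build version %s does not match version %s requested by cluster %q",
			buildVersion, clusterVersion, cluster)
	}
	return nil
}
```

Under this model, pulling a vulnerable version from the CDN is effective only as long as no allowed cluster still advertises that version.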

Member

Another workaround would be to mark a self-hosted cluster as "safe to use as a candidate during auto-updates" only after the user logs in to it at least once but:

  1. It could be countered with "What if the attacker deploys a cluster that accepts any login credentials or creates a random SSO user?".
  2. There's a bunch of edge cases to deal with now that we support tsh dir sharing.
  3. I'm sure it can lead to a bunch of weird auto-update behaviors when adding a new cluster, but fortunately users typically do not add new clusters very often.

Still, it seems better than requiring self-hosted admins to set up MDM policies or update the registry just to make auto-updates work. It seems like a ridiculous amount of effort just to have working auto-updates!

Long-term we could implement some kind of an allowlist for cluster addresses that an admin could manage for customers who are super security-oriented. But with the current proposal I cannot imagine that any self-hosted customer would enjoy jumping through this many hoops for auto-updates.

Contributor Author

Yes, allowing the cluster to provide updates only after the user has logged in seems to introduce several edge cases. More importantly, it's unclear to me whether this approach aligns well with general system security principles. On one hand, the privileged update process must be careful not to trust any user-writable data; on the other hand, this model would require trusting the contents of the ~/.tsh directory.

But with the current proposal I cannot imagine that any self-hosted customer would enjoy jumping through this many hoops for auto-updates.

I agree :(
After discussing this with Zac and Stephen, it seems there is a middle-ground approach: the updater should only allow upgrades. Currently, downgrades are also permitted in order to match how client updates for tsh work.
This change would prevent installing very old versions. However, it would still be possible to upgrade to an older release from a newer major version, for example, updating from the latest v17 to the initial v18 release.
But it doesn't seem there's a way to fully address this without fundamentally changing our client update model, such as restricting updates to the latest available version only.
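
A minimal sketch of the upgrade-only rule, assuming plain MAJOR.MINOR.PATCH version strings: the names `version`, `parseVersion`, and `isUpgrade` are hypothetical, and pre-release or build suffixes are deliberately not handled here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified MAJOR.MINOR.PATCH triple.
type version struct{ major, minor, patch int }

// parseVersion accepts "18.0.0" or "v18.0.0"; anything else is an error.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return version{}, fmt.Errorf("invalid version %q", s)
		}
		nums[i] = n
	}
	return version{nums[0], nums[1], nums[2]}, nil
}

// less reports whether v sorts strictly before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// isUpgrade returns true only when candidate is strictly newer than
// installed. Equal versions and downgrades are refused.
func isUpgrade(installed, candidate string) (bool, error) {
	cur, err := parseVersion(installed)
	if err != nil {
		return false, err
	}
	next, err := parseVersion(candidate)
	if err != nil {
		return false, err
	}
	return cur.less(next), nil
}
```

The comparison is purely numeric, which is exactly why the caveat above holds: `isUpgrade("17.7.2", "18.0.0")` is true even if 18.0.0 was released before 17.7.2.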

Contributor Author

gzdunek commented Jan 8, 2026

The points 3, 4, and 5 can be mitigated by a design change in which the service impersonates the calling user. It might still include extra flags when opening the file, to e.g., prevent reparsing points and use FILE_SHARE_READ to block file modifications, but the initial file open should occur under the caller’s security context. If this succeeds, the user already has legitimate access to the file. Instead of using CopyFile or similar fs operation, the service should read the file via the existing handle and then, using a privileged context, write the bytes into a newly created destination file.

Thanks for the review, I will look into impersonating the calling user and how to implement it.

@gzdunek gzdunek requested a review from ravicious January 8, 2026 17:12
Contributor Author

gzdunek commented Jan 14, 2026

@ravicious, @zmb3 do you have any other thoughts on this RFD?

I believe one open point is the extra protection against updates from malicious clusters (#62545 (comment)).
I'm not happy about adding complexity to the flow, but I don't see a viable alternative that would protect us from downgrade attacks.

Contributor Author

gzdunek commented Jan 22, 2026

I've made some final edits to the RFD after working more on the implementation.

  1. The service is no longer started with command-line arguments. Instead, the client connects to it via a named pipe. While investigating client impersonation, I found that the most common approach is to use ImpersonateNamedPipeClient. Unfortunately, this function is not available in golang.org/x/sys/windows.
    But since we already establish a named pipe connection, we can use it to transfer the update binary instead: the client reads the file using its own permissions and sends it over the pipe, and the service then stores it in ProgramData.
  2. Added checksum validation to the service. This ensures that only Teleport Connect updates can be installed, even when the service itself is unsigned. The checksum is fetched from our CDN or from the per-machine CdnBaseUrl. This is particularly important for OSS builds, which may be unsigned; without this validation, the service could install arbitrary binaries.
  3. The app determines whether it is installed per-machine by comparing its actual install path with the path stored in the system registry, rather than relying on a hardcoded Program Files check.

@adrian-doyensec would you mind taking another look at the update process?

Contributor Author

gzdunek commented Jan 29, 2026

Although the required approvals are in place, I'll keep this RFD open a little longer to get the final feedback from Doyensec as well. In the meantime, I'll proceed with the implementation and merge it into master.

Contributor

@cthach cthach left a comment

This is a very well-written RFD. Great work!


@adrian-doyensec adrian-doyensec left a comment

Great work, and thank you for addressing all my concerns.

@gzdunek gzdunek enabled auto-merge February 18, 2026 12:12
@gzdunek gzdunek added this pull request to the merge queue Feb 18, 2026
Merged via the queue into master with commit 9ebf6c7 Feb 18, 2026
42 checks passed
@gzdunek gzdunek deleted the rfd/0242-improve-windows-installation-experience branch February 18, 2026 12:27
cthach pushed a commit that referenced this pull request Feb 20, 2026
* RFD 0242: Improve Windows installation experience

* Review comments

* Fix spelling

* Review fixes

* Update reviewers

* Replace clusters allowlist with upgrade-only update policy

* Resolve install location from system registry

* Improve update process

* Lint

* Clarify downgrade behavior for versions set via env var or system registry

* Clarify creating secure directory

* Fix typo

* Make improvements to update process based on implementation findings

* Typos

* Add exact security descriptors

* Lint
danielashare pushed a commit that referenced this pull request Feb 23, 2026
Labels

no-changelog Indicates that a PR does not require a changelog entry rfd Request for Discussion size/md teleport-connect Issues related to Teleport Connect.

6 participants