RFD 0242: Improve Windows installation experience#62545
Conversation
adrian-doyensec
left a comment
There was a problem hiding this comment.
I have several concerns regarding the current shape of the Update Process:
- The
sc start TeleportUpdateService install-connect-update --path=... --cluster=...command can be executed by any authenticated user and could directly result in a Local Privilege Elevation (LPE). Therefore, the service must be extremely careful and must not implicitly trust any input data. - As mentioned in the inline comment:
C:\Windows\Tempshould not be used. Instead, create a%ProgramData%\Teleport\Updates\(or similar) directory with a properly configured ACL. Lock it down toAdministratorsandLocalSystemaccess only. The%ProgramData%is user-writable by default so ensure the directory wasn't pre-created before and there aren't any planted DLLs suitable for DLL hijacking. - The service must not move the file. A move operation could be exploited to remove important system files from their original location. This Arbitrary File Delete primitive could result in DoS for other applications (e.g. EDR) or even LPE (see https://www.zerodayinitiative.com/blog/2022/3/16/abusing-arbitrary-file-deletes-to-escalate-privilege-and-other-great-tricks). Instead, the file should be copied to a secure location.
- Unfortunately, a simple copy operation is also risky. Consider what would happen if local user asked the service to copy sensitive files (e.g. content from
C:\Windows\repair) and later read their content from a new location. This could result in another LPE vector or information disclosure. - The service needs to verify the copied files. However, it also needs to detect any unexpected scenarios and reject: network locations, UNC paths, device paths, alternate data streams, symbolic links and potentially others. It must ensure no TOCTOU window exist where attacker can still manipulate the file.
- The points 3, 4, and 5 can be mitigated by a design change in which the service impersonates the calling user. It might still include extra flags when opening the file, to e.g., prevent reparsing points and use
FILE_SHARE_READto block file modifications, but the initial file open should occur under the caller’s security context. If this succeeds, the user already has legitimate access to the file. Instead of usingCopyFileor similar fs operation, the service should read the file via the existing handle and then, using a privileged context, write the bytes into a newly created destination file. - There's one last risk. An attacker could use the service to simply copy arbitrary files into our secure
%ProgramData%location. For example, they could first copy a maliciousversion.dll(loaded by almost every executable out there), then copy a properly signed executable and have it executed. The malicious DLL could then be loaded from the same directory, resulting in another LPE vector. This can be mitigated by creating a unique subdirectory for each service invocation, e.g.%ProgramData%\Teleport\Updates\<GUID>.
Alternatively, the destination file could always be removed by the service at the end of update process. However, it is likely to be insufficient in preventing file content disclosure and very racy in stopping injection attacks. The dedicated directory design seems much more hardened.
| An attacker could trick a user into adding a malicious cluster to the app and setting it as the one managing updates, | ||
| effectively granting that remote cluster control over the local per-machine service version. |
There was a problem hiding this comment.
How complex is this attack to execute? What does it take to set a malicious cluster to dictate updates?
I might have already asked about this and you might have already answered, but it'll be good to have this under the RFD too.
I feel like we go to great lengths to avoid this vector, making the implementation and admin/user experience much more complex to avoid an attack that is in practice quite hard to execute.
There was a problem hiding this comment.
The following needs to happen:
- Teleport Connect is installed in per-machine mode with automatic updates enabled (default configuration).
- An attacker tricks the user to add a malicious cluster through one of two methods:
- Via a deep link, like
teleport://malicious-cluster.com/connect_my_computer(the user only has to clickNext). - By typing the cluster address manually through
Add Cluster...>Next.
- Via a deep link, like
- A malicious cluster must manage updates. An attacker-controlled cluster could manage updates if:
- It is the only cluster present, it is selected by default.
- It is chosen automatically by Connect's internal selection logic.
- The user is tricked to manually select this cluster to manage updates.
- The attack requires a vulnerable version of Teleport Connect (specifically, a vulnerable
tsh.exe) to be hosted either on the official CDN or at a custom hosting defined by theTELEPORT_CDN_BASE_URL. - The user must trigger a system service to execute the vulnerable
tsh.exebinary withLocalSystemprivileges. This can be achieved by either launching VNet or triggering the update service (e.g., by initiating another update installation).
So in short, this attack requires two simultaneous conditions: the user must add a malicious cluster that manages updates, and a vulnerable Teleport Connect binary must be available for download.
The scenario is indeed quite complex, but it's hard to me to say that if we accept the risk or not.
If we do choose to accept it, I am concerned about our response strategy for potential future vulnerabilities in tsh (so in Connect). Even if admins update their client_tools and verify users are on up-to-date version, we cannot guarantee that an insecure version won't be reinstalled later.
Customers would either have to disable client updates indefinitely, or we would have to remove the compromised versions from the CDN and customers from their local mirrors (if any).
I know the same problem applies to per-user installs, but the risk is greater when tsh runs with system-level privileges rather than standard user permissions (I think it's a difference in terms of CVSS).
EDIT: Another thing is that if we remove this check, the update service will install any signed Connect binary the user provides. This means we can't effectively 'kill' a vulnerable version by pulling it from a CDN, as a user could manually pass a compromised build to the update service which would install it.
There was a problem hiding this comment.
EDIT: Another thing is that if we remove this check, the update service will install any signed Connect binary the user provides. This means we can't effectively 'kill' a vulnerable version by pulling it from a CDN, as a user could manually pass a compromised build to the update service which would install it.
By "this check" you mean looking up if a cluster is in the allowlist? How does it help against manually passing a compromised build?
There was a problem hiding this comment.
The build would be passed using sc start TeleportUpdateService install-connect-update --path=... --cluster=...
The cluster must be one the allowlist, otherwise the service won't install the update.
Then the service compares the version from the passed build with the one from the ping to the cluster. The installation will happen only if they are equal.
There was a problem hiding this comment.
Another workaround would be to mark a self-hosted cluster as "safe to use as a candidate during auto-updates" only after the user logs in to it at least once but:
- It could be countered with "What if the attacker deploys a cluster that accepts any login credentials or creates a random SSO user?".
- There's a bunch of edge cases to deal with now that we support tsh dir sharing.
- I'm sure it can lead to a bunch of weird auto-update behaviors when adding a new cluster, but fortunately users typically do not add new clusters very often.
Still, it seems better than requiring self-hosted admins to set up MDM policies or update the registry just to make auto-updates work. It seems like a ridiculous amount of effort just to have working auto-updates!
Long-term we could implement some kind of an allowlist for cluster addresses that an admin could manage for customers who are super security-oriented. But with the current proposal I cannot imagine that any self-hosted customer would enjoy jumping through this many hoops for auto-updates.
There was a problem hiding this comment.
Yes, allowing the cluster to provide updates only after the user has logged seems to introduce several edge cases. More importantly, it's unclear to me whether this approach aligns well with general system security principles. On one hand, the privileged update process must be careful not to trust any user-writable data; on the other hand, this model would require trusting the contents of the ~/.tsh directory.
But with the current proposal I cannot imagine that any self-hosted customer would enjoy jumping through this many hoops for auto-updates.
I agree :(
After discussing this with Zac and Stephen, it seems there is a middle-ground approach: the updater should only allow upgrades. Currently, downgrades are also permitted in order to match how client updates for tsh work.
This change would prevent installing very old versions. However, it would still be possible to upgrade to an older release from a newer major version, for example, updating from the latest v17 to the initial v18 release.
But it doesn't seem there's a way to fully address this without fundamentally changing our client update model, such as restricting updates to the latest available version only.
Thanks for the review, I will looking into impersonating the calling user and how to implement it. |
|
@ravicious, @zmb3 do you have any other thoughts on this RFD? I believe an open point is that extra protection against updates from malicious clusters #62545 (comment). |
|
I've made some final edits to the RFD after working more on the implementation.
@adrian-doyensec would you mind taking another look at the update process? |
|
Although the required approvals are in place, I'll keep this RFD open a little longer to get the final feedback from Doyensec as well. In the meantime, I'll proceed with the implementation and merge it into master. |
cthach
left a comment
There was a problem hiding this comment.
This is a very well-written RFD. Great work!
adrian-doyensec
left a comment
There was a problem hiding this comment.
Great work, and thank you for addressing all my concerns.
* RFD 0242: Improve Windows installation experience * Review comments * Fix spelling * Review fixes * Update reviewers * Replace clusters allowlist with upgrade-only update policy * Resolve install location from system registry * Improve update process * Lint * Clarify downgrade behavior for versions set via env var or system registry * Clarify creating secure directory * Fix typo * Make improvements to update process based on implementation findings * Typos * Add exact security descriptors * Lint
* RFD 0242: Improve Windows installation experience * Review comments * Fix spelling * Review fixes * Update reviewers * Replace clusters allowlist with upgrade-only update policy * Resolve install location from system registry * Improve update process * Lint * Clarify downgrade behavior for versions set via env var or system registry * Clarify creating secure directory * Fix typo * Make improvements to update process based on implementation findings * Typos * Add exact security descriptors * Lint
Rendered