Connect: add privileged updater service #63572

Merged
gzdunek merged 17 commits into master from gzdunek/connect-updater-1
Mar 6, 2026

Contributor

@gzdunek gzdunek commented Feb 6, 2026

Contributes to #59295
RFD #62545

Part 1/2 of the updater service. The aim of the service is to silently install per-machine updates, without requiring admin privileges from the user.

The update process is described in the RFD https://github.com/gravitational/teleport/blob/rfd/0242-improve-windows-installation-experience/rfd/0242-improve-windows-installation-experience.md#update-process.

While working on the implementation, I've made slight adjustments to the update process (I also updated the RFD):

  1. I flattened the directory structure. Previously it was %ProgramData%\TeleportConnect\Updates\<GUID>; it is now %ProgramData%\TeleportConnectUpdater\<GUID>. This makes the directory easier to secure.
  2. Initially, if the directory already existed, the plan was to verify that its ACLs matched the expected ones (and exit early on a mismatch). In practice, this check turned out to be quite complex on Windows, and it also introduced a potential denial-of-service vector: a standard user could pre-create the directory and block the service. Instead, the service now takes ownership of the directory and resets its ACLs to the desired configuration.

Manual Test Plan

Tested along with #63573. The test plan is available there.

@gzdunek gzdunek added the no-changelog, backport/branch/v17, and backport/branch/v18 labels Feb 6, 2026
@github-actions github-actions Bot requested review from cthach and hugoShaka February 6, 2026 11:24
@gzdunek gzdunek requested review from nklaassen and ravicious and removed request for cthach and hugoShaka February 6, 2026 11:25
Contributor

@cthach cthach left a comment

I haven't reviewed the whole thing yet, but at a glance I noticed that there are no tests.

Can we please add tests for this new addition?

Contributor Author

gzdunek commented Feb 9, 2026

Yeah, there are no tests right now 😞 mainly because I didn't have a good way to test this. The code is heavily Windows-specific, while our CI runs on Linux.
I don't think it makes much sense to extract and test only the platform-independent parts, such as ensureIsUpgrade or verifyUpdateChecksum. That wouldn't add much value and would likely make the code harder to read, especially since this logic is only needed on Windows.

However, I've just realized that we already run some integration tests on Windows (#57059). I'll work on adding integration tests for this updater as well. This will require some minor changes, for example adding an option to run the service implementation directly, instead of through Windows SCM.

I'll open a separate PR for it so this one doesn't grow further.

Contributor

russjones commented Feb 10, 2026

@gzdunek Can we add test coverage to this PR?

The main thing with test coverage is to encode expected behavior into the test. How you do that is up to you, but let's make sure that is covered in this PR so we don't have someone accidentally cause a regression because that expectation was just in our collective understanding rather than in test code.

Spending extra time doing that now will save us more time in the future when someone has to figure out those implicit expectations when resolving a regression under time pressure.

Contributor

cthach commented Feb 10, 2026

> Yeah, there are no tests right now 😞 mainly because I didn't have a good way to test this. The code is heavily Windows-specific, while our CI runs on Linux. I don't think it makes much sense to extract and test only the platform-independent parts, such as ensureIsUpgrade or verifyUpdateChecksum. That wouldn't add much value and would likely make the code harder to read, especially since this logic is only needed on Windows.
>
> However, I've just realized that we already run some integration tests on Windows (#57059). I'll work on adding integration tests for this updater as well. This will require some minor changes, for example adding an option to run the service implementation directly, instead of through Windows SCM.
>
> I'll open a separate PR for it so this one doesn't grow further.

Adding on to @russjones, perhaps instead of a separate PR for tests, we do that here?

The pro is we have related code and tests in a single commit. The con is as you mentioned, it'll be longer in size, but I feel like the tradeoff is worth it so we have the confidence that the code that we intend to commit is and will continue to behave as expected/intended.

From a security angle, it is personally hard for me to convince myself that this privileged piece of code that has the ability to autonomously install more code should go live in prod (edit: eventually, I'm aware merging doesn't go straight to prod) without high levels of confidence it will do exactly what we expect it to do in all scenarios. While we can't test everything, some tests are better than no tests.

Contributor Author

gzdunek commented Feb 11, 2026

Sure, I added the integration tests to this PR. These tests will run on CI once we merge #63732.

I'll also include the manual testing steps once I get the tag builds published. I’m running into some random errors with that at the moment.

Contributor

cthach commented Feb 12, 2026

> Sure, I added the integration tests to this PR. These tests will run on CI once we merge #63732.
>
> I'll also include the manual testing steps once I get the tag builds published. I’m running into some random errors with that at the moment.

Thank you!

I just reviewed the RFD to get some context. I'm reviewing this specific PR now.

Contributor

@cthach cthach left a comment

I don't have much Windows experience, so I reviewed what I could.

Please ensure we get a SME with Windows expertise to give this a review.

Also, please document any manual tests you did.

Comment thread lib/teleterm/autoupdate/privileged_updater_client_windows.go Outdated
Comment thread integration/autoupdate/tools/connect_privileged_updater_windows_test.go Outdated
Comment thread lib/teleterm/autoupdate/privilegedupdater/client_windows.go
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread integration/autoupdate/tools/connect_privileged_updater_windows_test.go Outdated
Comment thread integration/autoupdate/tools/connect_privileged_updater_windows_test.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_client_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Member

@ravicious ravicious left a comment

Got to waitForSingleClient in lib/teleterm/autoupdate/privileged_updater_service_windows.go, I'll continue the review next week.

Comment on lines +57 to +65
	// Allow SYSTEM/Admins Full Control, Authenticated Users read/write, implicitly denies everyone else.
	pipeSecurityDescriptor = "D:" + // DACL
		"(A;;GA;;;SY)" + // Allow (A);; Generic All (GA);;; SYSTEM (SY)
		"(A;;GA;;;BA)" + // Allow (A);; Generic All (GA);;; Built-in Admins (BA)
		"(A;;GRGW;;;AU)" // Allow (A);; Generic Read/Write (GRGW);;; Authenticated Users (AU)
	updateDirSecurityDescriptor = "O:SY" + // Owner SYSTEM
		"D:P" + // 'P' blocks permissions inheritance from the parent directory
		"(A;OICI;GA;;;SY)" + // Allow System Full Access
		"(A;OICI;GA;;;BA)" // Allow Built-in Administrators Full Access
Member

Is there some kind of article about pipes, folders and DACLs that we could use to cross-check these descriptors against?

The RFD isn't super specific on this. But I also see it says "First try to create a directory with correct ACLs, granting write access only to SYSTEM and Administrators" but here we grant them GA.

In other words, how do we know this is secure? :)

Contributor Author

> The RFD isn't super specific on this. But I also see it says "First try to create a directory with correct ACLs, granting write access only to SYSTEM and Administrators" but here we grant them GA.

I phrased it incorrectly. Instead of "granting write-only access to" I should have said "granting access only to" (as in Doyensec's comment). We can't grant write-only access, since the service also needs permission to read and delete data.

> Is there some kind of article about pipes, folders and DACLs that we could use to cross-check these descriptors against?

It's surprisingly difficult to find good examples of how to configure a DACL for requirements like ours. Here's some documentation explaining how the different parts of a security descriptor work:

General DACL structure:
https://techcommunity.microsoft.com/blog/askds/the-security-descriptor-definition-language-of-love-part-1/395202
https://techcommunity.microsoft.com/blog/askds/the-security-descriptor-definition-language-of-love-part-2/395258

Securing pipes:
https://learn.microsoft.com/en-us/windows/win32/ipc/named-pipe-security-and-access-rights
https://stackoverflow.com/questions/29947524/c-let-user-process-write-to-local-system-named-pipe-custom-security-descrip

Based on these references, I've narrowed down the pipe's access configuration slightly. Microsoft's documentation notes that the default FILE_GENERIC_WRITE permission can be problematic, as it may allow unintended users (in our case authenticated users) to open additional pipe instances. We can fix this by constructing a custom access mask (Generic Read + File Write Data) that grants only the necessary I/O rights.
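
To make the difference concrete, here is a minimal sketch of how such a mask could be composed. The constant values are the documented Windows access-right values; the helper names and the choice to emit the mask as a hex string in the ACE are assumptions for illustration, not necessarily what the merged code does.

```go
package main

import "fmt"

// Documented Windows access-right values (declared locally so the sketch
// builds on any platform).
const (
	genericRead            uint32 = 0x80000000
	fileWriteData          uint32 = 0x00000002
	fileCreatePipeInstance uint32 = 0x00000004 // same bit as FILE_APPEND_DATA
	fileGenericWrite       uint32 = 0x00120116 // includes the 0x4 bit above
)

// allowsNewInstances reports whether a mask lets the holder create
// additional instances of the pipe.
func allowsNewInstances(mask uint32) bool {
	return mask&fileCreatePipeInstance != 0
}

// pipeClientACE is a hypothetical helper that builds the SDDL ACE for
// Authenticated Users from Generic Read + File Write Data only.
func pipeClientACE() string {
	mask := genericRead | fileWriteData
	return fmt.Sprintf("(A;;0x%x;;;AU)", mask)
}

func main() {
	fmt.Println("FILE_GENERIC_WRITE allows new instances:", allowsNewInstances(fileGenericWrite))
	fmt.Println("custom mask allows new instances:", allowsNewInstances(genericRead|fileWriteData))
	fmt.Println("ACE:", pipeClientACE())
}
```

The point of the sketch is only that FILE_GENERIC_WRITE carries the pipe-instance bit while the narrower mask does not; how the generic bits are mapped at access-check time is left to Windows.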

Comment thread lib/teleterm/autoupdate/privilegedupdater/service_windows.go
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privileged_updater_service_windows.go Outdated
Comment thread lib/teleterm/autoupdate/privilegedupdater/service_windows.go
@ravicious ravicious self-requested a review February 13, 2026 17:28
@cthach cthach self-requested a review February 18, 2026 01:21
Comment thread lib/teleterm/autoupdate/privilegedupdater/service_windows.go Outdated
@gzdunek gzdunek requested a review from nklaassen February 19, 2026 15:37
Comment thread lib/teleterm/autoupdate/privilegedupdater/service_windows.go Outdated
Comment on lines +404 to +407
// This function runs with SYSTEM privileges and relies on the Go standard library’s
// os.RemoveAll implementation on Windows. It detects reparse points (symlinks and
// junctions) and removes the link itself without ever recursing into the target,
// mitigating junction/symlink crossing attacks.
Member

I cannot find much about this in the godoc or the source of os.RemoveAll.

https://cs.opensource.google/go/go/+/refs/tags/go1.26.0:src/os/path.go;l=68-75
https://cs.opensource.google/go/go/+/refs/tags/go1.26.0:src/os/removeall_at.go;l=15

I mean maybe these two lines point to it? https://cs.opensource.google/go/go/+/refs/tags/go1.26.0:src/os/removeall_at.go;l=67-68 But it's very obscure.

Where did you find that? :D

Contributor Author

Yeah, I don't think this behavior is explicitly documented anywhere. I based my understanding on this change: kkpan11/go@6d41809 (which switches to removefileat, as in your third link).

Under the hood, removefileat is implemented as:

windows.Deleteat(dirfd, name, windows.FILE_NON_DIRECTORY_FILE)

My understanding is that this deletes name if it is a regular file, a symlink, or a junction. If it's an actual directory, the call fails due to the FILE_NON_DIRECTORY_FILE flag, and removeAll then falls back to recursively removing its contents.

Before opening the PR, I manually tested that it doesn't recurse into symlinks/junctions, and we also have an integration test for that:

require.NoError(t, err, "cleanup must not remove files outside base dir via junction traversal")
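
For anyone who wants to reproduce the check outside the integration suite, a rough standalone sketch follows. It uses a symbolic link as a stand-in for a junction (creating a real junction needs Windows-specific calls, and creating symlinks on Windows may need extra privileges), and the function and file names are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeAllLeavesLinkTarget creates base/link -> outside, removes base with
// os.RemoveAll, and reports whether the file inside outside survived.
func removeAllLeavesLinkTarget() (bool, error) {
	root, err := os.MkdirTemp("", "removeall-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	base := filepath.Join(root, "base")
	outside := filepath.Join(root, "outside")
	for _, dir := range []string{base, outside} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			return false, err
		}
	}
	victim := filepath.Join(outside, "keep.txt")
	if err := os.WriteFile(victim, []byte("data"), 0o644); err != nil {
		return false, err
	}
	// The link lives inside the directory being cleaned up but points outside it.
	if err := os.Symlink(outside, filepath.Join(base, "link")); err != nil {
		return false, err
	}
	if err := os.RemoveAll(base); err != nil {
		return false, err
	}
	if _, err := os.Stat(victim); err != nil {
		if os.IsNotExist(err) {
			return false, nil // the target was removed through the link
		}
		return false, err
	}
	return true, nil
}

func main() {
	survived, err := removeAllLeavesLinkTarget()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("target survived:", survived)
}
```

This only demonstrates the symlink case; the junction case still relies on the manual test and the integration test mentioned above.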

Comment thread lib/teleterm/autoupdate/privilegedupdater/client_windows.go Outdated
// It starts the update service, sends update metadata, and transfers the binary for validation and installation.
func RunServiceAndInstallUpdateFromClient(ctx context.Context, path string, forceRun bool, version string) error {
	if err := ensureServiceRunning(ctx); err != nil {
		// Service failed to start; fall back to client-side install (UAC).
Member

This comment would make a good error log message and we could also include err to know what the error was, no? At the moment err is just lost.

Contributor Author

Yeah, the error is currently being swallowed. The issue is that there’s no way to log it at that point - Connect has already exited, and tsh.exe doesn't write logs to file (we would need a separate file just for this call).

I addressed this in #63573 (comment). Connect now invokes RunServiceAndInstallUpdateFromClient synchronously, which allows it to properly capture and handle any errors returned from that call.

I also moved the UAC fallback logic into the JS layer, which feels like a more appropriate place for it.

@gzdunek gzdunek requested a review from nklaassen February 20, 2026 10:05
Contributor

@cthach cthach left a comment

🚀

@gzdunek gzdunek added this pull request to the merge queue Mar 2, 2026
@gzdunek gzdunek removed this pull request from the merge queue due to a manual request Mar 2, 2026
@gzdunek gzdunek added this pull request to the merge queue Mar 6, 2026
Merged via the queue into master with commit ad36d4e Mar 6, 2026
39 checks passed
@gzdunek gzdunek deleted the gzdunek/connect-updater-1 branch March 6, 2026 13:32
@backport-bot-workflows
Contributor

@gzdunek See the table below for backport results.

Branch Result
branch/v17 Failed
branch/v18 Failed

nixpig pushed a commit that referenced this pull request Mar 11, 2026
* Add privileged updater service

* Add integration tests for updater

* Review fixes

* Move privileged updater to its own module

* Fix comments

* Interpolate registry pathnames, switch errors to AccessDenied

* Improve error handling in `waitForSingleClient`

* Use stricter DACL for named pipe

* Close `conn` on context cancellation

* Move reading update meta to separate function

* `trace.LimitExceeded` -> `trace.Errorf`

* Fix test

* Ensure updater only allows HTTPS

* Use TLS server in tests

* Fix tests
gzdunek added a commit that referenced this pull request Mar 27, 2026
(cherry picked from commit ad36d4e)

@adrian-doyensec adrian-doyensec left a comment

This looks sound from the security point of view. (I’m ignoring the missing certificate verification here since that is addressed in #63573)

Comment on lines +394 to +397
	dacl, _, err := sa.SecurityDescriptor.DACL()
	if err != nil {
		return trace.Wrap(err, "reading DACL from security descriptor")
	}

Minor: you could also defensively check dacl != nil here, since a NULL DACL means unrestricted access. That said, this is only a sanity check, not real ACL validation. The descriptor is currently hardcoded, and a non-nil DACL could still be maliciously weak.

gzdunek added a commit that referenced this pull request Mar 30, 2026
(cherry picked from commit ad36d4e)

Labels

backport/branch/v17, backport/branch/v18, no-changelog, size/lg

6 participants