Connect: add privileged updater service#63572
Conversation
cthach
left a comment
There was a problem hiding this comment.
I didn't review the whole thing yet, but I noticed that at a glance, there are no tests.
Can we please add tests for this new addition?
|
Yeah, there are no tests right now 😞 mainly because I didn't have a good way to test this. The code is heavily Windows-specific, while our CI runs on Linux. However, I've just realized that we already run some integration tests on Windows (#57059). I'll work on adding integration tests for this updater as well. This will require some minor changes, for example adding an option to run the service implementation directly, instead of through Windows SCM. I'll open a separate PR for it so this one doesn't grow further. |
|
@gzdunek Can we add test coverage to this PR? The main thing with test coverage is to encode expected behavior into the test. How you do that is up to you, but let's make sure that is covered in this PR so we don't have someone accidentally cause a regression because that expectation was just in our collective understanding rather than in test code. Spending extra time doing that now will save us more time in the future when someone has to figure out those implicit expectations when resolving a regression under time pressure. |
Adding on to @russjones, perhaps instead of a separate PR for tests, we do that here? The pro is we have related code and tests in a single commit. The con is as you mentioned, it'll be longer in size, but I feel like the tradeoff is worth it so we have the confidence that the code that we intend to commit is and will continue to behave as expected/intended. From a security angle, it is personally hard for me to convince myself that this privileged piece of code that has the ability to autonomously install more code should go live in prod (edit: eventually, I'm aware merging doesn't go straight to prod) without high levels of confidence it will do exactly what we expect it to do all scenarios. While we can't test everything, some tests are better than no tests. |
|
Sure, I added the integrations tests to this PR. These tests will run on CI once we merge #63732. I'll also include the manual testing steps once I get the tag builds published. I’m running into some random errors with that at the moment. |
Thank you! I just reviewed the RFD to get some context. Now I'm reviewing this specific PR now. |
cthach
left a comment
There was a problem hiding this comment.
I don't have much Windows experience, so I reviewed what I could.
Please ensure we get a SME with Windows expertise to give this a review.
Also, please document any manual tests you did.
ravicious
left a comment
There was a problem hiding this comment.
Got to waitForSingleClient in lib/teleterm/autoupdate/privileged_updater_service_windows.go, I'll continue the review next week.
| // Allow SYSTEM/Admins Full Control, Authenticated Users read/write, implicitly denies everyone else. | ||
| pipeSecurityDescriptor = "D:" + // DACL | ||
| "(A;;GA;;;SY)" + // Allow (A);; Generic All (GA);;; SYSTEM (SY) | ||
| "(A;;GA;;;BA)" + // Allow (A);; Generic All (GA);;; Built-in Admins (BA) | ||
| "(A;;GRGW;;;AU)" // Allow (A);; Generic Read/Write (GRGW);;; Authenticated Users (AU) | ||
| updateDirSecurityDescriptor = "O:SY" + // Owner SYSTEM | ||
| "D:P" + // 'P' blocks permissions inheritance from the parent directory | ||
| "(A;OICI;GA;;;SY)" + // Allow System Full Access | ||
| "(A;OICI;GA;;;BA)" // Allow Built-in Administrators Full Access |
There was a problem hiding this comment.
Is there a some kind of an article about pipes, folders and DACL that we could use to cross-check these descriptors against?
The RFD isn't super specific on this. But I also see it says "First try to create a directory with correct ACLs, granting write access only to SYSTEM and Administrators" but here we grant them GA.
In other words, how do we know this is secure? :)
There was a problem hiding this comment.
The RFD isn't super specific on this. But I also see it says "First try to create a directory with correct ACLs, granting write access only to SYSTEM and Administrators" but here we grant them GA.
I phrased it incorrectly. Instead of "granting write-only access to" I should have said "granting access only to"(as in Doyensec's comment). We can't grant write-only access, since the service also needs permission to read and delete data.
Is there a some kind of an article about pipes, folders and DACL that we could use to cross-check these descriptors against?
It's surprisingly difficult to find good examples of how to configure a DACL for requirements like ours. Here's some documentation explaining how the different parts of a security descriptor work:
General DACL structure:
https://techcommunity.microsoft.com/blog/askds/the-security-descriptor-definition-language-of-love-part-1/395202
https://techcommunity.microsoft.com/blog/askds/the-security-descriptor-definition-language-of-love-part-2/395258
Securing pipes:
https://learn.microsoft.com/en-us/windows/win32/ipc/named-pipe-security-and-access-rights
https://stackoverflow.com/questions/29947524/c-let-user-process-write-to-local-system-named-pipe-custom-security-descrip
Based on these references, I've narrowed down the pipe's access configuration slightly. Microsoft's documentation notes that the default FILE_GENERIC_WRITE permission can be problematic, as it may allow unintended users (in our case authenticated users) to open additional pipe instances. We can fix this by constructing a custom access mask (Generic Read + File Write Data) that grants only the necessary I/O rights.
| // This function runs with SYSTEM privileges and relies on the Go standard library’s | ||
| // os.RemoveAll implementation on Windows. It detects reparse points (symlinks and | ||
| // junctions) and removes the link itself without ever recursing into the target, | ||
| // mitigating junction/symlink crossing attacks. |
There was a problem hiding this comment.
I cannot find much about this in the godoc or the source of os.RemoveAll.
https://cs.opensource.google/go/go/+/refs/tags/go1.26.0:src/os/path.go;l=68-75
https://cs.opensource.google/go/go/+/refs/tags/go1.26.0:src/os/removeall_at.go;l=15
I mean maybe these two lines point to it? https://cs.opensource.google/go/go/+/refs/tags/go1.26.0:src/os/removeall_at.go;l=67-68 But it's very obscure.
Where did you find that? :D
There was a problem hiding this comment.
Yeah, I don't think this behavior is explicitly documented anywhere. I based my understanding on this change: kkpan11/go@6d41809 (which switches to removefileat, as in your third link).
Under the hood, removefileat is implemented as:
windows.Deleteat(dirfd, name, windows.FILE_NON_DIRECTORY_FILE)My understanding is that this deletes name if it is a regular file, a symlink, or a junction. If it's an actual directory, the call fails due to the FILE_NON_DIRECTORY_FILE flag, and removeAll then falls back to recursively removing its contents.
Before opening the PR, I manually tested that it doesn't recurse into symlinks/junctions, and we also have an integration test for that:
| // It starts the update service, sends update metadata, and transfers the binary for validation and installation. | ||
| func RunServiceAndInstallUpdateFromClient(ctx context.Context, path string, forceRun bool, version string) error { | ||
| if err := ensureServiceRunning(ctx); err != nil { | ||
| // Service failed to start; fall back to client-side install (UAC). |
There was a problem hiding this comment.
This comment would make a good error log message and we could also include err to know what the error was, no? At the moment err is just lost.
There was a problem hiding this comment.
Yeah, the error is currently being swallowed. The issue is that there’s no way to log it at that point - Connect has already exited, and tsh.exe doesn't write logs to file (we would need a separate file just for this call).
I addressed this in #63573 (comment). Connect now invokes RunServiceAndInstallUpdateFromClient synchronously, which allows it to properly capture and handle any errors returned from that call.
I also moved the UAC fallback logic into the JS layer, which feels like a more appropriate place for it.
* Add privileged updater service * Add integration tests for updater * Review fixes * Move privileged updater to its own module * Fix comments * Interpolate registry pathnames, switch errors to AccessDenied * Improve error handling in `waitForSingleClient` * Use stricter DACL for named pipe * Close `conn` on context cancellation * Move reading update meta to separate function * `trace.LimitExceeded` -> `trace.Errorf` * Fix test * Ensure updater only allows HTTPS * Use TLS server in tests * Fix tests
* Add privileged updater service * Add integration tests for updater * Review fixes * Move privileged updater to its own module * Fix comments * Interpolate registry pathnames, switch errors to AccessDenied * Improve error handling in `waitForSingleClient` * Use stricter DACL for named pipe * Close `conn` on context cancellation * Move reading update meta to separate function * `trace.LimitExceeded` -> `trace.Errorf` * Fix test * Ensure updater only allows HTTPS * Use TLS server in tests * Fix tests (cherry picked from commit ad36d4e)
adrian-doyensec
left a comment
There was a problem hiding this comment.
This looks sound from the security point of view. (I’m ignoring the missing certificate verification here since that is addressed in #63573)
| dacl, _, err := sa.SecurityDescriptor.DACL() | ||
| if err != nil { | ||
| return trace.Wrap(err, "reading DACL from security descriptor") | ||
| } |
There was a problem hiding this comment.
Minor: you could defensively check also dacl != nil here, since a NULL DACL means unrestricted access. That said, this is only a sanity check, not real ACL validation. The descriptor is currently hardcoded, and a non-nil DACL could still be maliciously weak.
* Add privileged updater service * Add integration tests for updater * Review fixes * Move privileged updater to its own module * Fix comments * Interpolate registry pathnames, switch errors to AccessDenied * Improve error handling in `waitForSingleClient` * Use stricter DACL for named pipe * Close `conn` on context cancellation * Move reading update meta to separate function * `trace.LimitExceeded` -> `trace.Errorf` * Fix test * Ensure updater only allows HTTPS * Use TLS server in tests * Fix tests (cherry picked from commit ad36d4e)
Contributes to #59295
RFD #62545
Part 1/2 of the updater service. The aim of the service is to silently install per-machine updates, without requiring admin privileges from the user.
The update process is described in the RFD https://github.com/gravitational/teleport/blob/rfd/0242-improve-windows-installation-experience/rfd/0242-improve-windows-installation-experience.md#update-process.
While working on the implementation, I've made slight adjustments to the update process (I also updated the RFD):
%ProgramData%\TeleportConnect\Updates\<GUID>; it is now%ProgramData%\TeleportConnectUpdater\<GUID>. This makes the directory easier to secure.Manual Test Plan
Tested along with #63573. The test plan is available there.