Move cluster state to main process#59643

Merged
gzdunek merged 19 commits into master from gzdunek/cluster-store
Oct 22, 2025

Conversation

Contributor

@gzdunek gzdunek commented Sep 26, 2025

Contributes to #25806

Part 2/2 of moving cluster state to the main process.

Centralizing the cluster state in the main process makes it easier to read and update from both the main and renderer processes.

Originally, I had planned to refactor the overall shape of the state as well, but that turned out to be a really large change. So for now, I've only moved the necessary parts of the state to the main process, and aimed to keep everything working as before.

The three main behaviors I wanted to maintain are:

  1. State is updated before the handlers resolve
    This is achieved using AwaitableSender, which ensures that the renderer acknowledges each update message before the handler completes. This maintains the previous assumption that the state is immediately available after an update.
  2. useStateSelector can still work effectively
    In the previous implementation, calling setState only updated the parts of the state that actually changed, allowing useStateSelector to avoid unnecessary re-renders.
    If we were to send the full state over IPC, it would be serialized with structuredClone, resulting in a new object each time, which would break referential stability and make useStateSelector useless.
    To avoid this, we now generate patches in the main process (using immer) and apply them in the renderer. This preserves object references where possible and keeps useStateSelector working effectively. This feature is probably not super important, I'm treating it as a nice to have, as it was easy to achieve.
  3. Errors are still handled in the notifications. This is done by invoking all the functions from the renderer, so we get the errors back. I'd still like to change that and add something like an error field to the cluster state; I'm going to refactor the state shape in the future.

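The update-before-resolve guarantee from point 1 can be sketched in isolation. This is an in-memory simulation with hypothetical shapes (the real AwaitableSender rides on Electron IPC, which is not reproduced here); it only illustrates how awaiting the renderer's acknowledgment makes the handler resolve after the state is applied:

```typescript
// In-memory sketch only: FakeRenderer stands in for the renderer process,
// and the Update shape is hypothetical, not the PR's actual types.
type Update = { kind: 'patches'; value: object[] };

class FakeRenderer {
  state: object[] = [];
  // Applies the update, then resolves -- this resolution is the "ack".
  async receive(update: Update): Promise<void> {
    this.state.push(...update.value);
  }
}

// Stand-in for AwaitableSender: send() resolves only once the renderer
// has acknowledged (i.e. applied) the update.
class AwaitableSender {
  constructor(private renderer: FakeRenderer) {}
  send(update: Update): Promise<void> {
    return this.renderer.receive(update);
  }
}

class ClusterStore {
  constructor(private senders: AwaitableSender[]) {}
  // Resolves only after every renderer has acknowledged, so callers can
  // assume the renderer state is already up to date.
  update(patches: object[]): Promise<unknown> {
    return Promise.all(
      this.senders.map(s => s.send({ kind: 'patches', value: patches }))
    );
  }
}
```

After `await store.update(patches)` resolves, every registered renderer has applied the patches, which preserves the old assumption that the state is immediately readable after an update.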
@gzdunek gzdunek requested review from avatus and ravicious September 26, 2025 12:27
@gzdunek gzdunek added the no-changelog label (indicates that a PR does not require a changelog entry) Sep 26, 2025
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from a61b146 to 3b0f6fc on September 26, 2025 12:34
@ravicious ravicious self-requested a review October 7, 2025 16:27
@gzdunek gzdunek force-pushed the gzdunek/awaitable-sender branch from c9a9402 to 87fda96 on October 13, 2025 12:32
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from 3b0f6fc to 3e4c985 on October 13, 2025 12:33
@gzdunek gzdunek force-pushed the gzdunek/awaitable-sender branch from 87fda96 to 032b4ac on October 13, 2025 13:36
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch 2 times, most recently from 67d5a63 to 6669dd1 on October 15, 2025 12:00
@gzdunek gzdunek requested a review from ravicious October 15, 2025 12:07
Member

@ravicious ravicious left a comment


Found a bug related to the proxy host allow list.

// The workaround is to update the field in case of a failure,
// so the places that wait for showResources !== UNSPECIFIED don't get stuck indefinitely.
cluster.showResources = ShowResources.ACCESSIBLE_ONLY;
private subscribeToClusterStore(): void {
Member


Do you think it makes sense to add tests which check if subscribeToClusterStore indeed preserves identity? I was thinking that it is something that can easily slip past us, OTOH… Is it even possible for it to not preserve identity if it's backed by Immer? This also got me thinking and I came to the conclusion that this statement:

If we were to send the full state over IPC, it would be serialized with structuredClone, resulting in a new object each time, which would break referential stability and make useStateSelector useless.

is not necessarily correct. If we were sending the full state, then Immer would still take care to change only those parts of the state that actually need to change. It could just end up being super expensive. I remember Bartosz had some performance problems initially when he switched to Immer in the role editor.

With that said I suppose I answered my one question: we don't need those identity tests because it's impossible for subscribeToClusterStore to not preserve object identity.

Funny that I wrote this comment:

// It doesn't appear to be explicitly documented anywhere, but Immer preserves object
// identity, so Object.is works as expected. This behavior is covered by our tests.
const hasSelectedStateChanged = !Object.is(
newSelectedState,
selectedState
);

This feature is probably not super important, I'm treating it as a nice to have, as it was easy to achieve.

What do you mean it's not super important? You mean keeping useStateSelector working? I feel like it's super important because otherwise many places in the app would start re-rendering way more often than necessary! 😅
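A minimal sketch of the comparison being discussed (hypothetical names; the real useStateSelector is a React hook, which is not reproduced here): the selected branch is compared with Object.is, so a re-render is skipped exactly when the branch kept its identity:

```typescript
// Illustrative only: the selector re-render check reduced to its core.
type State = { clusters: { uri: string }[]; gateways: { id: string }[] };

function shouldRerender<T>(
  select: (s: State) => T,
  prev: State,
  next: State
): boolean {
  // Object.is works here only because untouched branches keep their
  // identity across updates -- the property discussed above.
  return !Object.is(select(prev), select(next));
}

const prev: State = { clusters: [{ uri: 'a' }], gateways: [{ id: 'g1' }] };
// A patch-style update rebuilds only `gateways`; `clusters` is reused.
const next: State = { ...prev, gateways: [] };

shouldRerender(s => s.clusters, prev, next); // false: identity preserved
shouldRerender(s => s.gateways, prev, next); // true: branch replaced
```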

Contributor Author


If we were sending the full state, then Immer would still take care to change only those parts of the state that actually need to change. It could just end up being super expensive. I remember Bartosz had some performance problems initially when he switched to Immer in the role editor.

I'm not sure if I understand.
If we were sending the full state, it would be applied in the renderer in the following way:

      this.setState(c => {
        c.clusters = castDraft(e.value);
      });

Now, if we send an update with the full state, but with one cluster's connected flag flipped, would Immer really be smart enough to modify only that one flag? I was under the impression that it would just always replace c.clusters with a new value.

In the docs there's the following example:

        case "adduser-3":
            // OK: returning a new state. But, unnecessary complex and expensive
            return {
                userCount: draft.userCount + 1,
                users: [...draft.users, action.payload]
            }

But here we merge the state manually.

With that said, I think we probably don't need a test for that; as long as we produce patches on one side and consume them on the other, Immer will take care of preserving the identity.
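The tradeoff under discussion can be shown without Immer at all (plain TypeScript, hypothetical state shape): a deep clone, like the one IPC serialization would produce, gives every branch a new identity, while a targeted, patch-style update reuses untouched siblings:

```typescript
type State = { clusters: { connected: boolean }[]; gateways: string[] };

const state: State = { clusters: [{ connected: true }], gateways: ['g1'] };

// Sending the full state over IPC deep-clones it (via structuredClone);
// a JSON round-trip is used here as a stand-in with the same effect.
const fullUpdate: State = JSON.parse(JSON.stringify(state));
Object.is(state.clusters, fullUpdate.clusters); // false, despite equal content

// A patch-style update rebuilds only the changed path; siblings are
// reused, which is what keeps Object.is-based selectors effective.
const patched: State = { ...state, clusters: [{ connected: false }] };
Object.is(state.gateways, patched.gateways); // true: untouched branch kept
```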

What do you mean it's not super important? You mean keeping useStateSelector working? I feel like it's super important because otherwise many places in the app would start re-rendering way more often than necessary! 😅

Ah, I was thinking we still use ClustersService.useState quite a lot, but actually it's not that bad: there are only a few instances of it, and they're not even that high up in the component tree. So indeed, it's important to keep useStateSelector working!

Member


Now, if we send an update with the full state, but with one cluster's connected flag flipped, would Immer really be smart enough to modify only that one flag? I was under the impression that it would just always replace c.clusters with a new value.

It turns out I was wrong, it'd indeed replace it with a new value.

I added this patch:

Patch
diff --git a/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts b/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts
index 985fd0658be..c9c74db872f 100644
--- a/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts
+++ b/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts
@@ -172,8 +172,8 @@ export class ClusterStore {
       this.senders.values().map(sender => {
         const send = this.withErrorHandling(update => sender.send(update));
         return send({
-          kind: 'patches',
-          value: patches,
+          kind: 'state',
+          value: this.state,
         });
       })
     );
diff --git a/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx b/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx
index a1d8576b4b7..5b6e0494f52 100644
--- a/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx
+++ b/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx
@@ -44,6 +44,8 @@ export function StatusBar(props: { onAssumedRolesClick(): void }) {
   const assumed = useAssumedRequests(rootClusterUri);
   const assumedRolesText = getAssumedRoles(assumed);
 
+  console.log('%c Rendering StatusBar', 'background: #222; color: #bada55');
+
   return (
     <Flex
       width="100%"

You can see that the status bar does not re-render when I close a gateway (thus updating only the gateways part of the clusters service). But when I log out of a cluster, the status bar does get re-rendered.

Rendering with full state updates
rendering.with.full.state.mov

This does not happen when only patches are sent through:

Rendering with patches
rendering.with.patches.mov

@gzdunek gzdunek requested a review from ravicious October 17, 2025 13:37
);
});
this.clusterStore = new ClusterStore(
this.getTshdClients().then(c => c.terminalService),
Member


The coordination between the processes is getting a little hectic and we don't have it well documented. If I were to look at MainProcess and ClusterStore with no knowledge about them, my questions would be:

  1. What if getTshdClients returns an error? How is the error surfaced to the user?
  2. If the renderer process directly depends on ClusterStore and ClusterStore depends on the result of a promise, what happens if that promise hangs indefinitely? Can the renderer reasonably assume that ClusterStore is ready when the renderer wants to talk to it?

The answer to both questions is concealed in the fact that both getTshdClients and the startup of the frontend app depend on the result of the same promise. The error from said promise is surfaced primarily in the UI of the renderer.

Could you add docs for getTshdClients and resolvedChildProcessAddresses that would provide some context behind this?

getTshdClients can also be made private now.

Contributor Author


I realized that passing a promise directly to the constructor is problematic: if it rejects before any callsite attaches a handler, it triggers an unhandled promise rejection. I fixed this by updating ClusterStore to accept a function that returns a promise instead.

  1. The error is propagated to the caller. Currently, only the renderer invokes the ClusterStore (so the errors are handled in the same way as today).

If the renderer process directly depends on ClusterStore and ClusterStore depends on the result of a promise, what happens if that promise hangs indefinitely?

Then any call to ClusterStore that depends on that promise would also hang. This doesn't seem to have any new effect on the renderer. If the resolvedChildProcessAddresses promise hangs, then the renderer will be stuck on the loading screen and won't even call any ClusterStore method.

Can the renderer reasonably assume that ClusterStore is ready when the renderer wants to talk to it?

Yes, I think so. Since the ClusterStore is now initialized synchronously, it can immediately start accepting requests (which will wait for the tshd client to initialize).
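The unhandled-rejection pitfall and its fix can be sketched generically (the LazyClient name is hypothetical; the real code passes a function returning the tshd client): taking `() => Promise<T>` and memoizing it means the promise is created, and any rejection observed, only when a caller actually awaits it:

```typescript
class LazyClient<T> {
  private cached: Promise<T> | undefined;

  // Taking a factory instead of a Promise avoids the window in which a
  // rejected Promise exists with no handler attached yet.
  constructor(private getClient: () => Promise<T>) {}

  get(): Promise<T> {
    // Memoize so every caller waits on the same underlying promise.
    this.cached ??= this.getClient();
    return this.cached;
  }
}
```

With a hypothetical `connectToTshd()` factory, `new LazyClient(() => connectToTshd())` never triggers the connection until `get()` is called, so a failing client cannot reject unobserved before any handler exists.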

Base automatically changed from gzdunek/awaitable-sender to master October 22, 2025 10:29
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from 8d21d0d to 75e6c59 on October 22, 2025 10:36
@gzdunek gzdunek enabled auto-merge October 22, 2025 10:43
@gzdunek gzdunek added this pull request to the merge queue Oct 22, 2025
Merged via the queue into master with commit a41d021 Oct 22, 2025
41 checks passed
@gzdunek gzdunek deleted the gzdunek/cluster-store branch October 22, 2025 11:04
smallinsky pushed a commit that referenced this pull request Oct 23, 2025
* Create `ClusterStore` that manages cluster state

* Fix tests that mocked tshd directly

* Remove IPC to notify the main process about cluster list changes

* Load immer plugins in `MainProcess`

* Improve comments

* Refactor `useSender`

* Get rid of unnecessary Map and try/catch around send

* Get rid of `MainProcess.create`

* Do not return early `c.proxyHost` is falsy

* Add more context to test

* Add missing logout handler in main process

* Fix applying patches

* Adjust `subscribeToClusterStore` to updated `startAwaitableSenderListener`

* Crash window when sending state update fails

* Extract WebContents navigation handlers and add tests for opening links

* Improve error message

* Initialize `ClusterStore` synchronously

* Convert `lazyTshdClient` field to `getTshdClient` function, add docs

* Remove unused eslint directive
mmcallister pushed a commit that referenced this pull request Nov 6, 2025
@ravicious
Member

Just a reminder that this will need to be backported together with #61044.

mmcallister pushed a commit that referenced this pull request Nov 19, 2025
mmcallister pushed a commit that referenced this pull request Nov 20, 2025
@gzdunek gzdunek mentioned this pull request Nov 21, 2025
gzdunek added a commit that referenced this pull request Nov 28, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 4, 2025
* Combine `ClustersService` logout functions (#59539)

* Remove clusters immediately after a logout, move `useClusterLogout` to `AppContext`

* Review callsites to ensure cluster is properly checked before being accessed

* Revert "Review callsites to ensure cluster is properly checked before being accessed"

This reverts commit 8343c3c.

* Switch to removing the cluster at the end of logout sequence

* Lint

* Move `logoutWithCleanup` to `ui/ClusterLogout`

(cherry picked from commit de6b4ed)

* Enable sending messages from main to renderer with acknowledgments (#59642)

* Create awaitable sender

* Review comments

* Fix test and lint

(cherry picked from commit 5dc76fe)

* Move cluster state to main process (#59643)

(cherry picked from commit a41d021)

* Connect: make logout function idempotent (#60553)

* Remove `ClusterRemove` RPC, make logging out idempotent

* Move calling `removeKubeConfig` and `maybeRemoveAppUpdatesManagingCluster` to main process

The main process should not depend on the renderer to clean up its own resources.

* Remove cleaning up kube dir

* Lint

(cherry picked from commit 2d1bc7b)

* Connect: add profile watcher (#60622)

* Add profile watcher

* Move `makeClusterWithOnlyProfileProperties` to `profileWatcher.ts`, improve test

* Handle watched directory removal

* Improve comments

* Make tests faster, pass abort signal everywhere

* Improve docs

* Make `removing tsh directory does not break watcher` easier to understand

* Make test dir per test

* Improve timing in tests

* Add a limit of how many events can be emitted by `fs.watch` (to break the endless stream of events on Windows when watched dir is removed), go into the polling mode only when it's expected that the watched dir was removed

* Use `expect().rejects.toThrow` correctly

* Deflake 'max file system events count is restricted'

* Replace `makeClusterWithOnlyProfileProperties` with `mergeClusterProfileWithDetails`, move it back to `cluster.ts`

* Attempt to fix tests

* Clarify comment

(cherry picked from commit d4e6f19)

* Initialize tshdClients in MainProcess constructor (#61044)

(cherry picked from commit c7a4233)

* Connect: react to tsh actions by watching tsh dir (#60884)

* Add `ClusterLifecycleManager`

* Register handlers for adding, removing and logging out from cluster

* Provide `rootCluster` in `useWorkspaceContext`

The handlers in the profile watcher will proceed with updating the cluster store, even if the renderer handlers returned errors.
This check protects us from a runtime error if the renderer fails to remove the workspace.

* Improve docs

* Move processing queue to listener

* Make `will-` operations always interrupt main process actions

* Improve error messages

* Do not remove managing cluster when **only** logging out

The app updater displays all clusters, not just those the user is logged into.

* Revert "Provide `rootCluster` in `useWorkspaceContext`"

This reverts commit cf76d2b.

* Rename `logoutWithCleanup` to `cleanUpBeforeLogout`

* Do not pass `AbortSignal` to `this.mainProcessClient.syncRootClusters`

* Lint

* Fix types issues

* Do not stack watcher notifications

(cherry picked from commit 5fa8249)

* Connect: close cluster clients when profile changes (#61090)

* Include expiration time in `LoggedInUser`

This will allow the profile watcher to detect when the user relogged.

* Display expiration time in UI

* Add `ClearStaleClusterClients` RPC

* Implement `ClearStaleClusterClients`

* Clear stale clients when profile changes

* Improve session expiration component

* Move refresh button back to top

* `ClearCachedStaleClientsForRoot` -> `ClearStaleCachedClientsForRoot`

* `unchanged` -> `stale`

* Make "closing stale clients" a subtest

* Add `clientcache` test

* Remove `getProfile` error wrapping

* Improve comment

* Convert story to controls

(cherry picked from commit 6615e42)

* Gracefully handle missing `current-profile` and respect `TELEPORT_PROXY` in `tsh status`  (#61295)

* Respect `TELEPORT_PROXY` env var in `tsh status`

* Enable listing profiles if there is no active profile

* Add test

* Define `err` within the block where it's actually used

* Handle missing current profile in `tsh logout`

* Make check more explicit

* Revert mistakenly commited change

(cherry picked from commit 95bec3a)

* Connect: switch tsh home directory to ~/.tsh (#61352)

* Switch tsh home directory to ~/.tsh

* Migrate old tsh home to new location, disallow updating fields outside the `state` key in app_state.json from the renderer process

* Show banner about migrated tsh home

* `promoteMigratedTshHome` -> `showTshHomeMigrationBanner`

* `MigratedTshHomeBanner` -> `TshHomeMigrationBanner`

* 'Profiles are' -> 'Profiles are now', remove unnecessary space

* Fix assigning colors for new workspaces

* Improve logs

(cherry picked from commit 54b5f6c)

* Connect: refresh resources when access changes and add tests for `ClusterLifecycleManager` (#61479)

* Detect when user's access changes

* Refresh resources in UI when `did-change-access` is received

* Add tests for `ClusterLifecycleManager`

* Add better docs for ClusterLifecycleEvent

* Test assuming requests too

* Improve test names

(cherry picked from commit 4b00520)

* Set up deep links as soon as possible (#61668)

(cherry picked from commit 0b5ab6b)

* Serialize IPC errors  (#61665)

* Serialize all enumerable error fields

* Add wrappers around `ipcMain.handle` and `ipcRenderer.invoke`

* Fix `Method Error.prototype.toString called on incompatible receiver undefined`

* Improve docs

* Lint

(cherry picked from commit a1f2ae0)

* Fix unrecoverable ssh cert errors in tsh/Connect (#61322)

* Initialize default Username/HostLogin only in tsh

* Move `Username()` from `api.go` to `tsh.go`

* Remove wrong `Profile.SiteName` default

* Remove resetting `SiteName`

Not sure why it was needed. Perhaps to clear the default that we just removed? But even if add the default back and remove this fix, everything works.

* Gracefully handle missing SSH/TLS certs

* Remove unused `TeleportClient.LoadKeyForClusterWithReissue`

* Revert "Move `Username()` from `api.go` to `tsh.go`"

This reverts commit f7ff0ff.

* Revert "Initialize default Username/HostLogin only in tsh"

This reverts commit ed38bab.

* When any of SSH/TLS cert is missing, return partial profile

* Only log non-nil errors

* Revert "Remove wrong `Profile.SiteName` default"

* Revert "Remove resetting `SiteName`"

This reverts commit f54ab3f.

* Set `SiteName` when adding cluster

* Improve comments

* Add test

* Fix test

* Add myself to TODO

* Add test for logging out with missing SSH cert

* Lint

(cherry picked from commit cd3c8f8)

* Connect: update docs for sharing ~/.tsh directory (#61467)

* Update docs for sharing ~/.tsh directory

* Review comments

* Lint

(cherry picked from commit 19533bf)

---------

Co-authored-by: ravicious <rafal.cieslak@goteleport.com>

Labels

no-changelog Indicates that a PR does not require a changelog entry size/md ui
