
Combine ClustersService logout functions#59539

Merged
gzdunek merged 6 commits into master from gzdunek/combine-logout-functions on Oct 13, 2025

Conversation

@gzdunek
Contributor

@gzdunek gzdunek commented Sep 24, 2025

While working on moving the cluster state to the main process, I got stuck on logging out.
To preserve the current behavior, I'd need to move both logout and removeCluster methods. That separation feels like a poor API design and made me revisit the original PR #24978 that introduced it.

I now believe it was more of a workaround than a proper fix. The real issue was that several parts of the code assumed that every workspace always has an associated cluster.
That assumption is incorrect; clusters and workspaces are managed separately, and there's no guarantee they'll always stay in sync.
Even today, if you remove a profile from disk and then call ClustersService.syncRootClustersAndCatchErrors, the app will likely crash because the cluster is suddenly missing.

If we had strictNullChecks enabled, each call to ClustersService.findCluster() would return Cluster | undefined, requiring the caller to explicitly handle the undefined case.

There are two ways we could address this:

  • Ensure that a workspace is only rendered when a corresponding cluster exists (the cluster could be added to useWorkspaceContext).
  • Require all callsites to explicitly check whether the cluster exists.

I chose the second approach. Most usages already perform a null check, so it was more consistent and easier to apply the same pattern to the remaining cases.
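To illustrate the second approach, here is a hedged sketch only; this minimal `ClustersService` model and the `clusterTitle` callsite are hypothetical stand-ins, not the real teleterm code:

```typescript
// Minimal model of the pattern discussed above; the real ClustersService
// is far more involved. Types and data here are illustrative only.
interface Cluster {
  uri: string;
  connected: boolean;
}

class ClustersService {
  private clusters = new Map<string, Cluster>();

  addCluster(cluster: Cluster): void {
    this.clusters.set(cluster.uri, cluster);
  }

  // With strictNullChecks, this return type forces every caller to
  // handle the case where the cluster is missing.
  findCluster(clusterUri: string): Cluster | undefined {
    return this.clusters.get(clusterUri);
  }
}

// A callsite that explicitly checks for a missing cluster instead of
// assuming that the workspace always has one.
function clusterTitle(service: ClustersService, uri: string): string {
  const cluster = service.findCluster(uri);
  if (!cluster) {
    return 'Unknown cluster';
  }
  return cluster.connected ? uri : `${uri} (disconnected)`;
}
```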

I also moved the logout code from useClusterLogout to the app context to enable the upcoming profile watcher to call it.

@gzdunek gzdunek requested review from avatus and ravicious September 24, 2025 13:53
@gzdunek gzdunek added the no-changelog label (Indicates that a PR does not require a changelog entry) Sep 24, 2025
@gzdunek gzdunek changed the title from Combine Connect logout functions to Combine ClustersService logout functions Sep 24, 2025
@github-actions github-actions bot requested a review from kiosion September 24, 2025 13:54
@gzdunek gzdunek removed the request for review from kiosion September 24, 2025 13:54
*/
/** Logs out of the cluster. */
async logout(clusterUri: uri.RootClusterUri) {
// TODO(gzdunek): logout and removeCluster should be combined into a single action in tshd
Member

To preserve the current behavior, I'd need to move both logout and removeCluster methods. That separation feels like a poor API design and made me revisit the original PR #24978 that introduced it.

As far as I remember that PR didn't introduce logout & removeCluster. There are two separate RPCs for that because in the alpha version of Connect it was possible to log out of a cluster without removing it from the list of clusters in the app. Similar to how you can disconnect a gateway and only then remove it from the connections.

Though I think I see what you mean in the context of #24978 splitting the logout sequence into first changing connected to false and actually removing the cluster from the state at the very end.

Even today, if you remove a profile from disk and then call ClustersService.syncRootClustersAndCatchErrors, the app will likely crash because the cluster is suddenly missing.

I understand this becomes a larger concern when ~/.tsh sharing gets implemented, right? Because at the moment I don't think there are many opportunities to trigger ClustersService.syncRootClustersAndCatchErrors beyond the app start, but looking at its callsites what you described is technically possible.

Most usages already perform a null check, so it was more consistent and easier to apply the same pattern to the remaining cases.

That was surprising to me because if I had to bet I wouldn't have said that this is the case. 😅 I think in my head I've always assumed that we've had this sweet little invariant where the existence of a workspace at least implies that a root cluster is available.


I don't know, I'm not entirely opposed to this change, it just feels like a big departure from something I've always assumed was invariant, so I do feel a bit uneasy about it.

Perhaps we should document functions returning Cluster from ClustersService to note that they might return no cluster?

Contributor Author

@gzdunek gzdunek Sep 25, 2025

As far as I remember that PR didn't introduce logout & removeCluster. There are two separate RPCs for that because in the alpha version of Connect it was possible to log out of a cluster without removing it from the list of clusters in the app

Yeah, I meant the methods in ClustersService, not the tshd RPCs. Before that PR, the logout sequence started with logging out in tshd and then removing the ClustersService state. In the PR, we switched to removing the state at the very end.

Tbh, even the fix that we added could be done easier. In the comment for ClustersService.logout we said:

A code that operates on that intermediate state is in useClusterLogout.tsx.
After invoking logout(), it looks for the next workspace to switch to. If we hadn't marked the cluster as disconnected, the method might have returned us the same cluster we wanted to log out of.

We could as well explicitly filter out that cluster when looking for the next connected workspace in useClusterLogout :)

I understand this becomes a larger concern when ~/.tsh sharing gets implemented, right? Because at the moment I don't think there are many opportunities to trigger ClustersService.syncRootClustersAndCatchErrors beyond the app start, but looking at its callsites what you described is technically possible.

This change isn't strictly necessary for sharing ~/.tsh, but having a single method helps make the logic a bit cleaner.
I assume that in an ideal world, it would work like this:

  1. The profile watcher detects a logout.
  2. It calls logout on the cluster service (in the main process) to update the internal state.
  3. It sends a request to the renderer to clean up its local state (or to multiple renderers in theory).

It could still be four steps (as we have it today), where step 2 only calls tshd and sets .connected = false, and a separate step 4 actually removes the cluster, but that's a tighter coupling between the renderer and main process than seems necessary.
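The three-step flow above might be sketched roughly like this (all names and interfaces here are hypothetical stand-ins for the profile watcher, main process state, and renderer IPC, not the actual teleterm APIs):

```typescript
// Hypothetical sketch of the three-step flow described above; the real
// wiring between the profile watcher, main process, and renderers differs.
interface ClusterState {
  clusters: Set<string>;
}

interface RendererHandle {
  /** Asks one renderer window to clean up its local state for the cluster. */
  cleanUpWorkspace(clusterUri: string): Promise<void>;
}

// Step 1 happens elsewhere: the profile watcher detects a logout and calls this.
async function onProfileLoggedOut(
  state: ClusterState,
  renderers: RendererHandle[],
  clusterUri: string
): Promise<void> {
  // Step 2: update the main process's internal cluster state.
  state.clusters.delete(clusterUri);
  // Step 3: request that each renderer (possibly several) clean up its
  // local state for the logged-out cluster.
  await Promise.all(renderers.map(r => r.cleanUpWorkspace(clusterUri)));
}
```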

One alternative is switching steps 2 and 3, so the cluster is removed at the very end. That way, it would be removed after the workspace.
However, this still doesn't address the other issue: ClustersService.syncRootClustersAndCatchErrors can be triggered beyond just the app start, which could cause a mismatch between workspaces and clusters.
So maybe it’s cleaner to have the null checks, unless we can guarantee that these stores are always in sync (or alternatively, prevent this function from being called beyond app initialization).

But hmm, now that I think of it, maybe it actually makes more sense to switch the steps? So the renderer first needs to remove the workspace and other dependencies, and then we attempt to log out in tsh and remove the cluster (and we forget about ClustersService.syncRootClustersAndCatchErrors).

That was surprising to me because if I had to bet I wouldn't have said that this is the case. 😅

Maybe it wasn't the majority, but we did have 17 places with null checks, and I added roughly the same number in this PR.

Contributor Author


After discussing it through DMs with Rafał, we decided to switch the steps and perform the logout at the end of the logout sequence.
It makes more sense this way, since a cluster can exist without a workspace, but not the other way around.
If the logout in tshd fails, the app will remain usable. The cluster will still appear in the profile selector, allowing the user to retry the logout or open a new workspace for it.

To address the type issues, we should pass a cluster through the workspace context, so that we won't need all these null checks.
When it comes to ClustersService.syncRootClustersAndCatchErrors, it should be called only once, before creating the workspaces. I left a TODO item to fix the one incorrect usage.
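The agreed-upon order could be sketched as follows (a rough illustration under assumed names; `Deps` and its methods are hypothetical, not the actual teleterm APIs):

```typescript
// Rough sketch of the reversed logout sequence agreed on above.
// `Deps` and its methods are hypothetical stand-ins, not the real APIs.
interface Deps {
  /** Removes the workspace and other renderer-side resources. */
  removeWorkspace(clusterUri: string): Promise<void>;
  /** Performs the actual logout in tshd. */
  tshdLogout(clusterUri: string): Promise<void>;
  /** Drops the cluster from the ClustersService state. */
  removeClusterFromState(clusterUri: string): void;
}

async function logoutWithCleanup(deps: Deps, clusterUri: string): Promise<void> {
  // A cluster can exist without a workspace, but not the other way around,
  // so dispose the workspace first.
  await deps.removeWorkspace(clusterUri);
  try {
    // Only then log out in tshd and remove the cluster at the very end.
    await deps.tshdLogout(clusterUri);
    deps.removeClusterFromState(clusterUri);
  } catch {
    // If the tshd logout fails, the cluster stays in the profile selector,
    // letting the user retry the logout or open a new workspace for it.
  }
}
```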

Member

@ravicious ravicious left a comment

Tested a couple of scenarios with it and it seems to work fine.

}

/** Disposes cluster-related resources and then logs out. */
export async function logoutWithCleanup(
Member

Would it make sense to move this to web/packages/teleterm/src/ui/ClusterLogout/logoutWithCleanup.ts? I know it'll be used by more than just ClusterLogout, but appContext doesn't feel like the best place for it and it really is just related to logging out so it'd be hard to find a better place for it than ClusterLogout.

Contributor Author


I think we can move it there.

@gzdunek gzdunek added this pull request to the merge queue Oct 13, 2025
Merged via the queue into master with commit de6b4ed Oct 13, 2025
41 checks passed
@gzdunek gzdunek deleted the gzdunek/combine-logout-functions branch October 13, 2025 11:56
mmcallister pushed a commit that referenced this pull request Nov 6, 2025
* Remove clusters immediately after a logout, move `useClusterLogout` to `AppContext`

* Review callsites to ensure cluster is properly checked before being accessed

* Revert "Review callsites to ensure cluster is properly checked before being accessed"

This reverts commit 8343c3c.

* Switch to removing the cluster at the end of logout sequence

* Lint

* Move `logoutWithCleanup` to `ui/ClusterLogout`
rhammonds-teleport pushed a commit that referenced this pull request Nov 6, 2025
mmcallister pushed a commit that referenced this pull request Nov 19, 2025
mmcallister pushed a commit that referenced this pull request Nov 20, 2025
gzdunek added a commit that referenced this pull request Nov 28, 2025

(cherry picked from commit de6b4ed)
github-merge-queue bot pushed a commit that referenced this pull request Dec 4, 2025
* Combine `ClustersService` logout functions (#59539)

* Remove clusters immediately after a logout, move `useClusterLogout` to `AppContext`

* Review callsites to ensure cluster is properly checked before being accessed

* Revert "Review callsites to ensure cluster is properly checked before being accessed"

This reverts commit 8343c3c.

* Switch to removing the cluster at the end of logout sequence

* Lint

* Move `logoutWithCleanup` to `ui/ClusterLogout`

(cherry picked from commit de6b4ed)

* Enable sending messages from main to renderer with acknowledgments (#59642)

* Create awaitable sender

* Review comments

* Fix test and lint

(cherry picked from commit 5dc76fe)

* Move cluster state to main process (#59643)

* Create `ClusterStore` that manages cluster state

* Fix tests that mocked tshd directly

* Remove IPC to notify the main process about cluster list changes

* Load immer plugins in `MainProcess`

* Improve comments

* Refactor `useSender`

* Get rid of unnecessary Map and try/catch around send

* Get rid of `MainProcess.create`

* Do not return early when `c.proxyHost` is falsy

* Add more context to test

* Add missing logout handler in main process

* Fix applying patches

* Adjust `subscribeToClusterStore` to updated `startAwaitableSenderListener`

* Crash window when sending state update fails

* Extract WebContents navigation handlers and add tests for opening links

* Improve error message

* Initialize `ClusterStore` synchronously

* Convert `lazyTshdClient` field to `getTshdClient` function, add docs

* Remove unused eslint directive

(cherry picked from commit a41d021)

* Connect: make logout function idempotent (#60553)

* Remove `ClusterRemove` RPC, make logging out idempotent

* Move calling `removeKubeConfig` and `maybeRemoveAppUpdatesManagingCluster` to main process

The main process should not depend on the renderer to clean up its own resources.

* Remove cleaning up kube dir

* Lint

(cherry picked from commit 2d1bc7b)

* Connect: add profile watcher (#60622)

* Add profile watcher

* Move `makeClusterWithOnlyProfileProperties` to `profileWatcher.ts`, improve test

* Handle watched directory removal

* Improve comments

* Make tests faster, pass abort signal everywhere

* Improve docs

* Make `removing tsh directory does not break watcher` easier to understand

* Make test dir per test

* Improve timing in tests

* Add a limit of how many events can be emitted by `fs.watch` (to break the endless stream of events on Windows when watched dir is removed), go into the polling mode only when it's expected that the watched dir was removed

* Use `expect().rejects.toThrow` correctly

* Deflake 'max file system events count is restricted'

* Replace `makeClusterWithOnlyProfileProperties` with `mergeClusterProfileWithDetails`, move it back to `cluster.ts`

* Attempt to fix tests

* Clarify comment

(cherry picked from commit d4e6f19)

* Initialize tshdClients in MainProcess constructor (#61044)

(cherry picked from commit c7a4233)

* Connect: react to tsh actions by watching tsh dir (#60884)

* Add `ClusterLifecycleManager`

* Register handlers for adding, removing and logging out from cluster

* Provide `rootCluster` in `useWorkspaceContext`

The handlers in the profile watcher will proceed with updating the cluster store, even if the renderer handlers returned errors.
This check protects us from a runtime error if the renderer fails to remove the workspace.

* Improve docs

* Move processing queue to listener

* Make `will-` operations always interrupt main process actions

* Improve error messages

* Do not remove managing cluster when **only** logging out

The app updater displays all clusters, not just those the user is logged into.

* Revert "Provide `rootCluster` in `useWorkspaceContext`"

This reverts commit cf76d2b.

* Rename `logoutWithCleanup` to `cleanUpBeforeLogout`

* Do not pass `AbortSignal` to `this.mainProcessClient.syncRootClusters`

* Lint

* Fix types issues

* Do not stack watcher notifications

(cherry picked from commit 5fa8249)

* Connect: close cluster clients when profile changes (#61090)

* Include expiration time in `LoggedInUser`

This will allow the profile watcher to detect when the user relogged.

* Display expiration time in UI

* Add `ClearStaleClusterClients` RPC

* Implement `ClearStaleClusterClients`

* Clear stale clients when profile changes

* Improve session expiration component

* Move refresh button back to top

* `ClearCachedStaleClientsForRoot` -> `ClearStaleCachedClientsForRoot`

* `unchanged` -> `stale`

* Make "closing stale clients" a subtest

* Add `clientcache` test

* Remove `getProfile` error wrapping

* Improve comment

* Convert story to controls

(cherry picked from commit 6615e42)

* Gracefully handle missing `current-profile` and respect `TELEPORT_PROXY` in `tsh status` (#61295)

* Respect `TELEPORT_PROXY` env var in `tsh status`

* Enable listing profiles if there is no active profile

* Add test

* Define `err` within the block where it's actually used

* Handle missing current profile in `tsh logout`

* Make check more explicit

* Revert mistakenly committed change

(cherry picked from commit 95bec3a)

* Connect: switch tsh home directory to ~/.tsh (#61352)

* Switch tsh home directory to ~/.tsh

* Migrate old tsh home to new location, disallow updating fields outside the `state` key in app_state.json from the renderer process

* Show banner about migrated tsh home

* `promoteMigratedTshHome` -> `showTshHomeMigrationBanner`

* `MigratedTshHomeBanner` -> `TshHomeMigrationBanner`

* 'Profiles are' -> 'Profiles are now', remove unnecessary space

* Fix assigning colors for new workspaces

* Improve logs

(cherry picked from commit 54b5f6c)

* Connect: refresh resources when access changes and add tests for `ClusterLifecycleManager` (#61479)

* Detect when user's access changes

* Refresh resources in UI when `did-change-access` is received

* Add tests for `ClusterLifecycleManager`

* Add better docs for ClusterLifecycleEvent

* Test assuming requests too

* Improve test names

(cherry picked from commit 4b00520)

* Set up deep links as soon as possible (#61668)

(cherry picked from commit 0b5ab6b)

* Serialize IPC errors  (#61665)

* Serialize all enumerable error fields

* Add wrappers around `ipcMain.handle` and `ipcRenderer.invoke`

* Fix `Method Error.prototype.toString called on incompatible receiver undefined`

* Improve docs

* Lint

(cherry picked from commit a1f2ae0)

* Fix unrecoverable ssh cert errors in tsh/Connect (#61322)

* Initialize default Username/HostLogin only in tsh

* Move `Username()` from `api.go` to `tsh.go`

* Remove wrong `Profile.SiteName` default

* Remove resetting `SiteName`

Not sure why it was needed. Perhaps to clear the default that we just removed? But even if add the default back and remove this fix, everything works.

* Gracefully handle missing SSH/TLS certs

* Remove unused `TeleportClient.LoadKeyForClusterWithReissue`

* Revert "Move `Username()` from `api.go` to `tsh.go`"

This reverts commit f7ff0ff.

* Revert "Initialize default Username/HostLogin only in tsh"

This reverts commit ed38bab.

* When any of SSH/TLS cert is missing, return partial profile

* Only log non-nil errors

* Revert "Remove wrong `Profile.SiteName` default"

* Revert "Remove resetting `SiteName`"

This reverts commit f54ab3f.

* Set `SiteName` when adding cluster

* Improve comments

* Add test

* Fix test

* Add myself to TODO

* Add test for logging out with missing SSH cert

* Lint

(cherry picked from commit cd3c8f8)

* Connect: update docs for sharing ~/.tsh directory (#61467)

* Update docs for sharing ~/.tsh directory

* Review comments

* Lint

(cherry picked from commit 19533bf)

---------

Co-authored-by: ravicious <rafal.cieslak@goteleport.com>

Labels

no-changelog (Indicates that a PR does not require a changelog entry), size/sm, ui

3 participants