feat(host-service): tRPC query timeout + retry, fix hung file tree and git status#3838
Conversation
- Time out listDirectory after 5s and retry up to 3x with linear backoff - Abort in-flight requests on unmount and workspace/root change - Show a loading spinner in FilesTab while the workspace query resolves - Enable abortOnUnmount globally on the workspace tRPC client
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors directory loading in Changes
Sequence Diagram(s)sequenceDiagram
participant FilesTab as FilesTab (UI)
participant Hook as useFileTree
participant TRPC as workspaceTrpc client
participant Router as filesystemRouter
participant Service as FsHostService
participant HostFS as WorkspaceFS
FilesTab->>Hook: loadDirectory(options)
Hook->>Hook: apply cached listDirectory.getData (if present)
alt needs server fetch
Hook->>TRPC: start fetch (with AbortController + 5s timeout)
TRPC->>Router: RPC listDirectory (propagate signal)
Router->>Service: service.listDirectory(..., { signal })
Service->>HostFS: listDirectory(..., { signal })
rect rgba(200,200,255,0.5)
HostFS-->>Service: entries / error
end
Service-->>Router: entries / error
Router-->>TRPC: response / error
TRPC-->>Hook: response or error
alt timeout
Hook->>Hook: schedule retry (attempt+1) after backoff (up to 3)
Hook-->>Hook: re-invoke loadDirectory({force:true, ignoreInFlight:true, attempt:next})
end
end
FilesTab-->>Hook: unmount or root change
Hook->>Hook: abort active controllers & clear retry timers
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR adds resilience to the file-tree's Confidence Score: 5/5Safe to merge; all remaining findings are minor style/edge-case suggestions that do not affect correctness. No P0/P1 issues found. The timeout + retry logic, abort-controller lifecycle, stale-request guards, and cleanup paths are all correct. The two P2 comments are optional improvements. No files require special attention.
|
| Filename | Overview |
|---|---|
| apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts | Major resilience overhaul: adds 5 s timeout + linear-backoff retries, ref-based stale-request guards, cache-before-fetch, and proper cleanup on unmount/context change. |
| apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx | Empty-state guard changed from checking workspaceQuery.data?.worktreePath to !rootPath, with a loading spinner when the workspace query is in-flight. |
| packages/workspace-client/src/workspace-trpc.ts | Adds abortOnUnmount: true globally to the tRPC React client — safe one-liner that complements the hook-level abort logic. |
Sequence Diagram
sequenceDiagram
participant FT as FilesTab
participant UFT as useFileTree
participant AC as AbortController
participant TRPC as tRPC (listDirectory)
FT->>UFT: mount (rootPath, workspaceId)
UFT->>UFT: isMountedRef = true
UFT->>UFT: loadDirectory(rootPath, {force:true})
UFT->>UFT: check tRPC cache (getData)
alt cache hit
UFT->>UFT: applyDirectoryEntries (cached)
end
UFT->>AC: new AbortController
UFT->>TRPC: fetch(input, signal) + 5s timeout
alt success
TRPC-->>UFT: entries
UFT->>UFT: isRequestCurrent? applyDirectoryEntries
else timeout (5s)
UFT->>AC: abort()
UFT->>UFT: catch ListDirectoryTimeoutError
alt nextAttempt <= MAX_ATTEMPTS (3)
UFT->>UFT: schedule retry (300ms * attempt)
UFT->>UFT: loadDirectory(path, {attempt:N, force, ignoreInFlight})
else all retries exhausted
UFT->>UFT: clearDirectoryLoading
end
else external abort
AC->>TRPC: abort signal
UFT->>UFT: isRequestCurrent false, discard
end
FT->>UFT: unmount
UFT->>UFT: clearRetryTimers, abortActiveLoads, isMountedRef=false
Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts
Line: 157-165
Comment:
**Only timeouts trigger retries — not other transient errors**
Network failures, server 5xx errors, or host-service crashes will fall through to the `console.error` + `clearDirectoryLoading` path without retrying. Given the PR goal of surviving a "slow or hung host service", a connection-reset or unexpected server error would still leave the directory in a permanent failed state. Consider whether `shouldRetryListDirectory` should also cover `DOMException` `AbortError` from non-timeout aborts or generic network errors if the intent is broad resilience.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx
Line: 474
Comment:
**`isFetching` triggers the spinner during background refetches**
`workspaceQuery.isFetching` is `true` whenever a fetch is in progress — including background refetches that happen after the query already has data. If a workspace genuinely has no `worktreePath` (confirmed by a previous fetch) and a background refetch starts, `!rootPath` and `isFetching` are both true, so the spinner replaces "Workspace worktree not available" for the duration of the refetch. Using `workspaceQuery.isLoading` (true only when there is no data yet) would avoid this flash while still covering the intended initial-load case.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(desktop): make file tree resilient t..." | Re-trigger Greptile
| function shouldRetryListDirectory( | ||
| error: unknown, | ||
| nextAttempt: number, | ||
| ): boolean { | ||
| return ( | ||
| error instanceof ListDirectoryTimeoutError && | ||
| nextAttempt <= LIST_DIRECTORY_MAX_ATTEMPTS | ||
| ); | ||
| } |
There was a problem hiding this comment.
Only timeouts trigger retries — not other transient errors
Network failures, server 5xx errors, or host-service crashes will fall through to the console.error + clearDirectoryLoading path without retrying. Given the PR goal of surviving a "slow or hung host service", a connection-reset or unexpected server error would still leave the directory in a permanent failed state. Consider whether shouldRetryListDirectory should also cover DOMException AbortError from non-timeout aborts or generic network errors if the intent is broad resilience.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts
Line: 157-165
Comment:
**Only timeouts trigger retries — not other transient errors**
Network failures, server 5xx errors, or host-service crashes will fall through to the `console.error` + `clearDirectoryLoading` path without retrying. Given the PR goal of surviving a "slow or hung host service", a connection-reset or unexpected server error would still leave the directory in a permanent failed state. Consider whether `shouldRetryListDirectory` should also cover `DOMException` `AbortError` from non-timeout aborts or generic network errors if the intent is broad resilience.
How can I resolve this? If you propose a fix, please make it concise.| <div className="flex h-full items-center justify-center text-sm text-muted-foreground"> | ||
| Workspace worktree not available | ||
| <div className="flex h-full items-center justify-center gap-2 text-sm text-muted-foreground"> | ||
| {workspaceQuery.isLoading || workspaceQuery.isFetching ? ( |
There was a problem hiding this comment.
isFetching triggers the spinner during background refetches
workspaceQuery.isFetching is true whenever a fetch is in progress — including background refetches that happen after the query already has data. If a workspace genuinely has no worktreePath (confirmed by a previous fetch) and a background refetch starts, !rootPath and isFetching are both true, so the spinner replaces "Workspace worktree not available" for the duration of the refetch. Using workspaceQuery.isLoading (true only when there is no data yet) would avoid this flash while still covering the intended initial-load case.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx
Line: 474
Comment:
**`isFetching` triggers the spinner during background refetches**
`workspaceQuery.isFetching` is `true` whenever a fetch is in progress — including background refetches that happen after the query already has data. If a workspace genuinely has no `worktreePath` (confirmed by a previous fetch) and a background refetch starts, `!rootPath` and `isFetching` are both true, so the spinner replaces "Workspace worktree not available" for the duration of the refetch. Using `workspaceQuery.isLoading` (true only when there is no data yet) would avoid this flash while still covering the intended initial-load case.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts`:
- Around line 157-164: The retry check in shouldRetryListDirectory incorrectly
limits retries because attempt starts at 1; change the inequality so the final
planned retry is allowed (e.g. compare nextAttempt against
LIST_DIRECTORY_MAX_ATTEMPTS + 1 instead of LIST_DIRECTORY_MAX_ATTEMPTS) to
restore the intended 3 retries; update the same fix where the same pattern
appears (the other checks around the blocks referencing
ListDirectoryTimeoutError and LIST_DIRECTORY_MAX_ATTEMPTS at the other
occurrences).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a5ab21cb-f33b-48fe-b3da-e9dc3271fb72
📒 Files selected for processing (3)
apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.tsapps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsxpackages/workspace-client/src/workspace-trpc.ts
| function shouldRetryListDirectory( | ||
| error: unknown, | ||
| nextAttempt: number, | ||
| ): boolean { | ||
| return ( | ||
| error instanceof ListDirectoryTimeoutError && | ||
| nextAttempt <= LIST_DIRECTORY_MAX_ATTEMPTS | ||
| ); |
There was a problem hiding this comment.
The retry budget is off by one.
attempt starts at 1, so nextAttempt <= LIST_DIRECTORY_MAX_ATTEMPTS only schedules retries for attempts 2 and 3. That gives you 2 retries total, not the 3 retries described in the PR, so a slow host request can still fail one timeout earlier than intended.
🔧 Proposed fix
- const { force = false, ignoreInFlight = false, attempt = 1 } = options;
+ const { force = false, ignoreInFlight = false, attempt = 0 } = options;
…
- }, getListDirectoryRetryDelayMs(attempt));
+ }, getListDirectoryRetryDelayMs(nextAttempt));Also applies to: 361-363, 427-440
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts`
around lines 157 - 164, The retry check in shouldRetryListDirectory incorrectly
limits retries because attempt starts at 1; change the inequality so the final
planned retry is allowed (e.g. compare nextAttempt against
LIST_DIRECTORY_MAX_ATTEMPTS + 1 instead of LIST_DIRECTORY_MAX_ATTEMPTS) to
restore the intended 3 retries; update the same fix where the same pattern
appears (the other checks around the blocks referencing
ListDirectoryTimeoutError and LIST_DIRECTORY_MAX_ATTEMPTS at the other
occurrences).
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
There was a problem hiding this comment.
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx">
<violation number="1" location="apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx:474">
P2: `workspaceQuery.isFetching` is `true` during background refetches even when cached data already exists. If a workspace genuinely has no `worktreePath`, any background refetch will cause the spinner to flash instead of showing "Workspace worktree not available". Use `workspaceQuery.isLoading` alone here, which is only `true` when there is no cached data yet.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
…orkspace name - Use AbortController membership in the active set as the "still relevant" check, replacing isMountedRef + activeContextRef + isRequestCurrent - Inline single-use helpers (markDirectoryLoading, clearDirectoryLoading, shouldRetryListDirectory, getListDirectoryRetryDelayMs) - Drop the workspaceName prop from FilesTab/WorkspaceSidebar/WorkspaceContent; the files header is just "Explorer"
There was a problem hiding this comment.
♻️ Duplicate comments (1)
apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts (1)
295-297:⚠️ Potential issue | 🟠 MajorRetry budget is still off by one (2 retries instead of 3).
Line 295 starts
attemptat1, and Line 366 caps onnextAttempt <= 3, so only attempts 2 and 3 are retried. That yields 2 retries, not the stated 3.Suggested fix
- const { force = false, ignoreInFlight = false, attempt = 1 } = options; + const { force = false, ignoreInFlight = false, attempt = 0 } = options; ... - }, LIST_DIRECTORY_RETRY_DELAY_MS * attempt); + }, LIST_DIRECTORY_RETRY_DELAY_MS * nextAttempt);Also applies to: 363-367, 375-375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts` around lines 295 - 297, The retry budget is off by one because options destructures attempt = 1 while the retry gating uses nextAttempt <= 3; change the logic so three retries are allowed by either initializing attempt to 0 (options: attempt = 0) or by raising the cap (nextAttempt <= 4) so nextAttempt progression (attempt -> attempt+1) yields three retry attempts; update the symbols attempt and nextAttempt in the useFileTree retry logic (the options destructuring and the nextAttempt <= 3 check) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.ts`:
- Around line 295-297: The retry budget is off by one because options
destructures attempt = 1 while the retry gating uses nextAttempt <= 3; change
the logic so three retries are allowed by either initializing attempt to 0
(options: attempt = 0) or by raising the cap (nextAttempt <= 4) so nextAttempt
progression (attempt -> attempt+1) yields three retry attempts; update the
symbols attempt and nextAttempt in the useFileTree retry logic (the options
destructuring and the nextAttempt <= 3 check) accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 15a6398b-423c-46a9-b77c-d88b713c0218
📒 Files selected for processing (4)
apps/desktop/src/renderer/hooks/host-service/useFileTree/useFileTree.tsapps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/WorkspaceSidebar.tsxapps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsxapps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/page.tsx
💤 Files with no reviewable changes (2)
- apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/WorkspaceSidebar.tsx
- apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/page.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx
…tion # Conflicts: # apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/page.tsx
When the renderer aborts a listDirectory query (timeout or workspace switch), the host-service was running fs.readdir + per-symlink fs.stat to completion and discarding the result. Node's fs API doesn't honor AbortSignal, but we can short-circuit between operations — useful in symlink-heavy directories (node_modules) where the per-entry stat loop dominates.
…e abort Process per-entry symlink stats in batches of 16 with a signal check between batches. With Promise.all-over-everything, all stats kick off in the same microtask and an in-flight abort can't interrupt any of them. Batching bounds the zombie work to one batch (~16 stats) per abort.
Hung host-service IPC (slow git, slow filesystem ops) was leaving the
renderer spinning indefinitely. Replace the bespoke per-hook retry/timeout
in useFileTree with a single server-side middleware that bounds every
query procedure.
Server (queryProcedure builder + middleware):
- t.middleware races next() against a per-procedure timeout, rejecting
with a TRPCError({ code: "TIMEOUT" }). Default 5s, override via
`.meta({ timeoutMs })` on procedures that legitimately take longer
(search, listCommits, getDiff, getStatus, getBranchSyncStatus,
getPullRequestThreads, readFile).
- Switch all query procedures in filesystem and git routers to
queryProcedure; mutations remain on protectedProcedure.
Client (workspace-client QueryClient):
- defaultOptions.queries.retry retries TIMEOUT errors up to 2 times with
linear backoff (300ms, 600ms). Other errors keep the previous single
retry.
useFileTree:
- Drop withAbortableTimeout, ListDirectoryTimeoutError, retry-timer set,
retry recursion, loadDirectoryRef, activeLoadAbortControllersRef.
- Just await utils.filesystem.listDirectory.fetch(input). React Query's
retry policy handles TIMEOUT; abortOnUnmount cancels in-flight on
unmount/workspace switch. ~120 lines removed.
Side benefit: git.getStatus / listBranches / listCommits etc. all gain
hung-IPC protection automatically. The Changes tab no longer spins
forever on a slow `git status`.
Documents the queryProcedure / timeoutMiddleware pattern: where it
lives, how to set per-procedure budgets via .meta({ timeoutMs }),
the current budget table, and what timeouts do (and don't) interrupt.
Summary
A hung host-service IPC (slow
git status, slowfs.readdir, hung child process) was leaving the renderer spinning indefinitely on the Files tab and Changes tab. This PR adds a single server-side timeout middleware that bounds every tRPC query procedure, plus a TIMEOUT-aware retry policy on the client.Server:
queryProcedurebuilderA new
queryProcedure = protectedProcedure.use(timeoutMiddleware)inpackages/host-service/src/trpc/index.ts:next()against a per-procedure timeout, rejecting withTRPCError({ code: "TIMEOUT" }).meta({ timeoutMs })for procedures that legitimately take longerprotectedProcedurebecause their latency variesRouters updated:
filesystem.*(queries):listDirectory,getMetadatause the default 5s;readFile30s;searchFiles30s;searchContent60sgit.*(queries):listBranches,getBaseBranch,getPullRequestuse 5s;getStatus15s;getCommitFiles15s;listCommits/getDiff/getBranchSyncStatus/getPullRequestThreads30sClient: TIMEOUT-aware retry
In
WorkspaceClientProvider.tsx, the QueryClient now retriesTRPCClientErrorwithcode: "TIMEOUT"up to 2 times with linear backoff (300ms, 600ms). Other errors keep the previous single retry.useFileTree simplification
Drop the bespoke client-side timeout/retry plumbing now that the server enforces a budget:
withAbortableTimeout,ListDirectoryTimeoutError, retry-timer set, retry recursion,loadDirectoryRef,activeLoadAbortControllersRef,LoadDirectoryOptions.attempt/ignoreInFlightloadingDirectories/loadedDirectories/invalidatedDirectoriesstate machine, cache fast path, fs:event-driven reloadOther client-side changes carried over from earlier commits
abortOnUnmount: trueon the workspace tRPC client so in-flight requests cancel on unmount/workspace switchAbortSignalplumbed throughfilesystem.listDirectory→FsService→fs.tsso the post-readdir stat loop interrupts on cancel (Node'sfs.readdir/fs.statthemselves ignore signals; we batch the stat loop in groups of 16 with a signal check between batches)workspaceQuery.isLoading(notisFetching) to avoid flashing on background refetchesworkspaceNameprop from FilesTab/WorkspaceSidebar/WorkspaceContentSide benefits
git status.meta({ timeoutMs })for known-slow onesCaveats
fs.readdiris uncancellable — a single huge flat directory on a slow disk still runs to completion server-side, then the procedure rejects via the middleware timeout. Bounded zombie work, not zero.protectedProcedureand aren't covered; opt-in deliberately as needed.Test plan
listDirectoryretries and either recovers or fails cleanly without leaving the spinner upgit statusand confirm it doesn't spin forever — retries 2x then surfaces an errorbun run typecheckandbun testpass