fix(web): recover terminal websockets after mobile background/resume #4685
Mobile Chrome freezes backgrounded tabs and closes their websockets. WebTerminal opened a single one-shot socket whose onclose only set state to "error" (no reconnect, no recovery path), so terminals stayed dead after minimizing and reopening the browser. Extract the socket lifecycle into TerminalConnection: exponential-backoff reconnect on unexpected close, plus visibilitychange / pageshow / resume / online listeners that reconnect immediately when the page comes back. The server already keys sessions by terminalId and adopts/respawns the PTY on reattach, so reopening the same URL resumes the session; ?replay=0 after first bytes avoids re-dumping scrollback xterm already holds.
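As a rough illustration of the reconnect policy described above, the sketch below computes a capped exponential delay per attempt. The base delay, the cap, and the function name are assumptions made for illustration; the 12-attempt budget follows the review's sequence diagram. None of this is copied from `TerminalConnection.ts`.

```typescript
// Hypothetical constants: the PR text does not state the real values.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 15000;
const MAX_ATTEMPTS = 12; // the review diagram mentions 12 timer retries

/**
 * Returns the delay before reconnect attempt `attempt` (0-based),
 * or null once the retry budget is exhausted.
 */
function reconnectDelay(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Delays double until they hit the cap.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => reconnectDelay(attempt));
console.log(schedule); // [ 500, 1000, 2000, 4000, 8000, 15000 ]
```

A resume event would reset the attempt counter to 0, so the next try does not inherit a long delay accumulated while the tab was frozen.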
Walkthrough
This PR extracts WebSocket terminal protocol handling into a new …
Sequence Diagram(s)
sequenceDiagram
participant WT as WebTerminal
participant TC as TerminalConnection
participant WS as WebSocket
WT->>TC: create & start()
TC->>TC: buildUrl with auth token
TC->>WS: new WebSocket(url)
Note over TC: state = connecting
TC-->>WT: onStateChange(connecting)
WS-->>TC: onopen
TC-->>WT: onStateChange(open)
loop User interaction
WT->>TC: send(inputMessage)
TC->>WS: JSON message
end
loop Server output
WS-->>TC: binary frames
TC-->>WT: onData callback
WT->>WT: write to xterm
end
WS-->>TC: close unexpectedly
Note over TC: exponential backoff
TC->>TC: schedule reconnect
TC-->>WT: onStateChange(reconnecting)
TC->>WS: reconnect attempt
Greptile Summary
This PR extracts the raw one-shot WebSocket in …
Confidence Score: 4/5
Safe to merge. The reconnect logic is well-guarded and the cleanup path is correct; the two findings are minor efficiency and defensive-coding concerns that do not affect correctness.
The generation counter correctly prevents double-opens when concurrent resume events both race through `handleResume` before the first socket is created, but each still triggers a redundant `getAuthToken()` fetch. The `teardownSocket` omission of `socket.onerror = null` is harmless today since `onerror` is never assigned, but leaves a gap if the code is extended. Neither issue affects the observable reconnect behavior.
`TerminalConnection.ts`: review the `handleResume` concurrent-call path and the `teardownSocket` cleanup for completeness.

| Filename | Overview |
|---|---|
| apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts | New class encapsulating WebSocket lifecycle with exponential-backoff reconnect and page-visibility recovery; logic is sound but teardownSocket doesn't null socket.onerror, and concurrent resume events cause duplicate getAuthToken() calls that the generation guard discards |
| apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/WebTerminal.tsx | Refactored to delegate socket lifecycle to TerminalConnection; terminal and resize setup moved out of async IIFE; cleanup is clean and correct |
Sequence Diagram
sequenceDiagram
participant Browser as Browser / OS
participant TC as TerminalConnection
participant WS as WebSocket
participant Server as Host Service
Browser->>TC: start()
TC->>TC: register visibility/pageshow/online/resume listeners
TC->>WS: new WebSocket(url)
WS-->>Server: connect
Server-->>WS: "{type:attached}"
WS-->>TC: onmessage to onControl(attached)
TC-->>Browser: setState(open), sendResize()
Note over Browser,TC: Mobile tab backgrounded, browser freezes tab and closes socket
WS-->>TC: onclose
TC->>TC: scheduleReconnect() document.hidden skip timer emit reconnecting
Browser->>TC: visibilitychange tab foregrounded
TC->>TC: "handleResume() reconnectAttempt=0"
TC->>WS: "new WebSocket(url + replay=0)"
WS-->>Server: reconnect
Server-->>WS: "{type:attached} + missed output"
WS-->>TC: onControl(attached)
TC-->>Browser: setState(open)
Note over TC,WS: All 12 timer retries exhausted
TC-->>Browser: setState(error) Disconnected
Browser->>TC: online event network back
TC->>TC: "handleResume() reconnectAttempt=0 connect()"
TC-->>Browser: setState(reconnecting)
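The listener registration step in the diagram above could look roughly like the following. This is a sketch under assumptions: `ListenerTarget` and `registerResumeListeners` are hypothetical names, and the split between document-level and window-level events reflects where browsers dispatch them, not the PR's actual code.

```typescript
// Minimal structural type so the sketch works with `document`, `window`,
// or a test double. This interface is an assumption, not the PR's code.
interface ListenerTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

/**
 * Registers the four wake-up events named in the PR and returns a disposer.
 * `doc` receives visibilitychange/resume; `win` receives pageshow/online.
 */
function registerResumeListeners(
  doc: ListenerTarget,
  win: ListenerTarget,
  onResume: () => void,
): () => void {
  const bindings: Array<[ListenerTarget, string]> = [
    [doc, "visibilitychange"],
    [doc, "resume"],
    [win, "pageshow"],
    [win, "online"],
  ];
  for (const [target, type] of bindings) {
    target.addEventListener(type, onResume);
  }
  // The disposer removes exactly what was added, so dispose() leaves no listeners behind.
  return () => {
    for (const [target, type] of bindings) {
      target.removeEventListener(type, onResume);
    }
  };
}
```

In a browser this would presumably be called as `registerResumeListeners(document, window, handleResume)`, with the returned function invoked from `dispose()`.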
| socket.onmessage = null; | ||
| socket.onclose = null; | ||
| try { |
`teardownSocket` nulls `onmessage` and `onclose` but skips `onerror`. The `onerror` handler is never set in this file today, so this is harmless. However, `socket.close()` can synchronously dispatch an error event in some environments before the connection closes, and a future contributor adding an `onerror` for diagnostics would have a stale callback fire after teardown. Nulling it here keeps the cleanup symmetrical.
Suggested change:
| - socket.onmessage = null; |
| - socket.onclose = null; |
| - try { |
| + socket.onmessage = null; |
| + socket.onclose = null; |
| + socket.onerror = null; |
| + try { |
| private handleResume = () => { | ||
| if (this.disposed || this.terminated) return; | ||
| if (typeof document !== "undefined" && document.hidden) return; | ||
| this.reconnectAttempt = 0; | ||
| const socket = this.socket; | ||
| if ( | ||
| socket && | ||
| (socket.readyState === WebSocket.OPEN || | ||
| socket.readyState === WebSocket.CONNECTING) | ||
| ) { | ||
| return; | ||
| } | ||
| this.cancelReconnect(); | ||
| void this.connect(); | ||
| }; |
**Concurrent resume events trigger parallel `getAuthToken()` calls**
When the device wakes, `visibilitychange` and `online` can fire in the same event loop turn. Both calls to `handleResume` see `this.socket === null` (the WebSocket isn't created until after the async `buildUrl()` resolves), so both invoke `connect()`. The second call increments the generation and invalidates the first, so only one socket is ever opened, but both `getAuthToken()` fetches run concurrently and the first one's result is silently discarded. A lightweight in-flight flag on `handleResume` would avoid the wasted work.
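One way to avoid the duplicate fetch is sketched below. Instead of a boolean flag on `handleResume`, it shares a single in-flight promise at the token-fetch level; that is a different placement from the one the comment proposes. `singleFlight` is a hypothetical helper, and the `getAuthToken` here is a stand-in, not the real function.

```typescript
/**
 * Wraps an async task so overlapping calls share one in-flight promise.
 * `singleFlight` is a hypothetical helper, not part of the PR.
 */
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight;
    const clear = () => {
      inFlight = null;
    };
    const current = task();
    inFlight = current;
    // Clear the slot on success or failure so later calls start a fresh task.
    current.then(clear, clear);
    return current;
  };
}

// Stand-in for the real token fetch; counts how often it actually runs.
let tokenFetches = 0;
const getAuthToken = singleFlight(async () => {
  tokenFetches += 1;
  return "token";
});

// Two resume events in the same turn (e.g. visibilitychange + online).
const first = getAuthToken();
const second = getAuthToken();
console.log(first === second, tokenFetches); // true 1
```

The generation counter would still be needed to discard stale sockets; this only removes the redundant network call.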
1 issue found across 2 files
| let url: string; | ||
| try { | ||
| url = await this.buildUrl(); | ||
| } catch { |
P2: This catch block swallows connection-setup errors and immediately retries, which makes persistent auth/URL failures hard to diagnose in production. Log or otherwise surface the caught error before scheduling reconnect.
(Based on your team's feedback about handling async failures explicitly and avoiding silent catches.)
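A minimal sketch of what surfacing the error might look like follows. `handleSetupFailure` and `SetupFailureContext` are hypothetical names, the log message is invented, and the stale check mirrors the `generation !== this.generation || this.disposed` guard visible in the reviewed diff.

```typescript
interface SetupFailureContext {
  // True when a newer connect() superseded this one, or after dispose().
  stale: boolean;
  log: (message: string, error: unknown) => void;
  scheduleReconnect: () => void;
}

/** Surfaces the setup error, then falls back to the normal retry path. */
function handleSetupFailure(error: unknown, ctx: SetupFailureContext): void {
  if (ctx.stale) return;
  ctx.log("[TerminalConnection] failed to build websocket URL", error);
  ctx.scheduleReconnect();
}

// Intended call site, assuming the structure shown in the review:
//   try {
//     url = await this.buildUrl();
//   } catch (error) {
//     handleSetupFailure(error, {
//       stale: generation !== this.generation || this.disposed,
//       log: console.warn,
//       scheduleReconnect: () => this.scheduleReconnect(),
//     });
//     return;
//   }
```

Logging before the retry keeps the reconnect behavior unchanged while making persistent auth or URL failures visible.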
Problem
On mobile Chrome, after minimizing the browser and reopening it, workspace terminal websocket connections are dropped and never recover — the terminal sits dead with a "Disconnected." banner.
Root cause
`WebTerminal.tsx` opened a single one-shot `WebSocket`. Its entire failure handling was an `onclose` handler that set the state to `"error"`: no reconnect, no backoff, and no handling of page-lifecycle events.
Mobile Chrome aggressively freezes backgrounded tabs and closes their websockets while frozen. When the tab resumes, the queued `close` event fires (or the socket is already dead), and the component just flips to `"error"` permanently. There was no path back to a live connection.
The desktop app doesn't hit this because Electron windows aren't frozen the way mobile browser tabs are; its terminal transport already reconnects on close. The web client never got an equivalent.
Fix (web-only — no server changes)
Extracted the socket lifecycle out of the component into `TerminalConnection`:
- Exponential-backoff reconnect on unexpected close; `exit` or a fatal server `error` marks the connection terminated and suppresses reconnect.
- Listeners for `visibilitychange`, `pageshow` (bfcache restore), `resume` (Page Lifecycle API), and `online`. When the page comes back, the attempt counter resets and it reconnects immediately if the socket isn't open, instead of waiting out a backoff timer that was frozen with the tab.
- No reconnect timers are scheduled while `document.hidden`: a frozen tab can't run them, so the visibility listener drives recovery on resume.
- The server already keys sessions by `terminalId` and adopts (or respawns) the PTY on reattach, so reopening the same URL resumes the session. After the first PTY bytes land, reconnects pass `?replay=0` (an existing server query param) to skip re-dumping scrollback xterm already holds; the in-memory buffer still replays output produced while disconnected.
- A `"reconnecting"` UI state, so the user sees "Reconnecting…" instead of a dead "Disconnected." banner.
Note on heartbeat
A heartbeat/ping was considered but not added: a real heartbeat needs the server to answer with `pong`, and this change is intentionally web-only. The desktop terminal transport (which works reliably) also has no heartbeat; reconnect-on-close plus visibility triggers cover the reported failure. If idle-connection NAT drops show up later, a `ping`/`pong` pair on the host-service is the follow-up.
Testing
Manual, on mobile Chrome (and desktop via DevTools):
- … `online` event) the terminal recovers.
- … `Application → close the WS` or kill the relay: the client retries with growing backoff and recovers when the endpoint returns.
- `dispose()` removes all listeners and closes the socket; no leaks or stray reconnects.
- `bun run lint` and `bun run typecheck` pass.
Out of scope
`RemoteTerminal.tsx` (the public remote-control viewer) has the same one-shot pattern and no visibility handling. It's a separate, ephemeral, unauthenticated endpoint without the adopt/respawn resumption, so it is left for a follow-up.
Summary by cubic
Fixes dead terminal websockets on mobile after backgrounding by adding reconnect and page-lifecycle recovery. Terminals now auto-reconnect and resume sessions with a clear "Reconnecting…" state.
- … `TerminalConnection` with exponential-backoff reconnect on unexpected close.
- … `visibilitychange`, `pageshow`, `resume`, and `online`; no reconnect timers while the page is hidden.
- … `terminalId`; after first bytes, reconnects pass `?replay=0` to avoid duplicate scrollback.
Written for commit b7cf83c.
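The `?replay=0` behavior summarized above can be sketched as a small URL builder. Only the `replay=0` parameter comes from the PR text; the function name, the `token` query key, and the example host are placeholders chosen for illustration.

```typescript
/**
 * Builds the websocket URL for a (re)connect. `baseUrl`, `token`, and the
 * `token` query key are placeholders; only `replay=0` comes from the PR text.
 */
function buildTerminalUrl(
  baseUrl: string,
  token: string,
  hasReceivedBytes: boolean,
): string {
  const url = new URL(baseUrl);
  url.searchParams.set("token", token);
  // Skip the scrollback dump once xterm already holds earlier output.
  if (hasReceivedBytes) url.searchParams.set("replay", "0");
  return url.toString();
}

// First connect: full replay. Later reconnects: replay suppressed.
const firstConnect = buildTerminalUrl("wss://example.invalid/terminal/abc", "t", false);
const reconnect = buildTerminalUrl("wss://example.invalid/terminal/abc", "t", true);
console.log(firstConnect);
console.log(reconnect);
```

Keeping the flag as an explicit argument makes the "first bytes received" condition easy to test without a live socket.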