Skip to content

fix(web): recover terminal websockets after mobile background/resume#4685

Merged
saddlepaddle merged 1 commit into
mainfrom
fix/web-websocket-lifecycle
May 18, 2026
Merged

fix(web): recover terminal websockets after mobile background/resume#4685
saddlepaddle merged 1 commit into
mainfrom
fix/web-websocket-lifecycle

Conversation

@saddlepaddle
Copy link
Copy Markdown
Collaborator

@saddlepaddle saddlepaddle commented May 18, 2026

Problem

On mobile Chrome, after minimizing the browser and reopening it, workspace terminal websocket connections are dropped and never recover — the terminal sits dead with a "Disconnected." banner.

Root cause

WebTerminal.tsx opened a single one-shot WebSocket. Its entire failure handling was:

socket.onclose = () => {
  setState((previous) =>
    previous === "open" || previous === "connecting" ? "error" : previous,
  );
};

No reconnect, no backoff, and no handling of page-lifecycle events. Mobile Chrome aggressively freezes backgrounded tabs and closes their websockets while frozen. When the tab resumes, the queued close event fires (or the socket is already dead) — and the component just flips to "error" permanently. There was no path back to a live connection.

The desktop app doesn't hit this because Electron windows aren't frozen the way mobile browser tabs are; its terminal transport already reconnects on close. The web client never got an equivalent.

Fix (web-only — no server changes)

Extracted the socket lifecycle out of the component into TerminalConnection:

  • Exponential-backoff reconnect on any unexpected close (500ms → 10s, 12 attempts), mirroring the desktop terminal transport. A clean PTY exit or a fatal server error marks the connection terminated and suppresses reconnect.
  • Page-lifecycle recovery — listeners on visibilitychange, pageshow (bfcache restore), resume (Page Lifecycle API), and online. When the page comes back, the attempt counter resets and it reconnects immediately if the socket isn't open, instead of waiting out a backoff timer that was frozen with the tab.
  • No reconnect timers scheduled while document.hidden — a frozen tab can't run them; the visibility listener drives recovery on resume.
  • Session resumption — the host-service already keys terminal sessions by terminalId and adopts (or respawns) the PTY on reattach, so reopening the same URL resumes the session. After the first PTY bytes land, reconnects pass ?replay=0 (an existing server query param) to skip re-dumping scrollback xterm already holds; the in-memory buffer still replays output produced while disconnected.
  • New "reconnecting" UI state so the user sees "Reconnecting…" instead of a dead "Disconnected." banner.

Note on heartbeat

A heartbeat/ping was considered but not added — a real heartbeat needs the server to answer with pong, and this change is intentionally web-only. The desktop terminal transport (which works reliably) also has no heartbeat; reconnect-on-close + visibility triggers cover the reported failure. If idle-connection NAT drops show up later, a ping/pong pair on the host-service is the follow-up.

Testing

Manual, on mobile Chrome (and desktop via DevTools):

  1. Open a workspace terminal, confirm it attaches and echoes input.
  2. Background test — minimize the browser / switch apps for 30s+, reopen. The terminal reconnects automatically (brief "Reconnecting…"), session resumes, and any output produced while away is replayed.
  3. Network test — toggle airplane mode briefly; on reconnect (online event) the terminal recovers.
  4. DevTools: with the terminal open, run Application → close the WS or kill the relay — the client retries with growing backoff and recovers when the endpoint returns.
  5. Switch terminals / unmount — dispose() removes all listeners and closes the socket; no leaks or stray reconnects.

bun run lint and bun run typecheck pass.

Out of scope

RemoteTerminal.tsx (the public remote-control viewer) has the same one-shot pattern and no visibility handling. It's a separate, ephemeral, unauthenticated endpoint without the adopt/respawn resumption — left for a follow-up.


Open in Stage

Summary by cubic

Fixes dead terminal websockets on mobile after backgrounding by adding reconnect and page-lifecycle recovery. Terminals now auto-reconnect and resume sessions with a clear "Reconnecting…" state.

  • Bug Fixes
    • Added TerminalConnection with exponential-backoff reconnect on unexpected close.
    • Recover on visibilitychange, pageshow, resume, and online; no reconnect timers while the page is hidden.
    • Resume the same PTY session via terminalId; after first bytes, reconnects pass ?replay=0 to avoid duplicate scrollback.
    • New "reconnecting" UI state; "Disconnected." no longer sticks after background/resume.
    • Clean teardown on unmount: closes the socket and removes listeners.
    • Web-only change; no server updates required.

Written for commit b7cf83c. Summary will update on new commits. Review in cubic

Summary by CodeRabbit

  • New Features
    • Terminal now displays "Reconnecting…" status when connection is temporarily lost.
    • Enhanced connection recovery when switching between tabs or regaining connectivity.
    • Improved terminal session stability with automatic reconnection capabilities.

Review Change Stack

Mobile Chrome freezes backgrounded tabs and closes their websockets.
WebTerminal opened a single one-shot socket whose onclose only set
state to "error" — no reconnect, no recovery path — so terminals
stayed dead after minimizing and reopening the browser.

Extract the socket lifecycle into TerminalConnection: exponential-
backoff reconnect on unexpected close, plus visibilitychange /
pageshow / resume / online listeners that reconnect immediately when
the page comes back. The server already keys sessions by terminalId
and adopts/respawns the PTY on reattach, so reopening the same URL
resumes the session; ?replay=0 after first bytes avoids re-dumping
scrollback xterm already holds.
@capy-ai
Copy link
Copy Markdown

capy-ai Bot commented May 18, 2026

Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c24db0bc-ceb6-4327-8177-1fd441cce4f2

📥 Commits

Reviewing files that changed from the base of the PR and between da559de and b7cf83c.

📒 Files selected for processing (2)
  • apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts
  • apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/WebTerminal.tsx

📝 Walkthrough

Walkthrough

This PR extracts WebSocket terminal protocol handling into a new TerminalConnection class, which manages connection state, typed messaging, exponential-backoff reconnect with page-visibility recovery, and lifecycle ownership. WebTerminal is refactored to use TerminalConnection instead of managing WebSocket directly, simplifying its internal logic and delegating all transport concerns.

Changes

Terminal Connection Abstraction

Layer / File(s) Summary
TerminalConnection class with protocol and state contracts
apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts
Introduces TerminalConnection: typed message contracts (TerminalServerMessage, TerminalClientMessage), connection state type (connecting/reconnecting/error), WebSocket URL building with auth and query params, binary/JSON frame routing, exponential-backoff reconnect with attempt limits, page-visibility recovery (visibilitychange, online, pageshow, resume events), and public start(), send(), dispose() lifecycle methods.
WebTerminal integration with TerminalConnection
apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/WebTerminal.tsx
Replaces direct WebSocket management with TerminalConnection callbacks: wires binary data to xterm Terminal writes, control messages to state updates (handling attached/exit/error), terminal onData to TerminalConnection.send(), and ResizeObserver/visualViewport listeners. Updates ConnectionState to include reconnecting. Cleanup disposes the connection and terminal, removing prior WebSocket-specific teardown.

Sequence Diagram(s)

sequenceDiagram
  participant WT as WebTerminal
  participant TC as TerminalConnection
  participant WS as WebSocket
  WT->>TC: create & start()
  TC->>TC: buildUrl with auth token
  TC->>WS: new WebSocket(url)
  Note over TC: state = connecting
  TC-->>WT: onStateChange(connecting)
  WS-->>TC: onopen
  TC-->>WT: onStateChange(open)
  loop User interaction
    WT->>TC: send(inputMessage)
    TC->>WS: JSON message
  end
  loop Server output
    WS-->>TC: binary frames
    TC-->>WT: onData callback
    WT->>WT: write to xterm
  end
  WS-->>TC: close unexpectedly
  Note over TC: exponential backoff
  TC->>TC: schedule reconnect
  TC-->>WT: onStateChange(reconnecting)
  TC->>WS: reconnect attempt
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • superset-sh/superset#4647: The earlier implementation of relay WebSocket-based terminal viewer logic in WebTerminal.tsx that this refactor extracts and abstracts into TerminalConnection.

Poem

🐰 A socket springs free from its terminal cage,
Now wrapped in a class, of state and of stage,
With reconnect that bounces like rabbits at play,
Visibility watched for the next hopping day,
WebTerminal hops lighter—the protocol's stowed away! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main fix: recovering terminal websockets on mobile after background/resume, which is the primary change in this PR.
Description check ✅ Passed The description is comprehensive and well-structured, covering the problem, root cause, fix details, testing, and scope boundaries. It follows the template structure with clear sections on the issue and solution.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/web-websocket-lifecycle

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@stage-review
Copy link
Copy Markdown

stage-review Bot commented May 18, 2026

Ready to review this PR? Stage has broken it down into 3 individual chapters for you:

Title
1 Extract terminal socket lifecycle to TerminalConnection
2 Update terminal component to use TerminalConnection
3 Add reconnecting status to the UI
Open in Stage

Chapters generated by Stage for commit b7cf83c on May 18, 2026 2:24pm UTC.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

🧹 Preview Cleanup Complete

The following preview resources have been cleaned up:

  • ✅ Neon database branch

Thank you for your contribution! 🎉

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR extracts the raw one-shot WebSocket in WebTerminal.tsx into a dedicated TerminalConnection class that adds exponential-backoff reconnect (500 ms → 10 s, 12 timer-based attempts) and page-lifecycle recovery via visibilitychange, pageshow, resume, and online listeners, fixing the dead-terminal regression on mobile Chrome after backgrounding.

  • TerminalConnection.ts (new): owns the full socket lifecycle — generation-guarded async connect, backoff scheduling (skipped when document.hidden, driven by visibility on resume), ?replay=0 on reattach once xterm holds scrollback, and clean dispose() that removes all listeners and closes the socket.
  • WebTerminal.tsx (refactored): terminal and resize setup moved out of the async IIFE to run eagerly; state machine gains a \"reconnecting\" UI slot; onControl and onStateChange callbacks wire the two layers together without leaking refs.

Confidence Score: 4/5

Safe to merge — the reconnect logic is well-guarded and the cleanup path is correct; the two findings are minor efficiency and defensive-coding concerns that do not affect correctness.

The generation counter correctly prevents double-opens when concurrent resume events both race through handleResume before the first socket is created, but each still triggers a redundant getAuthToken() fetch. The teardownSocket omission of socket.onerror = null is harmless today since onerror is never assigned, but leaves a gap if the code is extended. Neither issue affects the observable reconnect behavior.

TerminalConnection.ts — review the handleResume concurrent-call path and the teardownSocket cleanup for completeness.

Important Files Changed

Filename Overview
apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts New class encapsulating WebSocket lifecycle with exponential-backoff reconnect and page-visibility recovery; logic is sound but teardownSocket doesn't null socket.onerror, and concurrent resume events cause duplicate getAuthToken() calls that the generation guard discards
apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/WebTerminal.tsx Refactored to delegate socket lifecycle to TerminalConnection; terminal and resize setup moved out of async IIFE; cleanup is clean and correct

Sequence Diagram

sequenceDiagram
    participant Browser as Browser / OS
    participant TC as TerminalConnection
    participant WS as WebSocket
    participant Server as Host Service

    Browser->>TC: start()
    TC->>TC: register visibility/pageshow/online/resume listeners
    TC->>WS: new WebSocket(url)
    WS-->>Server: connect
    Server-->>WS: "{type:attached}"
    WS-->>TC: onmessage to onControl(attached)
    TC-->>Browser: setState(open), sendResize()

    Note over Browser,TC: Mobile tab backgrounded, browser freezes tab and closes socket
    WS-->>TC: onclose
    TC->>TC: scheduleReconnect() document.hidden skip timer emit reconnecting

    Browser->>TC: visibilitychange tab foregrounded
    TC->>TC: "handleResume() reconnectAttempt=0"
    TC->>WS: "new WebSocket(url + replay=0)"
    WS-->>Server: reconnect
    Server-->>WS: "{type:attached} + missed output"
    WS-->>TC: onControl(attached)
    TC-->>Browser: setState(open)

    Note over TC,WS: All 12 timer retries exhausted
    TC-->>Browser: setState(error) Disconnected
    Browser->>TC: online event network back
    TC->>TC: "handleResume() reconnectAttempt=0 connect()"
    TC-->>Browser: setState(reconnecting)
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts:177-179
`teardownSocket` nulls `onmessage` and `onclose` but skips `onerror`. The `onerror` handler is never set in this file today, so this is harmless — but `socket.close()` can synchronously dispatch an error event in some environments before the connection closes, and a future contributor adding an `onerror` for diagnostics would have a stale callback fire after teardown. Nulling it here keeps the cleanup symmetrical.

```suggestion
		socket.onmessage = null;
		socket.onclose = null;
		socket.onerror = null;
		try {
```

### Issue 2 of 2
apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts:216-230
**Concurrent resume events trigger parallel `getAuthToken()` calls**

When the device wakes, `visibilitychange` and `online` can fire in the same event loop turn. Both calls to `handleResume` see `this.socket === null` (the WebSocket isn't created until after the async `buildUrl()` resolves), so both invoke `connect()`. The second call increments the generation and invalidates the first, so only one socket is ever opened — but both `getAuthToken()` fetches run concurrently and the first one's result is silently discarded. A lightweight in-flight flag on `handleResume` would avoid the wasted work.

Reviews (1): Last reviewed commit: "fix(web): recover terminal websockets af..." | Re-trigger Greptile

Comment on lines +177 to +179
socket.onmessage = null;
socket.onclose = null;
try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 teardownSocket nulls onmessage and onclose but skips onerror. The onerror handler is never set in this file today, so this is harmless — but socket.close() can synchronously dispatch an error event in some environments before the connection closes, and a future contributor adding an onerror for diagnostics would have a stale callback fire after teardown. Nulling it here keeps the cleanup symmetrical.

Suggested change
socket.onmessage = null;
socket.onclose = null;
try {
socket.onmessage = null;
socket.onclose = null;
socket.onerror = null;
try {
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts
Line: 177-179

Comment:
`teardownSocket` nulls `onmessage` and `onclose` but skips `onerror`. The `onerror` handler is never set in this file today, so this is harmless — but `socket.close()` can synchronously dispatch an error event in some environments before the connection closes, and a future contributor adding an `onerror` for diagnostics would have a stale callback fire after teardown. Nulling it here keeps the cleanup symmetrical.

```suggestion
		socket.onmessage = null;
		socket.onclose = null;
		socket.onerror = null;
		try {
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +216 to +230
private handleResume = () => {
if (this.disposed || this.terminated) return;
if (typeof document !== "undefined" && document.hidden) return;
this.reconnectAttempt = 0;
const socket = this.socket;
if (
socket &&
(socket.readyState === WebSocket.OPEN ||
socket.readyState === WebSocket.CONNECTING)
) {
return;
}
this.cancelReconnect();
void this.connect();
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Concurrent resume events trigger parallel getAuthToken() calls

When the device wakes, visibilitychange and online can fire in the same event loop turn. Both calls to handleResume see this.socket === null (the WebSocket isn't created until after the async buildUrl() resolves), so both invoke connect(). The second call increments the generation and invalidates the first, so only one socket is ever opened — but both getAuthToken() fetches run concurrently and the first one's result is silently discarded. A lightweight in-flight flag on handleResume would avoid the wasted work.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts
Line: 216-230

Comment:
**Concurrent resume events trigger parallel `getAuthToken()` calls**

When the device wakes, `visibilitychange` and `online` can fire in the same event loop turn. Both calls to `handleResume` see `this.socket === null` (the WebSocket isn't created until after the async `buildUrl()` resolves), so both invoke `connect()`. The second call increments the generation and invalidates the first, so only one socket is ever opened — but both `getAuthToken()` fetches run concurrently and the first one's result is silently discarded. A lightweight in-flight flag on `handleResume` would avoid the wasted work.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts">

<violation number="1" location="apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts:103">
P2: This catch block swallows connection-setup errors and immediately retries, which makes persistent auth/URL failures hard to diagnose in production. Log or otherwise surface the caught error before scheduling reconnect.

(Based on your team's feedback about handling async failures explicitly and avoiding silent catches.) [FEEDBACK_USED]</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

let url: string;
try {
url = await this.buildUrl();
} catch {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: This catch block swallows connection-setup errors and immediately retries, which makes persistent auth/URL failures hard to diagnose in production. Log or otherwise surface the caught error before scheduling reconnect.

(Based on your team's feedback about handling async failures explicitly and avoiding silent catches.)

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/src/app/workspaces/[workspaceId]/components/WebTerminal/TerminalConnection.ts, line 103:

<comment>This catch block swallows connection-setup errors and immediately retries, which makes persistent auth/URL failures hard to diagnose in production. Log or otherwise surface the caught error before scheduling reconnect.

(Based on your team's feedback about handling async failures explicitly and avoiding silent catches.) </comment>

<file context>
@@ -0,0 +1,237 @@
+		let url: string;
+		try {
+			url = await this.buildUrl();
+		} catch {
+			if (generation !== this.generation || this.disposed) return;
+			this.scheduleReconnect();
</file context>

@saddlepaddle saddlepaddle merged commit 05deba5 into main May 18, 2026
24 of 25 checks passed
@saddlepaddle saddlepaddle deleted the fix/web-websocket-lifecycle branch May 18, 2026 14:45
sazabi Bot pushed a commit that referenced this pull request May 20, 2026
…4685)

Mobile Chrome freezes backgrounded tabs and closes their websockets.
WebTerminal opened a single one-shot socket whose onclose only set
state to "error" — no reconnect, no recovery path — so terminals
stayed dead after minimizing and reopening the browser.

Extract the socket lifecycle into TerminalConnection: exponential-
backoff reconnect on unexpected close, plus visibilitychange /
pageshow / resume / online listeners that reconnect immediately when
the page comes back. The server already keys sessions by terminalId
and adopts/respawns the PTY on reattach, so reopening the same URL
resumes the session; ?replay=0 after first bytes avoids re-dumping
scrollback xterm already holds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant