fix(relay): terminal WS URL prefix + pin to one fly machine#3599

Merged: saddlepaddle merged 2 commits into main from satya/relay-ws-terminal-fix on Apr 21, 2026
saddlepaddle (Collaborator) commented Apr 21, 2026

Summary

Two fixes from a relay debugging session (details in SUPER-427).

  • fix(workspace-client): useWorkspaceWsUrl was building terminal WS URLs with new URL(path, hostUrl). WHATWG URL semantics resolve absolute paths against the origin, so with hostUrl = "https://relay.superset.sh/hosts/<hostId>" and path = "/terminal/<id>", the result was wss://relay.superset.sh/terminal/<id> — the /hosts/<hostId> prefix got stripped, and the relay returned 404 because it only routes /hosts/:hostId/*. Observed in fly logs: GET /terminal/<id> 404 back-to-back with a successful POST /hosts/<hostId>/trpc/terminal.ensureSession 200. Swap to ${hostUrl}${path} so the prefix survives.
  • fix(relay): TunnelManager.tunnels is a per-process Map, so with 2+ fly replicas, /hosts/:hostId/* requests landing on a machine that doesn't own the tunnel return 503 "Host not connected". Prod was running 2 machines and failing ~50% of proxy traffic. Scaled live (fly scale count 1); this commit adds max_machines_running = 1 so a future deploy doesn't resurrect the second machine. Proper fix (shared tunnel state over Redis/NATS) is SUPER-427.
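
The URL behavior described in the first bullet can be reproduced in isolation. This is a minimal sketch, not the code from the PR: the host id and terminal id below are made-up placeholders, and only the standard `URL` constructor is used.

```typescript
// Hypothetical ids, for illustration only.
const hostUrl = "https://relay.superset.sh/hosts/host-123";
const path = "/terminal/term-456";

// Base-URL resolution: a path starting with "/" replaces the base's whole path,
// so the /hosts/<hostId> prefix is lost.
const resolved = new URL(path, hostUrl);

// String concatenation: the prefix is part of the string being parsed, so it survives.
const concatenated = new URL(`${hostUrl}${path}`);

console.log(resolved.pathname);     // "/terminal/term-456"
console.log(concatenated.pathname); // "/hosts/host-123/terminal/term-456"
```

The first pathname is the one the relay answered with 404, since it only routes /hosts/:hostId/*.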

Test plan

  • bun run lint clean
  • bun run --cwd packages/workspace-client typecheck clean
  • bun run --cwd apps/relay typecheck clean
  • Manual: after desktop release, open a v2 remote workspace terminal — fly logs should show GET /hosts/<hostId>/terminal/<id> 101 (WS upgrade) instead of GET /terminal/<id> 404
  • Manual: confirm prod relay stays at 1 machine across next deploy (fly status -a superset-relay)

Summary by cubic

Fix terminal WebSocket routing by preserving the /hosts/:hostId prefix and pin the relay to a single Fly machine to prevent 404/503s. Mitigation for SUPER-427.

  • Bug Fixes
    • packages/workspace-client: Build WS URLs via string concat so /hosts/:hostId is preserved; avoids 404s from dropped prefixes.
    • apps/relay: Set max_machines_running = 1 in fly.toml to keep a single replica and prevent 503s from split in-memory tunnel state.

Written for commit 59fd756. Summary will update on new commits.

Summary by CodeRabbit

  • Chores
    • Updated deployment infrastructure configuration to ensure stable operation with controlled resource allocation.
    • Improved internal WebSocket connection handling.

new URL(path, hostUrl) resolves absolute paths against the origin, dropping
hostUrl's path component. For remote workspaces hostUrl is
"https://relay.superset.sh/hosts/<hostId>"; building the terminal WS URL
via new URL("/terminal/<id>", hostUrl) yielded
"wss://relay.superset.sh/terminal/<id>", which the relay 404s (only
/hosts/:hostId/* is routed). Swap to string concat so the prefix survives.

TunnelManager.tunnels is an in-process Map: with 2+ replicas a proxy
request routed to the replica that doesn't own the tunnel returns
503 "Host not connected". Prod was running 2 machines and serving ~half
of all /hosts/:hostId/* traffic as 503. Scaled down live; this commit
codifies it so the next deploy doesn't recreate the second machine.

Longer-term fix (shared tunnel state via Redis pub/sub) tracked in
SUPER-427.
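
The replica problem described above can be illustrated with a small model. This is a sketch under assumptions: `TunnelManagerSketch` and its `register` and `proxy` methods are hypothetical stand-ins, not the relay's actual TunnelManager API, and each instance plays the role of one Fly machine's process.

```typescript
// Hypothetical, simplified stand-in for the relay's TunnelManager.
type ProxyResult = { status: number; body: string };

class TunnelManagerSketch {
  // Per-process state: every replica gets its own, unshared Map.
  private tunnels = new Map<string, { connectedAt: number }>();

  // Called when a host opens its tunnel to this replica.
  register(hostId: string): void {
    this.tunnels.set(hostId, { connectedAt: Date.now() });
  }

  // Called for /hosts/:hostId/* requests that land on this replica.
  proxy(hostId: string): ProxyResult {
    if (!this.tunnels.has(hostId)) {
      return { status: 503, body: "Host not connected" };
    }
    return { status: 200, body: "proxied" };
  }
}

const replicaA = new TunnelManagerSketch();
const replicaB = new TunnelManagerSketch();
replicaA.register("host-123"); // the tunnel exists on replica A only

const viaA = replicaA.proxy("host-123"); // request routed to the owning replica
const viaB = replicaB.proxy("host-123"); // request routed to the other replica
console.log(viaA.status, viaB.status);   // 200 503
```

With two replicas and requests spread evenly, roughly half land on the replica without the tunnel, which matches the ~50% failure rate reported in the PR.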
coderabbitai (Bot, Contributor) commented Apr 21, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 208662b1-a61a-4b91-a1ea-69dcd061a566

📥 Commits

Reviewing files that changed from the base of the PR and between 6e204ba and 59fd756.

📒 Files selected for processing (2)
  • apps/relay/fly.toml
  • packages/workspace-client/src/providers/WorkspaceClientProvider/WorkspaceClientProvider.tsx

Walkthrough

Two configuration and logic adjustments were made: a Fly.io deployment constraint limiting concurrent machines to one, and a modification to websocket URL construction in the workspace client provider using string concatenation instead of URL resolution.

Changes

Cohort / File(s) | Summary
Deployment Configuration: apps/relay/fly.toml | Added max_machines_running = 1 constraint under [http_service] configuration to limit concurrent machine instances.
Websocket URL Construction: packages/workspace-client/src/providers/WorkspaceClientProvider/WorkspaceClientProvider.tsx | Modified useWorkspaceWsUrl to construct websocket URLs via string concatenation (new URL(`${hostUrl}${path}`)) instead of URL resolution (new URL(path, hostUrl)). Protocol rewriting and token parameter handling remain unchanged.
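
For orientation, the three steps named in the walkthrough (concatenate, rewrite the protocol, attach a token) might look roughly like the following. The real useWorkspaceWsUrl hook is not shown on this page, so `buildWsUrlSketch`, its signature, and the `token` query parameter name are all assumptions made for illustration.

```typescript
// Hypothetical helper; not the actual useWorkspaceWsUrl implementation.
function buildWsUrlSketch(hostUrl: string, path: string, token?: string): string {
  // Concatenate before parsing so any path prefix on hostUrl is preserved.
  const url = new URL(`${hostUrl}${path}`);

  // Rewrite the scheme: https -> wss, anything else (assumed http) -> ws.
  url.protocol = url.protocol === "https:" ? "wss:" : "ws:";

  // Assumed query parameter name; the real one is not visible in this PR page.
  if (token) {
    url.searchParams.set("token", token);
  }
  return url.toString();
}

// Placeholder ids and token, for illustration only.
const remoteWsUrl = buildWsUrlSketch(
  "https://relay.superset.sh/hosts/host-123",
  "/terminal/term-456",
  "abc",
);
console.log(remoteWsUrl);
// "wss://relay.superset.sh/hosts/host-123/terminal/term-456?token=abc"
```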

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 One machine hops along the path,
No more concurrent aftermath,
The websocket strings now concatenate,
With strings combined, we seal our fate,
Simpler routes for hops to take! ✨


@saddlepaddle saddlepaddle merged commit daf0e16 into main Apr 21, 2026
13 of 14 checks passed
greptile-apps (Bot) commented Apr 21, 2026

Greptile Summary

This PR delivers two targeted production fixes for the relay service: a WHATWG URL semantics bug that caused terminal WebSocket connections to 404, and a Fly.io multi-machine configuration issue that caused ~50% of proxy traffic to fail with 503s.

Changes:

  • WorkspaceClientProvider.tsx: useWorkspaceWsUrl previously called new URL(path, hostUrl) — because path is an absolute path (e.g. /terminal/<id>), WHATWG base-URL resolution discards the hostUrl's path component entirely, yielding wss://relay.superset.sh/terminal/<id> instead of wss://relay.superset.sh/hosts/<hostId>/terminal/<id>. The fix uses string concatenation (${hostUrl}${path}) to preserve the full prefix.
  • fly.toml: Adds max_machines_running = 1 alongside the existing min_machines_running = 1 to pin prod to one machine as a stopgap until shared tunnel state (SUPER-427) is implemented.

Confidence Score: 5/5

Safe to merge — both fixes are surgical, well-evidenced by Fly logs, and directly address confirmed production failures with no regressions.

The URL fix is correct: all call-sites pass an absolute path and hostUrl has no trailing slash, so concatenation produces the right URL. max_machines_running = 1 is the correct Fly.io knob and is consistent with min_machines_running = 1. No new logic paths, no changed interfaces.
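
The two preconditions stated above (absolute path, no trailing slash on hostUrl) are what make plain concatenation safe. If either could not be guaranteed, a defensive join along these lines would be one option. `joinHostPathSketch` is a hypothetical helper and is not part of this PR.

```typescript
// Hypothetical helper; the PR itself uses plain `${hostUrl}${path}`.
function joinHostPathSketch(hostUrl: string, path: string): string {
  // Drop any trailing slashes from the base so the join never yields "//".
  const base = hostUrl.replace(/\/+$/, "");
  // Ensure the path contributes exactly one leading slash.
  const suffix = path.startsWith("/") ? path : `/${path}`;
  return `${base}${suffix}`;
}

// Plain concatenation with a trailing slash on hostUrl produces a double slash.
const naiveJoin = "https://relay.superset.sh/hosts/host-123/" + "/terminal/term-456";
const guardedJoin = joinHostPathSketch("https://relay.superset.sh/hosts/host-123/", "/terminal/term-456");

console.log(naiveJoin);   // "https://relay.superset.sh/hosts/host-123//terminal/term-456"
console.log(guardedJoin); // "https://relay.superset.sh/hosts/host-123/terminal/term-456"
```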

No files require special attention.

Important Files Changed

Filename | Overview
packages/workspace-client/src/providers/WorkspaceClientProvider/WorkspaceClientProvider.tsx | Fixes terminal WS URL construction: replaces new URL(path, hostUrl) with new URL(`${hostUrl}${path}`) so the relay's /hosts/:hostId/ routing prefix is preserved.
apps/relay/fly.toml | Adds max_machines_running = 1 to pin the relay to a single Fly machine, preventing the ~50% proxy failure caused by TunnelManager.tunnels being a per-process Map with no shared state.

Reviews (1): Last reviewed commit: "fix(relay): cap to one fly machine" | Re-trigger Greptile

github-actions (Bot, Contributor) commented Apr 21, 2026

🚀 Preview Deployment

🔗 Preview Links

Service Status Link
Neon Database (Neon) View Branch
Fly.io Electric (Fly.io) View App
Vercel API (Vercel) Open Preview
Vercel Web (Vercel) Open Preview
Vercel Marketing (Vercel) Open Preview
Vercel Admin (Vercel) Open Preview
Vercel Docs (Vercel) Open Preview

Preview updates automatically with new commits

cubic-dev-ai (Bot, Contributor) left a comment

No issues found across 2 files
