fix(relay): terminal WS URL prefix + pin to one fly machine#3599
Conversation
new URL(path, hostUrl) resolves absolute paths against the origin, dropping hostUrl's path component. For remote workspaces hostUrl is "https://relay.superset.sh/hosts/<hostId>"; building the terminal WS URL via new URL("/terminal/<id>", hostUrl) yielded "wss://relay.superset.sh/terminal/<id>", which the relay 404s (only /hosts/:hostId/* is routed). Swap to string concat so the prefix survives.
TunnelManager.tunnels is an in-process Map — with 2+ replicas a proxy request routed to the replica that doesn't own the tunnel returns 503 "Host not connected". Prod was running 2 machines and serving ~half of all /hosts/:hostId/* traffic as 503. Scaled down live; this commit codifies it so the next deploy doesn't recreate the second machine. Longer-term fix (shared tunnel state via Redis pub/sub) tracked in SUPER-427.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughTwo configuration and logic adjustments were made: a Fly.io deployment constraint limiting concurrent machines to one, and a modification to websocket URL construction in the workspace client provider using string concatenation instead of URL resolution. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR delivers two targeted production fixes for the relay service: a WHATWG URL semantics bug that caused terminal WebSocket connections to 404, and a Fly.io multi-machine configuration issue that caused ~50% of proxy traffic to fail with 503s. Changes:
Confidence Score: 5/5Safe to merge — both fixes are surgical, well-evidenced by Fly logs, and directly address confirmed production failures with no regressions. The URL fix is correct: all call-sites pass an absolute path and hostUrl has no trailing slash, so concatenation produces the right URL. max_machines_running = 1 is the correct Fly.io knob and is consistent with min_machines_running = 1. No new logic paths, no changed interfaces. No files require special attention.
|
| Filename | Overview |
|---|---|
| packages/workspace-client/src/providers/WorkspaceClientProvider/WorkspaceClientProvider.tsx | Fixes terminal WS URL construction: replaces new URL(path, hostUrl) with new URL(${hostUrl}${path}) so the relay's /hosts/:hostId/ routing prefix is preserved. |
| apps/relay/fly.toml | Adds max_machines_running = 1 to pin the relay to a single Fly machine, preventing the ~50% proxy failure caused by TunnelManager.tunnels being a per-process Map with no shared state. |
Reviews (1): Last reviewed commit: "fix(relay): cap to one fly machine" | Re-trigger Greptile
🚀 Preview Deployment🔗 Preview Links
Preview updates automatically with new commits |
Summary
Two fixes from a relay debugging session (details in SUPER-427).
fix(workspace-client):useWorkspaceWsUrlwas building terminal WS URLs withnew URL(path, hostUrl). WHATWG URL semantics resolve absolute paths against the origin, so withhostUrl = "https://relay.superset.sh/hosts/<hostId>"andpath = "/terminal/<id>", the result waswss://relay.superset.sh/terminal/<id>— the/hosts/<hostId>prefix got stripped, and the relay returned 404 because it only routes/hosts/:hostId/*. Observed in fly logs:GET /terminal/<id> 404back-to-back with a successfulPOST /hosts/<hostId>/trpc/terminal.ensureSession 200. Swap to${hostUrl}${path}so the prefix survives.fix(relay):TunnelManager.tunnelsis a per-processMap, so with 2+ fly replicas,/hosts/:hostId/*requests landing on a machine that doesn't own the tunnel return503 "Host not connected". Prod was running 2 machines and failing ~50% of proxy traffic. Scaled live (fly scale count 1); this commit addsmax_machines_running = 1so a future deploy doesn't resurrect the second machine. Proper fix (shared tunnel state over Redis/NATS) is SUPER-427.Test plan
bun run lintcleanbun run --cwd packages/workspace-client typecheckcleanbun run --cwd apps/relay typecheckcleanGET /hosts/<hostId>/terminal/<id> 101(WS upgrade) instead ofGET /terminal/<id> 404fly status -a superset-relay)Summary by cubic
Fix terminal WebSocket routing by preserving the
/hosts/:hostIdprefix and pin the relay to a single Fly machine to prevent 404/503s. Mitigation for SUPER-427.packages/workspace-client: Build WS URLs via string concat so/hosts/:hostIdis preserved; avoids 404s from dropped prefixes.apps/relay: Setmax_machines_running = 1infly.tomlto keep a single replica and prevent 503s from split in-memory tunnel state.Written for commit 59fd756. Summary will update on new commits.
Summary by CodeRabbit