fix(host-service): unstrand tunnel reconnect; wire relay Sentry#4537
Conversation
Two recent crashes left no trace because RELAY_SENTRY_DSN was never deployed. Adds Sentry Logs via consoleLoggingIntegration, plus process- and Hono-level error handlers that keep the relay alive instead of exiting, so a single bug doesn't drop every host tunnel.
WebSocket.close(1001) is server-reserved per the WHATWG spec; Undici throws InvalidAccessError when clients use it. The throw bubbled out of socket.onclose before scheduleReconnect ran, stranding any host with active local channels after a relay disconnect — users had to relaunch the desktop app to recover.
|
Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughEnables Sentry console logs and uncaught-exception integration; adds process-level suppressed logging for uncaught exceptions/rejections (no Sentry capture there); centralizes Hono request error reporting to Sentry and returns JSON 500; makes tunnel WebSocket cleanup close with code 1000 and ignore close-time errors. ChangesError Handling and Resilience
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes a real reconnect-stranding bug in the host-side tunnel client (
Confidence Score: 3/5The tunnel-client fix is safe to merge on its own, but the relay Sentry changes may not achieve their stated goal of keeping the process alive on uncaught exceptions. The tunnel fix resolves the reconnect bug cleanly. The relay Sentry wiring has a real defect: the array form of integrations adds on top of defaults, so the default onUncaughtExceptionIntegration fires alongside the new no-exit one and may still call process.exit() unconditionally. apps/relay/src/sentry.ts needs the function form of integrations to remove the default OnUncaughtException and OnUnhandledRejection copies before adding the customised replacements.
|
| Filename | Overview |
|---|---|
| packages/host-service/src/tunnel/tunnel-client.ts | Fixes reconnect-stranding bug by changing ws.close() code from 1001 to 1000 and wrapping in try/catch; the root-cause fix is sound |
| apps/relay/src/sentry.ts | Adds consoleLoggingIntegration and onUncaughtExceptionIntegration, but both are auto-enabled by default in @sentry/node so the array form installs duplicates rather than replacing the defaults |
| apps/relay/src/index.ts | Adds process-level uncaughtException/unhandledRejection handlers and app.onError hook; may not prevent exit if default Sentry integrations still call process.exit() unconditionally |
Sequence Diagram
sequenceDiagram
participant R as Relay
participant TC as TunnelClient (host)
participant LC as LocalChannel WS
R->>TC: drops WebSocket (relay deploy / crash)
TC->>TC: socket.onclose fires
TC->>TC: stopWatchdog()
TC->>TC: cleanupChannels()
loop each active local channel
TC->>LC: ws.close(1000, Tunnel disconnected) [was 1001 threw InvalidAccessError]
LC-->>TC: ok or caught exception
end
TC->>TC: scheduleReconnect() now always reached
TC->>R: reconnect after backoff delay
R-->>TC: new WebSocket accepted
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/relay/src/sentry.ts:15-22
Both `onUncaughtExceptionIntegration` and `onUnhandledRejectionIntegration` are **auto-enabled by default** in `@sentry/node`. Passing them in the `integrations: [...]` array form adds a second copy on top of the defaults rather than replacing them. The default copy has `exitEvenIfOtherHandlersAreRegistered: true`, so it will still call `process.exit()` unconditionally — directly defeating the intent of this PR. Every uncaught exception will also be captured twice by Sentry (once by each instance). Use the function form to filter out the defaults first, then supply the customised replacements.
```suggestion
integrations: (defaults) => [
...defaults.filter(
(i) =>
i.name !== "OnUncaughtException" &&
i.name !== "OnUnhandledRejection",
),
Sentry.consoleLoggingIntegration({
levels: ["log", "info", "warn", "error"],
}),
Sentry.onUncaughtExceptionIntegration({
exitEvenIfOtherHandlersAreRegistered: false,
}),
Sentry.onUnhandledRejectionIntegration({
mode: "none",
}),
],
```
### Issue 2 of 2
packages/host-service/src/tunnel/tunnel-client.ts:303-310
The empty `catch {}` silently discards any error that isn't the now-fixed 1001 code issue. Since the primary fix is the `1000` code change, the try/catch is purely defensive, but a silent swallow means future `ws.close()` problems on cleanup will be invisible in logs.
```suggestion
private cleanupChannels(): void {
for (const channel of this.localChannels.values()) {
try {
channel.ws.close(1000, "Tunnel disconnected");
} catch (err) {
console.warn("[host-service:tunnel] error closing local channel ws", err);
}
}
this.localChannels.clear();
}
```
Reviews (1): Last reviewed commit: "fix(host-service): use valid close code ..." | Re-trigger Greptile
| integrations: [ | ||
| Sentry.consoleLoggingIntegration({ | ||
| levels: ["log", "info", "warn", "error"], | ||
| }), | ||
| Sentry.onUncaughtExceptionIntegration({ | ||
| exitEvenIfOtherHandlersAreRegistered: false, | ||
| }), | ||
| ], |
There was a problem hiding this comment.
Both
onUncaughtExceptionIntegration and onUnhandledRejectionIntegration are auto-enabled by default in @sentry/node. Passing them in the integrations: [...] array form adds a second copy on top of the defaults rather than replacing them. The default copy has exitEvenIfOtherHandlersAreRegistered: true, so it will still call process.exit() unconditionally — directly defeating the intent of this PR. Every uncaught exception will also be captured twice by Sentry (once by each instance). Use the function form to filter out the defaults first, then supply the customised replacements.
| integrations: [ | |
| Sentry.consoleLoggingIntegration({ | |
| levels: ["log", "info", "warn", "error"], | |
| }), | |
| Sentry.onUncaughtExceptionIntegration({ | |
| exitEvenIfOtherHandlersAreRegistered: false, | |
| }), | |
| ], | |
| integrations: (defaults) => [ | |
| ...defaults.filter( | |
| (i) => | |
| i.name !== "OnUncaughtException" && | |
| i.name !== "OnUnhandledRejection", | |
| ), | |
| Sentry.consoleLoggingIntegration({ | |
| levels: ["log", "info", "warn", "error"], | |
| }), | |
| Sentry.onUncaughtExceptionIntegration({ | |
| exitEvenIfOtherHandlersAreRegistered: false, | |
| }), | |
| Sentry.onUnhandledRejectionIntegration({ | |
| mode: "none", | |
| }), | |
| ], |
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/relay/src/sentry.ts
Line: 15-22
Comment:
Both `onUncaughtExceptionIntegration` and `onUnhandledRejectionIntegration` are **auto-enabled by default** in `@sentry/node`. Passing them in the `integrations: [...]` array form adds a second copy on top of the defaults rather than replacing them. The default copy has `exitEvenIfOtherHandlersAreRegistered: true`, so it will still call `process.exit()` unconditionally — directly defeating the intent of this PR. Every uncaught exception will also be captured twice by Sentry (once by each instance). Use the function form to filter out the defaults first, then supply the customised replacements.
```suggestion
integrations: (defaults) => [
...defaults.filter(
(i) =>
i.name !== "OnUncaughtException" &&
i.name !== "OnUnhandledRejection",
),
Sentry.consoleLoggingIntegration({
levels: ["log", "info", "warn", "error"],
}),
Sentry.onUncaughtExceptionIntegration({
exitEvenIfOtherHandlersAreRegistered: false,
}),
Sentry.onUnhandledRejectionIntegration({
mode: "none",
}),
],
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Verified against @sentry/core@10.46.0 source (integration.js: filterDuplicates + getIntegrationsToSetup): the array form does dedupe by name, and a user instance replaces a default of the same name (defaults are tagged isDefaultInstance = true; the merge rule "never want a default instance to overwrite an existing user instance" runs in our favor since user integrations come after defaults in the merged array).
Also verified @sentry/node-core@9.47.1 onUncaughtExceptionIntegration source: its built-in default for exitEvenIfOtherHandlersAreRegistered is false, not true. So even the default wouldn't unconditionally exit.
So the array form is fine here — single instance with our config, no exit unless no other handlers are attached.
You did sniff out a real adjacent issue though: Sentry's integration unconditionally calls captureException itself, and our process.on('uncaughtException'|'unhandledRejection') handlers were also calling captureSentryException — double captures. Fixed in 9827893 by slimming the handlers to just log (they still need to exist so Sentry's "other handlers attached" check stays true, keeping the process alive).
| private cleanupChannels(): void { | ||
| for (const channel of this.localChannels.values()) { | ||
| channel.ws.close(1001, "Tunnel disconnected"); | ||
| try { | ||
| channel.ws.close(1000, "Tunnel disconnected"); | ||
| } catch {} | ||
| } | ||
| this.localChannels.clear(); | ||
| } |
There was a problem hiding this comment.
The empty
catch {} silently discards any error that isn't the now-fixed 1001 code issue. Since the primary fix is the 1000 code change, the try/catch is purely defensive, but a silent swallow means future ws.close() problems on cleanup will be invisible in logs.
| private cleanupChannels(): void { | |
| for (const channel of this.localChannels.values()) { | |
| channel.ws.close(1001, "Tunnel disconnected"); | |
| try { | |
| channel.ws.close(1000, "Tunnel disconnected"); | |
| } catch {} | |
| } | |
| this.localChannels.clear(); | |
| } | |
| private cleanupChannels(): void { | |
| for (const channel of this.localChannels.values()) { | |
| try { | |
| channel.ws.close(1000, "Tunnel disconnected"); | |
| } catch (err) { | |
| console.warn("[host-service:tunnel] error closing local channel ws", err); | |
| } | |
| } | |
| this.localChannels.clear(); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/tunnel/tunnel-client.ts
Line: 303-310
Comment:
The empty `catch {}` silently discards any error that isn't the now-fixed 1001 code issue. Since the primary fix is the `1000` code change, the try/catch is purely defensive, but a silent swallow means future `ws.close()` problems on cleanup will be invisible in logs.
```suggestion
private cleanupChannels(): void {
for (const channel of this.localChannels.values()) {
try {
channel.ws.close(1000, "Tunnel disconnected");
} catch (err) {
console.warn("[host-service:tunnel] error closing local channel ws", err);
}
}
this.localChannels.clear();
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Good catch — applied in 9827893. With consoleLoggingIntegration wired, that console.warn flows to Sentry Logs automatically.
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
Sentry's onUncaughtException/onUnhandledRejection integrations already call captureException, so the process.on handlers were double-capturing. Slim them to just log — they still exist so Sentry's "other handlers exist" check keeps the process alive. Also surface unexpected channel close errors via console.warn (which flows to Sentry Logs now).
Changes since v0.2.16: - host-service: connect-phase deadline + reconnect-guaranteed onclose so the tunnel can't get stranded mid-handshake; failed connects always schedule a retry. (#4539) - host-service: unstrand tunnel reconnect path and wire relay Sentry so reconnect failures surface instead of silently looping. (#4537) - build: bundle @mastra/duckdb and the @duckdb/node-bindings-* natives across darwin-arm64 / darwin-x64 / linux-x64 / linux-arm64 targets so duckdb-backed code paths work in the standalone CLI bundle. Push cli-v0.2.17 after this lands to fire the release pipeline.
Changes since v0.2.16: - host-service: connect-phase deadline + reconnect-guaranteed onclose so the tunnel can't get stranded mid-handshake; failed connects always schedule a retry. (#4539) - host-service: unstrand tunnel reconnect path and wire relay Sentry so reconnect failures surface instead of silently looping. (#4537) - build: bundle @mastra/duckdb and the @duckdb/node-bindings-* natives across darwin-arm64 / darwin-x64 / linux-x64 / linux-arm64 targets so duckdb-backed code paths work in the standalone CLI bundle. Push cli-v0.2.17 after this lands to fire the release pipeline.
…rset-sh#4537) * chore(relay): wire Sentry logs and uncaught-throw safety net Two recent crashes left no trace because RELAY_SENTRY_DSN was never deployed. Adds Sentry Logs via consoleLoggingIntegration, plus process- and Hono-level error handlers that keep the relay alive instead of exiting, so a single bug doesn't drop every host tunnel. * fix(host-service): use valid close code in tunnel cleanupChannels WebSocket.close(1001) is server-reserved per the WHATWG spec; Undici throws InvalidAccessError when clients use it. The throw bubbled out of socket.onclose before scheduleReconnect ran, stranding any host with active local channels after a relay disconnect — users had to relaunch the desktop app to recover. * review: drop duplicate Sentry capture; log cleanupChannels failures Sentry's onUncaughtException/onUnhandledRejection integrations already call captureException, so the process.on handlers were double-capturing. Slim them to just log — they still exist so Sentry's "other handlers exist" check keeps the process alive. Also surface unexpected channel close errors via console.warn (which flows to Sentry Logs now).
Summary
Investigating why my hosts stopped reconnecting after a relay deploy, found two issues — one a real bug, one missing observability.
Bug fix: host tunnel never reconnects after relay drops it (when active local channels exist).
TunnelClient.cleanupChannelscallschannel.ws.close(1001, ...). 1001 is server-reserved per the WHATWG WebSocket spec; Undici (Node's built-in WebSocket) throwsInvalidAccessErrorwhen clients send it. The throw bubbled out ofsocket.onclosebeforescheduleReconnect()ran, leaving the TunnelClient in a permanent dead state —socket=null,connecting=false,reconnectTimer=null,closed=false. Only relaunching the desktop app (= freshconnectRelay()) recovered it. Only hits hosts with active local channels at disconnect time, which is why idle hosts in the relay logs looked fine.Caught via a real
[host-service] uncaughtException — staying up { DOMException [InvalidAccessError]: invalid code }in~/.superset/host/<org>/host-service.logafter the deploy.Observability: wire relay Sentry.
Relay had
@sentry/nodeinitialized butRELAY_SENTRY_DSNwas never set in Fly secrets, so two recent crashes (exit_code=1, not OOM) left no stack trace. Adds:consoleLoggingIntegrationso everyconsole.*(including Hono request log lines, with tokens already redacted byredactingLogger) ships to Sentry Logs.onUncaughtExceptionIntegration({ exitEvenIfOtherHandlersAreRegistered: false })+ top-levelprocess.on('uncaughtException'|'unhandledRejection')handlers — log + suppress instead of exiting, so a single uncaught throw doesn't drop every active tunnel.app.onError()Hono hook to capture errors thrown out of route handlers (Hono swallows these into a default 500 otherwise).RELAY_SENTRY_DSNhas been staged + deployed tosuperset-relayalready.Test plan
bun run --filter=@superset/relay typecheckpassesbun run --filter=@superset/host-service typecheckpassesbun run lintclean/healthreturns 200 in ~70ms; no Sentry init errors in fresh logs[host-service:tunnel] reconnecting in Xmsfollowed byconnected to relay for host ...)Adjacent finds (not in this PR)
apps/relay/src/tunnel.ts:171has the sameclientWs.close(1001, reason)pattern. Relay uses thewsnpm package via@hono/node-ws, which is more permissive than Undici, so it likely doesn't throw — but worth verifying.fly.tomlcomments reference a 6-region scale-out but actual state is one machine, so every relay deploy is a 100% outage window.socket.onclosebody in try/catch so any future throw inside it can't strand the client.Summary by cubic
Fixes a host reconnect bug and wires relay Sentry to keep tunnels stable and errors observable. Hosts now reconnect after relay restarts, and relay errors are captured without taking the process down.
Bug Fixes
scheduleReconnect().New Features
@sentry/nodewith console logging and uncaught-exception integrations; added Honoapp.onErrorto capture route errors.uncaughtException/unhandledRejectionhandlers now log only (avoids double-capture); relay stays up. EnabledRELAY_SENTRY_DSNonsuperset-relay.Written for commit 9827893. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Improvements