fix(server): close SSE bridge registration-order races + wire abort into MCP heartbeat #864

Merged: buremba merged 2 commits into main from fix/sse-bridge-registration-order on May 18, 2026
@buremba buremba commented May 18, 2026

Follow-up to #845. Codex audit caught three race windows the original PR missed. Shipped as one bundled fix.

Findings addressed

1. packages/server/src/gateway/gateway/index.ts:239 — worker SSE

bindRequestAbortToStream(requestSignal, stream) was called BEFORE the async pauseWorker / registerWorker / resumeWorker block, but the sseWriter.onClose cleanup subscriber that removes the writer from WorkerConnectionManager wasn't registered until after that async work (around :280). If requestSignal aborted during the async setup window, the stream was aborted but no cleanup subscriber existed yet — a dead writer could still be added at :266 and never removed.

Fix: register the abort/close cleanup subscriber FIRST (guarding against the dead-writer-add), THEN bind the abort bridge, THEN do the async setup. Cleanup is guarded with an idempotent latch so it's safe to fire before or after the writer is added. Post-await checkpoints short-circuit when the latch has tripped.
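
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the gateway code: `Registry`, `Writer`, `connect`, and the `asyncSetup` callback are hypothetical stand-ins (the callback plays the role of the pauseWorker / registerWorker / resumeWorker block), and the abort bridge is approximated with a plain `abort` event listener rather than `bindRequestAbortToStream`.

```typescript
// Hypothetical stand-ins for the real gateway types; none of these names come from the PR.
type Writer = { id: string; closed: boolean };

class Registry {
  readonly writers = new Map<string, Writer>();
  add(w: Writer): void {
    this.writers.set(w.id, w);
  }
  remove(id: string): void {
    this.writers.delete(id);
  }
}

async function connect(
  signal: AbortSignal,
  registry: Registry,
  writer: Writer,
  asyncSetup: () => Promise<void>, // assumed placeholder for the async worker setup
): Promise<boolean> {
  const state = { cleanedUp: false };

  // 1. Idempotent cleanup latch, defined before anything can abort.
  const cleanup = (): void => {
    if (state.cleanedUp) return;
    state.cleanedUp = true;
    writer.closed = true;
    registry.remove(writer.id); // harmless if the writer was never added
  };

  // 2. Bind the abort bridge; an already-aborted signal trips the latch immediately.
  if (signal.aborted) cleanup();
  else signal.addEventListener("abort", cleanup, { once: true });
  if (state.cleanedUp) return false;

  // 3. Async setup; the request may abort while this is pending.
  await asyncSetup();

  // 4. Post-await checkpoint: never add a writer whose cleanup already ran.
  if (state.cleanedUp) return false;
  registry.add(writer);
  return true;
}
```

Because the latch exists before the first `await`, an abort during setup marks the writer dead and the checkpoint skips the add; an abort after the add removes the writer through the same function.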

2. packages/server/src/gateway/routes/public/agent.ts:905 — agent events SSE

Same shape — sseManager.addConnection(...) + initial writeSSE / backlog writes happened before the stream.onAbort(cleanup) + bridge registration at :933+. If abort or a write failure happened in that window, the manager registration was leaked.

Fix: register cleanup + bind the abort bridge before adding to SseManager. Same idempotent latch pattern; connectionAdded flag gates whether removeConnection runs.
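
A rough sketch of the `connectionAdded` gate, under the assumption that the manager exposes `addConnection` / `removeConnection` keyed by an id. `SseManagerStub`, `openEventStream`, and `writeInitial` are hypothetical names for illustration; the real route's signatures are not shown in this PR description.

```typescript
// Hypothetical stub; the real SseManager API may differ.
class SseManagerStub {
  readonly connections = new Set<string>();
  removeCalls = 0;
  addConnection(id: string): void {
    this.connections.add(id);
  }
  removeConnection(id: string): void {
    this.removeCalls += 1;
    this.connections.delete(id);
  }
}

async function openEventStream(
  signal: AbortSignal,
  manager: SseManagerStub,
  connectionId: string,
  writeInitial: () => Promise<void>, // stands in for the initial writeSSE / backlog writes
): Promise<"open" | "closed"> {
  const state = { closed: false, connectionAdded: false };

  // Cleanup is registered first and is safe to call at any point.
  const cleanup = (): void => {
    if (state.closed) return;
    state.closed = true;
    // Only undo the registration if it actually happened.
    if (state.connectionAdded) manager.removeConnection(connectionId);
  };

  if (signal.aborted) cleanup();
  else signal.addEventListener("abort", cleanup, { once: true });
  if (state.closed) return "closed";

  manager.addConnection(connectionId);
  state.connectionAdded = true;

  try {
    await writeInitial();
  } catch {
    // A failed write is treated like a disconnect in this sketch.
    cleanup();
  }
  return state.closed ? "closed" : "open";
}
```

The flag keeps `removeConnection` from running for a connection that was never registered, while the latch keeps it from running twice when both an abort and a write failure occur.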

3. packages/server/src/mcp-handler.ts:474 (withSSEHeartbeat)

withSSEHeartbeat wrapped SSE responses with a heartbeat setInterval and returned new Response(readable) without binding the request abort signal. The pre-existing close/error cleanup catches normal pipe-through closure but not abnormal disconnects (LB timeout, proxy kill, client hard-close), so the interval kept running after them. This is the same root cause as #833 / #845.

Fix: thread the inbound request's AbortSignal into withSSEHeartbeat and bind it to the writable via the same bindRequestAbortToStream helper. The caller (handleAndMaybeConvert) passes req.signal through.
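
As a simplified sketch of threading the signal through: the real `withSSEHeartbeat` wraps a Response body, whereas the hypothetical `startHeartbeat` below only models the timer and takes a `send` callback in place of the writable. `handle`, `Heartbeat`, and the 15000 ms interval are assumptions made for illustration.

```typescript
// Hypothetical, simplified stand-in for withSSEHeartbeat; only the timer lifecycle is modeled.
interface Heartbeat {
  isRunning(): boolean;
}

function startHeartbeat(
  send: (chunk: string) => void,
  signal: AbortSignal,
  intervalMs: number,
): Heartbeat {
  const state = { running: true };
  // A line starting with ":" is an SSE comment, commonly used as a keep-alive.
  const timer = setInterval(() => send(": ping\n\n"), intervalMs);

  const stop = (): void => {
    if (!state.running) return;
    state.running = false;
    clearInterval(timer);
  };

  // Bind the request's abort signal so abnormal disconnects stop the timer too.
  if (signal.aborted) stop();
  else signal.addEventListener("abort", stop, { once: true });

  return { isRunning: () => state.running };
}

// The caller passes the inbound request's signal through, as handleAndMaybeConvert does with req.signal.
function handle(req: { signal: AbortSignal }, send: (chunk: string) => void): Heartbeat {
  return startHeartbeat(send, req.signal, 15000);
}
```

Without the signal, the only way to stop the timer is the stream's own close/error path, which an abrupt disconnect may never trigger.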

Validation

  • make typecheck — pre-existing unrelated errors on organizationId / WorkerTokenData exist on main; no new errors from this PR.
  • make build-packages — clean.
  • bun test src/__tests__/unit/sse-abort-bridge.test.ts — 8/8 pass (existing).
  • bun test src/__tests__/unit/sse-bridge-registration-order.test.ts — 7/7 pass (new regression coverage for each route's registration-order fix + a direct check that withSSEHeartbeat clears its interval on abrupt abort).
  • Broader bun test src/__tests__/unit/ — 184 pass, 0 fail, 16 skip.

Notes

  • The CI check-drift failure (owletto submodule behind) is unrelated and not blocking.
  • No backwards-compat shims, no @deprecated.

…nto MCP heartbeat

Follow-up to #845 — codex audit caught three race windows the original PR
missed.

1. `gateway/gateway/index.ts`: `WorkerGateway.handleStreamConnection`
   registered the `sseWriter.onClose` cleanup AFTER awaiting
   `pauseWorker` / `addConnection` / `registerWorker`. An abort fired in
   that window left a dead writer registered in `WorkerConnectionManager`.
   Fix: idempotent cleanup latch wired BEFORE the async setup; the abort
   bridge routes through it, and post-await checkpoints short-circuit
   when the latch tripped.

2. `gateway/routes/public/agent.ts`: the agent events SSE route called
   `sseManager.addConnection(...)` + initial `writeSSE` / backlog writes
   BEFORE wiring `stream.onAbort(cleanup)` and the abort bridge. An
   abort in that window leaked the manager registration.
   Fix: same idempotent latch — cleanup + abort bridge registered FIRST,
   manager.add() second, async writes inside the latch's try-finally.

3. `mcp-handler.ts`: `withSSEHeartbeat` wrapped SSE responses with a
   heartbeat `setInterval` but never bound the inbound request's
   `AbortSignal`. Abnormal disconnects (LB timeout, proxy kill, client
   hard-close) left the interval running forever — the same root cause
   as #833/#845.
   Fix: thread `req.signal` through `withSSEHeartbeat` and bind it to
   the writable via `bindRequestAbortToStream`.

Test plan: existing `sse-abort-bridge.test.ts` + new
`sse-bridge-registration-order.test.ts` covering each race window and a
direct check that the MCP heartbeat interval clears on abrupt abort.
@coderabbitai

coderabbitai Bot commented May 18, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between cc4dbe3 and 4ab3954.

📒 Files selected for processing (4)
  • packages/server/src/__tests__/unit/sse-bridge-registration-order.test.ts
  • packages/server/src/gateway/gateway/index.ts
  • packages/server/src/gateway/routes/public/agent.ts
  • packages/server/src/mcp-handler.ts
@codecov-commenter
Codecov Report

✅ All modified and coverable lines are covered by tests.


When the inbound request signal is already aborted at withSSEHeartbeat
entry, bindRequestAbortToStream() synchronously calls adapter.abort() ->
abortWriter() and latches 'terminated=true'. If setInterval() runs AFTER
the bind in that case, intervalId is undefined inside abortWriter() so
clearInterval() never fires and the heartbeat timer leaks.

Swap the order: setInterval first (intervalId defined), then bind. The
pre-aborted path now clears the interval immediately.

Regression test added covers the pre-aborted bind window.
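
The two orderings can be compared with a small sketch. `setupHeartbeat` and its `order` parameter are hypothetical and exist only to contrast the buggy and fixed sequences; `abortWriter` and the `terminated` latch mirror the names in the message above, but the body is an approximation rather than the real implementation.

```typescript
type Timer = ReturnType<typeof setInterval>;

// Returns the timer handle if it is still running after setup, otherwise undefined.
function setupHeartbeat(
  signal: AbortSignal,
  order: "bind-first" | "interval-first",
): Timer | undefined {
  const state: { intervalId: Timer | undefined; terminated: boolean; running: boolean } = {
    intervalId: undefined,
    terminated: false,
    running: false,
  };

  const abortWriter = (): void => {
    if (state.terminated) return;
    state.terminated = true; // latch: later calls are no-ops
    if (state.intervalId !== undefined) {
      clearInterval(state.intervalId);
      state.running = false;
    }
  };

  const bind = (): void => {
    // A pre-aborted signal fires the abort path synchronously.
    if (signal.aborted) abortWriter();
    else signal.addEventListener("abort", abortWriter, { once: true });
  };

  const start = (): void => {
    state.intervalId = setInterval(() => {}, 1000);
    state.running = true;
  };

  if (order === "bind-first") {
    bind(); // the latch trips here while intervalId is still undefined
    start(); // so this timer is never cleared
  } else {
    start();
    bind(); // intervalId is defined, so a pre-aborted signal clears it
  }
  return state.running ? state.intervalId : undefined;
}
```

With a pre-aborted signal, "bind-first" returns a live timer handle (the leak) and "interval-first" returns undefined. A caller experimenting with the buggy order should call `clearInterval` on the returned handle so the process can exit.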
@buremba buremba merged commit 110c046 into main May 18, 2026
19 of 20 checks passed
@buremba buremba deleted the fix/sse-bridge-registration-order branch May 18, 2026 04:19