
feat(sse): reconnect handler replays buffered events on lastSeenSeq (B7.2) #32685

Merged: dvargasfuertes merged 3 commits into main from apollo/b7-sse-reconnect-handler on May 30, 2026

Conversation

Contributor

@vellum-apollo-bot vellum-apollo-bot Bot commented May 30, 2026

What

Server-side reconnect handler for the /v1/events SSE stream. The route now accepts an optional lastSeenSeq query param. When a client reconnects with a cursor scoped to a single conversation, the daemon replays any buffered events with seq > lastSeenSeq before going live. If the cursor is older than the ring's oldest entry, the connection just goes live — the client detects the gap from the next event's seq and refetches via the existing messages API.

This is Unit 2 of B7. Builds on the per-conversation seq stamping + bounded ring buffer landed in #32676 (B7.1). Unit 3 (the client side — persist lastSeenSeq in localStorage, send via query param on reconnect, detect seq jumps and refetch) is the next PR.

How

Request shape

GET /v1/events?conversationKey=<key>&lastSeenSeq=<n>

lastSeenSeq is a non-negative integer. Empty / non-integer / negative values return 400. Omitting the param falls through to the existing live-only behavior — no replay.
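
The validation rule above could be sketched roughly as follows. This is an illustration only, not the route's actual code: `parseLastSeenSeq` and the local `BadRequestError` class are hypothetical stand-ins, and the digits-only check is one assumed way to reject empty, non-integer, and negative values.

```typescript
// Hypothetical stand-in for the error the route maps to HTTP 400.
class BadRequestError extends Error {}

// Takes the raw query value (null when the param is omitted).
// Returns undefined for the legacy live-only path, or the parsed cursor.
function parseLastSeenSeq(raw: string | null): number | undefined {
  if (raw === null) return undefined; // param omitted: no replay
  // Digits only, so "", "1.5", and "-1" are all rejected.
  if (!/^\d+$/.test(raw)) {
    throw new BadRequestError("lastSeenSeq must be a non-negative integer");
  }
  const value = Number(raw);
  if (!Number.isSafeInteger(value)) {
    throw new BadRequestError("lastSeenSeq is out of range");
  }
  return value;
}
```

Under these assumptions, `parseLastSeenSeq("7")` yields `7`, `parseLastSeenSeq(null)` yields `undefined`, and each of the three malformed inputs throws.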

Replay drain in start()

When lastSeenSeq is set and the subscription is scoped to a conversation, the handler calls getReplayWindow(conversationId, lastSeenSeq) before the first heartbeat:

  • Returns a non-empty array → enqueue each event as an SSE frame, tracking the highest replayed seq as highWaterReplaySeq.
  • Returns null → cursor is older than the ring's oldest entry. Do nothing extra; connection just goes live. The client is expected to detect the gap from the seq jump on its first live event and refetch via the messages API.
  • Returns [] → cursor is in window but nothing newer; no-op.
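
The three cases above can be sketched as a small drain function. This is a minimal sketch under stated assumptions: `replayWindowFrom` is a hypothetical stand-in for the daemon's `getReplayWindow` (it takes the ring directly rather than a conversation id), the "too old" test `lastSeenSeq + 1 < oldest.seq` is an assumed definition, and `-1` is an assumed "nothing replayed" sentinel.

```typescript
interface SeqEvent {
  seq: number;
  payload: string;
}

// Hypothetical ring lookup. null means the cursor predates the oldest
// buffered entry; [] means nothing newer than the cursor is buffered.
function replayWindowFrom(ring: SeqEvent[], lastSeenSeq: number): SeqEvent[] | null {
  const oldest = ring[0];
  if (oldest === undefined) return [];
  if (lastSeenSeq + 1 < oldest.seq) return null; // assumed "too old" rule
  return ring.filter((ev) => ev.seq > lastSeenSeq);
}

// Enqueues replayed events as SSE frames and returns the highest replayed
// seq, or -1 (assumed sentinel) when nothing was replayed.
function drainReplay(
  replayWindow: SeqEvent[] | null,
  enqueue: (frame: string) => void,
): number {
  let highWaterReplaySeq = -1;
  if (replayWindow === null) return highWaterReplaySeq; // go live, no replay
  for (const ev of replayWindow) {
    enqueue(`data: ${JSON.stringify(ev)}\n\n`); // standard SSE data frame
    if (ev.seq > highWaterReplaySeq) highWaterReplaySeq = ev.seq;
  }
  return highWaterReplaySeq;
}
```

The caller would run `drainReplay` before emitting the first heartbeat and keep the returned value for the live-path dedup check.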

Dedup against the live callback

broadcastMessage stamps and rings before publish, so any event publish-racing with the replay drain is guaranteed to be in the ring window we just drained. The live callback now drops any event whose seq <= highWaterReplaySeq to avoid double-delivery.
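
A minimal sketch of that dedup check, assuming the watermark is known when the live callback is created. `makeLiveCallback` and `SeqEvent` are hypothetical names for illustration; the real handler may hold the watermark in a mutable closure variable instead.

```typescript
interface SeqEvent {
  seq: number;
  payload: string;
}

// Wraps the downstream enqueue so that any event already covered by the
// replay drain (seq <= highWaterReplaySeq) is silently dropped.
function makeLiveCallback(
  highWaterReplaySeq: number,
  enqueue: (ev: SeqEvent) => void,
): (ev: SeqEvent) => void {
  return (ev: SeqEvent): void => {
    if (ev.seq <= highWaterReplaySeq) return; // duplicate of a replayed event
    enqueue(ev);
  };
}
```

With a watermark of 2, a live publish of seq=2 is dropped and a fresh seq=3 flows through, which mirrors the dedup test case listed under Tests.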

No new message type

No new ServerMessage variant — the protocol stays exactly as it was. The "I missed too much" detection lives entirely on the client side where the state machine for "should I refetch?" is more natural.
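
For illustration, the client-side check could look roughly like this. The client work lands in a later PR, so this is only a sketch: `shouldRefetch` is a hypothetical name, and it assumes seq values within a conversation are contiguous, so any forward jump past `lastSeenSeq + 1` means events were missed.

```typescript
// Returns true when the first event seen after reconnect (replayed or live)
// skips past the next expected seq, i.e. the ring had already evicted
// events the client never received.
// Assumption: per-conversation seq values are contiguous.
// Only the forward-jump case described in this PR is covered.
function shouldRefetch(lastSeenSeq: number, firstSeqAfterReconnect: number): boolean {
  return firstSeqAfterReconnect > lastSeenSeq + 1;
}
```

A client holding `lastSeenSeq = 0` that first sees seq=203 would refetch via the messages API; one that first sees seq=1 would simply continue.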

Tests

7 new tests in runtime-events-sse-reconnect.test.ts:

  1. Replay path: ring contains seqs 1..3; reconnect with lastSeenSeq=1; stream emits seq=2, seq=3, then heartbeat.
  2. Cursor-too-old path: 202 events pushed to trigger natural count-eviction (oldest becomes 3); reconnect with lastSeenSeq=0; first frame is the heartbeat (no extra signal, no replay).
  3. Omitted lastSeenSeq legacy live-only: ring has events; reconnect without cursor; stream emits heartbeat first.
  4. Dedup: replayed seq=2 followed by a live publish of the same event; live duplicate is dropped; a fresh seq=3 publish flows through.
  5–7. Malformed params: empty, non-integer (1.5), negative (-1) → BadRequestError.

Adjacent regression sweeps stay green:

  • 7 SSE/hub test files + stream-state + framing: 82/82 pass.

Follow-up (separate PR)

Noted on #32676 — the current replayable: false design (introduced in B7.1) skips ring buffering for targeted events. This means the intended recipient of a targeted publish will miss the event on reconnect. The right long-term fix is to store the publish-time targeting metadata (targetClientId, targetInterfaceId, targetCapability, excludeClientId) alongside each ring entry and re-apply the same filter at replay time. Out of scope here; tracked separately.
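
One possible shape for that follow-up, sketched under loud assumptions: the four field names come from the note above, but their exact semantics, the `Subscriber` shape, and the `matchesTargeting` and `replayFor` helpers are all hypothetical and not part of this PR.

```typescript
// Publish-time targeting metadata, stored alongside each ring entry.
interface Targeting {
  targetClientId?: string;
  targetInterfaceId?: string;
  targetCapability?: string;
  excludeClientId?: string;
}

interface RingEntry {
  seq: number;
  payload: string;
  targeting?: Targeting; // absent for untargeted broadcasts
}

// Hypothetical description of the reconnecting subscriber.
interface Subscriber {
  clientId: string;
  interfaceId: string;
  capabilities: string[];
}

// Assumed semantics: every target field that is set must match, and an
// excluded client never receives the event.
function matchesTargeting(t: Targeting | undefined, s: Subscriber): boolean {
  if (t === undefined) return true;
  if (t.excludeClientId !== undefined && t.excludeClientId === s.clientId) return false;
  if (t.targetClientId !== undefined && t.targetClientId !== s.clientId) return false;
  if (t.targetInterfaceId !== undefined && t.targetInterfaceId !== s.interfaceId) return false;
  if (t.targetCapability !== undefined && s.capabilities.indexOf(t.targetCapability) === -1) {
    return false;
  }
  return true;
}

// Re-applies the publish-time filter when replaying to a given subscriber.
function replayFor(ring: RingEntry[], lastSeenSeq: number, s: Subscriber): RingEntry[] {
  return ring.filter((e) => e.seq > lastSeenSeq && matchesTargeting(e.targeting, s));
}
```

The design point is that the filter runs at replay time against the reconnecting subscriber, so a targeted event is replayed only to the recipient it was originally meant for.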

Out of scope

  • Client-side bookkeeping (lastSeenSeq map in localStorage, send the cursor on reconnect, detect seq jump and trigger refetch). Lands in B7.3.
  • Daemon-restart persistence of nextSeq. Not needed for the mid-turn refresh case.

…B7.2)

GET /v1/events now accepts an optional lastSeenSeq query param. When a
client reconnects with a cursor scoped to a single conversation, the
route handler drains the per-conversation ring buffer for events with
seq > lastSeenSeq before emitting the first heartbeat. When the cursor
is older than the ring's oldest entry, a single stream_resync_required
event is emitted so the client can fetch a snapshot via the normal
messages API and resume live from the next event.

A high-water dedup watermark on the live callback drops any event that
races into the subscription with seq <= the largest replayed seq --
broadcastMessage stamps and rings BEFORE publish, so in-flight events
mid-replay are guaranteed to already be in the window we just drained.

Adds the stream_resync_required ServerMessage variant in a new
message-types/stream.ts module, wired into the ServerMessage union via
message-protocol.ts. The resync event is emitted directly into the
reconnecting subscriber's stream (never via broadcastMessage), carries
no seq, and is never fanned out to other subscribers.

7 new tests cover: in-window replay, snapshot-resync fallback when the
ring has evicted past the cursor, omitted-param legacy live-only path,
dedup against a live event that duplicates a replayed seq, and three
malformed-param rejections (empty / non-integer / negative).

Follow-up (separate PR, noted on #32676): revisit the replayable:false
behavior for targeted events so the intended recipient of a targeted
publish doesn't miss it on reconnect. The right fix is to store the
publish-time targeting metadata alongside each ring entry and re-apply
the filter at replay time.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e967302456


Comment on lines +474 to +475
const window = getReplayWindow(replayConversationId, lastSeenSeq);
if (window === null) {

P2: Signal resync when replay state is missing

When the assistant process restarts, or a conversation's in-memory ring has aged out and been deleted before any new event, getReplayWindow() returns [] rather than null. This path treats that as an in-window replay, sends only the heartbeat, and leaves a reconnecting client holding its old lastSeenSeq; subsequent events for the same persisted conversation restart at low seq values, so replay-aware clients have no resync signal and can miss or discard live updates. Please turn a missing/empty replay state with a nonzero cursor into stream_resync_required before going live.


… seq jump

Per review feedback: the daemon doesn't need to surface a special
'cursor too old' signal over the wire. The client already has snapshot
refetch paths (messages API), and it can detect the gap purely on its
own by comparing the seq of the first live event after reconnect
against its persisted lastSeenSeq. Removing the message type shrinks
the protocol surface and keeps the resync-from-DB policy entirely
client-side where the state machine for 'should I refetch?' is more
natural.

Changes:
- delete src/daemon/message-types/stream.ts and unwire from
  message-protocol.ts (no new ServerMessage variant)
- in /v1/events reconnect handler, when getReplayWindow returns null
  (cursor older than ring's oldest), do nothing -- connection goes
  live as if no cursor was passed
- buildAssistantEvent import on the routes file becomes unused;
  removed
- replace the 'snapshot-resync signal' test with one that asserts
  the cursor-too-old path connects live without any extra frame
  ahead of the heartbeat
- update lastSeenSeq query param description in OpenAPI

Adjacent regression sweep: 82/82 SSE/hub/stream-state/framing tests
still green.
@dvargasfuertes dvargasfuertes merged commit 96d0b3b into main May 30, 2026
13 checks passed
@dvargasfuertes dvargasfuertes deleted the apollo/b7-sse-reconnect-handler branch May 30, 2026 20:39