Skip to content

B3: pre-allocate assistant row at llm_call_started boundary#32326

Merged
dvargasfuertes merged 4 commits into
mainfrom
apollo/agent-loop-pre-allocation-b3
May 29, 2026
Merged

B3: pre-allocate assistant row at llm_call_started boundary#32326
dvargasfuertes merged 4 commits into
mainfrom
apollo/agent-loop-pre-allocation-b3

Conversation

@vellum-apollo-bot
Copy link
Copy Markdown
Contributor

What

Pre-allocate the assistant message row at the llm_call_started boundary instead of inserting it at message_complete. This is the persistence half of the streaming-id contract: B2 added assistant_turn_start + messageId on every wire event, but the row those ids referenced didn't actually exist until end-of-turn. B3 reserves the row up front and converts the terminal write to an in-place update.

Why

With the previous shape, the client received a messageId on every delta but had no row to anchor against in its local store until the LLM call completed — forcing index-based reconciliation that doesn't survive multi-LLM-call turns, retries, or cross-process resumes. Pre-allocating gives the client a stable DB id from the first byte, lets B4 (daemon floor bump) drop legacy fallbacks, and lets B5 rewrite the frontend reconcile path against ids instead of array indices.

Changes

Production

  • assistant/src/agent/loop.ts — add llm_call_started to AgentEvent and emit it inside AgentLoop.run() immediately before provider.sendMessage(), awaited. Carries the optional callSite tag so downstream handlers can decide whether to reserve for this call.

  • assistant/src/daemon/conversation-agent-loop-handlers.ts:

    • Extract buildAssistantChannelMetadata() so the channel / provenance / slackMeta envelope (previously built inside handleMessageComplete) can be stamped onto the row at reserve time. All inputs are stable across the LLM call; Slack channelTs is intentionally absent here and back-filled by deliverReplyViaCallback as today.
    • New handleLlmCallStarted(): runs the persistence pipeline with op: "reserve", stashes the resulting id on state.lastAssistantMessageId, and emits assistant_turn_start on the wire.
    • dispatchAgentEvent routes the new event type to the new handler.
    • handleMessageComplete flips its assistant persistence call from op: "add" to op: "updateContent" against state.lastAssistantMessageId. Throws if no row was reserved — production invariant: every message_complete is preceded by exactly one llm_call_started.
    • Every wire emission inside the call stamps messageId: state.lastAssistantMessageIdassistant_text_delta, assistant_thinking_delta, tool_use_start, tool_use_preview_start, tool_output_chunk variants, input_json_delta, tool_result, the handleMessageComplete buffer flush, and the two dispatchAgentEvent fallthroughs.
  • assistant/src/daemon/wake-target-adapter.ts — translate the new event type as a no-op (returns null). Wake-path persistence still goes through persistTailMessage rather than the reserve→updateContent pipeline; full wake-path parity is tracked as a B3 follow-up.

Tests

Every touched test file passes 100% standalone. 11 files, 163 tests:

File Result
conversation-agent-loop 60 ✓
conversation-agent-loop-overflow 6 ✓
conversation-abort-tool-results 2 ✓
conversation-provider-retry-repair 9 ✓
conversation-queue 49 ✓ (+ 1 todo)
conversation-slash-queue 4 ✓
conversation-slash-unknown 2 ✓
conversation-workspace-injection 10 ✓
conversation-workspace-tool-tracking 6 ✓
outbound-slack-persistence 7 ✓
persistence-secret-redaction 8 ✓

Three test-update recipes were used:

  1. Fixture priming (agent-loop + overflow) — inject await onEvent({ type: "llm_call_started" }) before each fixture's message_complete. Surgical edits for unique anchors, a one-shot python loop for fixtures with non-unique anchors.

  2. Mock stub addition — for tests that don't drive message_complete directly but go through the agent loop, add updateMessageContent: mock(() => {}) to the conversation-crud mock alongside the existing reserveMessage stub.

  3. Direct-handler recipe (outbound-slack + persistence-secret-redaction) — rewire the reserveMessage mock to push the reserved row (with id, content: "", and the production metadata envelope) into the same capture arrays that addMessage uses, so existing assertion helpers find it. updateMessageContent finds the row by id and updates it in-place. Import handleLlmCallStarted and insert an awaited call before each handleMessageComplete invocation.

For the queue tests with helper-driven event emission (resolveRun), widened PendingRun.onEvent and the AgentLoop.run mock signature to void | Promise<void>, made resolveRun async with awaited prime + usage + message_complete, and awaited every resolveRun call site. For five direct (non-helper) run.onEvent({type:'usage'})run.onEvent({type:'message_complete'}) sites in attachment / cancel-semantics tests, prepended an awaited llm_call_started prime.

Pre-existing test pollution

Running the full assistant test suite (or any large batch) reproduces the same ~100 unrelated failures (schedule-store, skill_load, migration-*, OAuth, etc.) on both this branch and on main at the B2 baseline — confirming the batched failures are pre-existing module-cache / env-var leak issues independent of this change. Numbers match exactly: 62 pass / 101 fail across the same 11 conversation files on both branches when batched, 163 / 0 on this branch when run standalone.

Follow-ups (not in this PR)

  • User row flip from op: "add" to reserve+update (mirrors what this PR does for assistant rows on the user side).
  • Full wake-path pre-allocation paritywake-target-adapter.ts returns null for llm_call_started today; wake-path persistence still goes through persistTailMessage.
  • B4 — daemon floor bump, hard cutover (removes legacy fallbacks).
  • B5 — frontend reconcile rewrite against ids instead of array indices (absorbs PR 2c).

AGENTS.md compliance

  • assistant/AGENTS.md "Test machinery isolation": all test changes remain inside src/__tests__/; no new imports from src/ into helpers or preloads; helpers continue to use bun:test + sibling-test scope.
  • software-engineering/unit-testing.md rule ci: add terraform apply workflow on platform changes #3 (mock.module audit): the new exported handleLlmCallStarted is imported only by tests that already mock the handlers module explicitly; no existing mock.module sites needed a stub-list update.

The wire-protocol additions in B2 (assistant_turn_start + messageId on
streaming events) were emitted but the row they reference didn't exist
yet — handleMessageComplete still INSERTed at end-of-turn, so the client
could not anchor deltas to a stable DB id during the turn. B3 reserves
the row up-front and converts the terminal write to an in-place update.

Production changes:

- assistant/src/agent/loop.ts: add 'llm_call_started' AgentEvent and emit
  it inside AgentLoop.run() immediately before provider.sendMessage(),
  awaited so the reserve completes before any streaming delta. The event
  carries the optional callSite tag so downstream handlers can decide
  whether to reserve for this call.

- assistant/src/daemon/conversation-agent-loop-handlers.ts:
  - extract buildAssistantChannelMetadata() so the channel/provenance/slackMeta
    envelope (previously built inside handleMessageComplete) can be stamped
    onto the row at reserve time. All inputs are stable across the LLM call;
    Slack channelTs is intentionally absent and back-filled by
    deliverReplyViaCallback as today.
  - new handleLlmCallStarted(): runs the persistence pipeline with
    op:'reserve', stashes the resulting id on state.lastAssistantMessageId,
    and emits 'assistant_turn_start' on the wire.
  - dispatchAgentEvent routes 'llm_call_started' to the new handler.
  - handleMessageComplete flips its assistant persistence call from
    op:'add' to op:'updateContent' against state.lastAssistantMessageId.
    Throws if no row was reserved — production invariant: every
    message_complete is preceded by exactly one llm_call_started.
  - every wire emission inside the call stamps messageId:
    state.lastAssistantMessageId — assistant_text_delta,
    assistant_thinking_delta, tool_use_start, tool_use_preview_start,
    tool_output_chunk variants, input_json_delta, tool_result, the
    handleMessageComplete buffer flush, and the two dispatchAgentEvent
    fallthroughs.

- assistant/src/daemon/wake-target-adapter.ts: translate the new
  AgentEvent type as no-op (returns null). Wake-path persistence still
  goes through persistTailMessage (an addMessage-shaped call) rather
  than the reserve→updateContent pipeline; full wake-path parity is a
  B3 follow-up.

Multi-LLM-call agent turns (LLM call → tool execution → LLM call) emit
one llm_call_started per call, so each call reserves its own row.
findDisplayTurnEndIndex already collapses consecutive assistant rows for
the merged history view, matching today's per-call DB layout.

Test updates:

- conversation-agent-loop.test.ts: 60/60. Primed 21 fixtures; mixed
  approach — surgical edits for fixtures with unique anchors, a
  one-shot python loop for 8 retry/tool-first fixtures with non-unique
  anchors. Widened the 'drains queue after completion' fixture's
  onEvent param from sync to void | Promise<void> so the prime could
  be awaited before message_complete dispatched.

- conversation-agent-loop-overflow.test.ts (6/6),
  conversation-abort-tool-results.test.ts (2/2),
  conversation-provider-retry-repair.test.ts (9/9),
  conversation-slash-unknown.test.ts (2/2),
  conversation-workspace-injection.test.ts (10/10),
  conversation-workspace-tool-tracking.test.ts (6/6):
  added updateMessageContent stub to the conversation-crud mock and
  (for overflow) primed fixtures via the prime-fixtures heuristic walker.

- conversation-queue.test.ts (49/49 + 1 todo) and
  conversation-slash-queue.test.ts (4/4): widened PendingRun.onEvent and
  the AgentLoop.run mock signature to void | Promise<void>, made
  resolveRun async with awaited prime/usage/message_complete, and
  awaited every resolveRun call site. For five direct (non-helper)
  run.onEvent sites in attachment / cancel-semantics tests, prepended
  an awaited llm_call_started prime so message_complete sees a
  reserved row.

- outbound-slack-persistence.test.ts (7/7) and
  persistence-secret-redaction.test.ts (8/8): full direct-handler
  recipe. Rewired the reserveMessage mock to push the reserved row
  (with id, content:"", and the production metadata envelope) into
  the same capture arrays that addMessage uses, so existing assertion
  helpers (lastAssistantPersisted etc.) find it. updateMessageContent
  finds the row by id and updates it in-place. Imported
  handleLlmCallStarted and inserted an awaited call before each
  handleMessageComplete invocation.

The unrelated test-pollution failures observed when running the full
suite (schedule-store, skill_load, migration-*, etc.) reproduce
identically on the B2 baseline — 62 pass / 101 fail across the same
11 conversation files — confirming they are pre-existing module-cache /
env-var leak issues independent of this change. Every touched file
passes 100% standalone.

### AGENTS.md compliance

- assistant/AGENTS.md "Test machinery isolation": all test changes
  remain inside src/__tests__/; no new imports from src/ into helpers
  or preloads; helpers continue to use bun:test + sibling-test scope.

- software-engineering/unit-testing.md rule #3 (mock.module audit): the
  new exported handleLlmCallStarted is imported only by tests that
  already mock the handlers module explicitly; no existing
  mock.module sites needed a stub-list update.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5749e25808

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

op: "add",
conversationId: deps.ctx.conversationId,
role: "assistant",
op: "updateContent",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore indexing for finalized assistant rows

When a normal assistant reply completes, this now finalizes the reserved row via updateContent; unlike the previous add persistence path, updateMessageContent only updates the SQLite content, while reserveMessage explicitly skipped both indexMessageNow and projectAssistantMessage. That means every newly generated assistant message is no longer indexed for memory/search or projected for attention tracking after this change.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we look at this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-driven in 8d9c6fe78f. handleMessageComplete now explicitly calls indexMessageNow and projectAssistantMessage after updateMessageContent succeeds — both wrapped in their own try/catches so memory hiccups stay non-fatal to the turn (matches the previous addMessage behavior). The projector signal still gates a publishSyncInvalidation on the conversation :metadata tag only when attention state actually changes, so no flood on no-op turns.

Covered by two new tests in the B3 pre-allocation: indexing + cleanup describe block: one for the indexer + projector + metadata-publish happy path, and one verifying no metadata publish when the projector returns false (attention unchanged).

getMiddlewaresFor("persistence"),
defaultPersistenceTerminal,
{
op: "reserve",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reuse or delete reserved rows on failed LLM attempts

Because the row is reserved before provider.sendMessage, any call that exits through the error/provider-error path before message_complete now leaves this empty assistant row in the transcript. This affects retryable overflow/ordering repairs as well as terminal provider failures; in the latter case the existing synthetic-error branch persists a separate assistant message while state.lastAssistantMessageId still points at the empty reserved row, so later completion/sync events can reference the wrong message id.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up in 8d9c6fe78f along two axes:

Provider-error branch (conversation-agent-loop.ts): now deleteMessageById(state.lastAssistantMessageId) before inserting the synthetic error row, then repoints state.lastAssistantMessageId at the error row's id so syncLastAssistantMessageToDisk targets a real persisted message. Direct CRUD delete (not pipeline op: "delete") since the row is uncommitted reservation state — pipeline op: "delete" mocks are widely stubbed as () => {} across the test suite and would TypeError on deleted.segmentIds.

Retry paths (handleLlmCallStarted): new state.assistantRowAwaitingFinalization: boolean invariant joins the existing throw guard. Set true after reserveMessage, cleared in handleMessageComplete after the successful updateContent. If the flag is still true when a subsequent handleLlmCallStarted fires (retry paths — overflow, ordering, image overflow), the previous stranded reservation is deleteMessageById'd before reserving a new row.

Two new tests in the same describe block cover both: one verifies stranded reservation cleanup on retry, one verifies the provider-error path deletes the orphan and that the final syncMessageToDisk call targets the real error-row id, not the orphan.

Comment thread assistant/src/agent/loop.ts Outdated
// `assistant_turn_start` wire event reaches the client BEFORE the
// provider starts streaming deltas — the deltas downstream will
// carry the freshly-reserved id.
await onEvent({ type: "llm_call_started", callSite });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this down as close to the llm call as possible

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved in 8d9c6fe78f. The llm_call_started emit is now right next to provider.sendMessage in agent/loop.ts so the reserve race window stays as tight as possible.

op: "add",
conversationId: deps.ctx.conversationId,
role: "assistant",
op: "updateContent",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we look at this?

op: "add",
conversationId: deps.ctx.conversationId,
role: "assistant",
op: "updateContent",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we add a reasonable debounce on update_content and did this PR start updating content on each chunk?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No — updateContent fires exactly once at message_complete with the final content blocks. No per-chunk writes were introduced. Streaming deltas go to the wire only.

…d anchor

Addresses three review threads from PR #32326 round 1:

Codex P1 (indexer/projector drop-out): handleMessageComplete now re-drives
indexMessageNow + projectAssistantMessage after updateContent succeeds. The
pre-B3 addMessage path ran both as side-effects of insert; reserveMessage +
updateMessageContent are CRUD-only so the calls had to come back explicitly.
Projector signal still gates a publishSyncInvalidation on the conversation
:metadata tag (attention state change only — no flood on no-op turns). Both
calls are wrapped in their own try/catch — memory hiccups remain non-fatal
to the turn.

Codex P2 (synthetic provider-error orphan): Provider-error branch in
conversation-agent-loop.ts now deletes the stranded reserved assistant row
before inserting the synthetic error message, then repoints
state.lastAssistantMessageId at the error row's id so
syncLastAssistantMessageToDisk targets a real persisted message.

New invariant: state.assistantRowAwaitingFinalization: boolean joins the
existing throw guard. Set true in handleLlmCallStarted after reserve,
cleared in handleMessageComplete after the successful updateContent.
handleLlmCallStarted also uses it to clean up a stranded reservation from a
previous LLM call in this run (retry paths — overflow, ordering, image
overflow).

Reviewer thread (llm_call_started placement): moved emit closer to provider.sendMessage in
agent/loop.ts so the reserve race window stays as tight as possible.

Tests (conversation-agent-loop.test.ts, new describe block): 4 new tests
covering both indexer/projector paths, the strand cleanup, and the
provider-error orphan branch. 64/64 pass, typecheck clean.
…-allocation-b3

# Conflicts:
#	assistant/src/daemon/conversation-agent-loop-handlers.ts
…event sequence

CI surfaced one failure on the merged branch: the `emits continuation
surface event and exits on max_tokens` test asserted the full event sequence
via `events.map((e) => e.type).toEqual([...])` and did not include
`llm_call_started`, which B3 now emits as a real event right before
`provider.sendMessage` so the pre-allocated assistant row id can be
stamped onto downstream streaming events.

Sequence updated to add `llm_call_started` at the head — the existing
`countExitEvents` + `lastExitEvent` assertions still pass unchanged.

11/11 pass in agent-loop-exit-reason.test.ts.
@dvargasfuertes dvargasfuertes merged commit 9ad2c1b into main May 29, 2026
13 checks passed
@dvargasfuertes dvargasfuertes deleted the apollo/agent-loop-pre-allocation-b3 branch May 29, 2026 12:16
vellum-apollo-bot Bot added a commit that referenced this pull request May 30, 2026
B3 ([#32326](#32326)) shipped daemon-side `reserveMessage` at `llm_call_started` —
the daemon writes `content: "[]"` to SQLite and emits `assistant_turn_start
{ messageId }` from event zero of each LLM call. The web client wasn't
ready: the case in `use-stream-event-handler.ts` was a literal no-op with
a TODO promising to wire it, and `appendTextDelta` keyed on tail position
rather than the event's `messageId`.

When the 5s reconcile poll fired before `message_complete` (any turn that
takes >5s — the common case), it fetched the reserved row from
`/v1/assistants/{id}/messages/`. That row landed in client state as an
empty assistant message with no `isStreaming` flag. The next text delta
arrived carrying the matching `messageId` — but `tailIsStreamingAssistant`
returned false on the snapshot row, so `createStreamingBubble` pushed a
NEW row with the same id. Two siblings, same id, both leaked through the
sanitize pipeline (which only dedups *different* ids with matching
content).

Two changes:

1. `handleAssistantTurnStart` — stamps `currentAssistantMessageIdRef` and,
   if a row with the event's `messageId` already exists in `messages`,
   flips it to `isStreaming: true` so `appendTextDelta` won't see a
   non-streaming tail. No-op on the row array in the common case where
   SSE precedes reconcile.

2. `appendTextDelta` — when `messageId` is present (B2/B3 invariant),
   looks up the matching assistant row by id and appends into it
   regardless of tail position. Falls back to tail-based decisioning
   when `messageId` is absent (pre-B2 daemons, until B4's floor bump
   pins them out).

Either change alone fixes the screenshot scenario. Both together cover
the case where the reserved row isn't at the tail when a delta lands
(e.g. mid-turn reconcile completes after a separate user message has
been optimistically appended).

Extracts the per-row append into `appendTextIntoRow` shared by both
branches and stamps `isStreaming: true` on the appended row — needed for
the reconcile-pulled-row case where the snapshot landed without the
flag.

Tests cover the screenshot scenario directly (reserved row pulled in
by reconcile → delta with matching id extends it, no duplicate), the
new-bubble-on-mismatched-id case (LLM call boundary inside a turn), and
the common consecutive-same-id deltas case. `handleAssistantTurnStart`
has tests for the ref stamp, the flip-to-streaming behavior, the no-op
when no row matches, and the no-op when the row is already streaming.

Replaces the old "first-id-wins" test that was guarding a pre-B3 wire
contract — under B3, deltas across LLM calls are separated by
`message_complete`, so the synthetic scenario it covered no longer
matches the wire.
vellum-apollo-bot Bot added a commit that referenced this pull request May 30, 2026
B3 ([#32326](#32326)) shipped daemon-side `reserveMessage` at `llm_call_started` —
the daemon writes `content: "[]"` to SQLite and emits `assistant_turn_start
{ messageId }` from event zero of each LLM call. The web client wasn't
ready: the case in `use-stream-event-handler.ts` was a literal no-op with
a TODO promising to wire it, and `appendTextDelta` keyed on tail position
rather than the event's `messageId`.

When the 5s reconcile poll fired before `message_complete` (any turn that
takes >5s — the common case), it fetched the reserved row from
`/v1/assistants/{id}/messages/`. That row landed in client state as an
empty assistant message with no `isStreaming` flag. The next text delta
arrived carrying the matching `messageId` — but `tailIsStreamingAssistant`
returned false on the snapshot row, so `createStreamingBubble` pushed a
NEW row with the same id. Two siblings, same id, both leaked through the
sanitize pipeline (which only dedups *different* ids with matching
content).

Two changes:

1. `handleAssistantTurnStart` — stamps `currentAssistantMessageIdRef` and,
   if a row with the event's `messageId` already exists in `messages`,
   flips it to `isStreaming: true` so `appendTextDelta` won't see a
   non-streaming tail. No-op on the row array in the common case where
   SSE precedes reconcile.

2. `appendTextDelta` — when `messageId` is present (B2/B3 invariant),
   looks up the matching assistant row by id and appends into it
   regardless of tail position. Falls back to tail-based decisioning
   when `messageId` is absent (pre-B2 daemons, until B4's floor bump
   pins them out).

Either change alone fixes the screenshot scenario. Both together cover
the case where the reserved row isn't at the tail when a delta lands
(e.g. mid-turn reconcile completes after a separate user message has
been optimistically appended).

Extracts the per-row append into `appendTextIntoRow` shared by both
branches and stamps `isStreaming: true` on the appended row — needed for
the reconcile-pulled-row case where the snapshot landed without the
flag.

Tests cover the screenshot scenario directly (reserved row pulled in
by reconcile → delta with matching id extends it, no duplicate), the
new-bubble-on-mismatched-id case (LLM call boundary inside a turn), and
the common consecutive-same-id deltas case. `handleAssistantTurnStart`
has tests for the ref stamp, the flip-to-streaming behavior, the no-op
when no row matches, and the no-op when the row is already streaming.

Replaces the old "first-id-wins" test that was guarding a pre-B3 wire
contract — under B3, deltas across LLM calls are separated by
`message_complete`, so the synthetic scenario it covered no longer
matches the wire.
dvargasfuertes pushed a commit that referenced this pull request May 30, 2026
#32612)

B3 ([#32326](#32326)) shipped daemon-side `reserveMessage` at `llm_call_started` —
the daemon writes `content: "[]"` to SQLite and emits `assistant_turn_start
{ messageId }` from event zero of each LLM call. The web client wasn't
ready: the case in `use-stream-event-handler.ts` was a literal no-op with
a TODO promising to wire it, and `appendTextDelta` keyed on tail position
rather than the event's `messageId`.

When the 5s reconcile poll fired before `message_complete` (any turn that
takes >5s — the common case), it fetched the reserved row from
`/v1/assistants/{id}/messages/`. That row landed in client state as an
empty assistant message with no `isStreaming` flag. The next text delta
arrived carrying the matching `messageId` — but `tailIsStreamingAssistant`
returned false on the snapshot row, so `createStreamingBubble` pushed a
NEW row with the same id. Two siblings, same id, both leaked through the
sanitize pipeline (which only dedups *different* ids with matching
content).

Two changes:

1. `handleAssistantTurnStart` — stamps `currentAssistantMessageIdRef` and,
   if a row with the event's `messageId` already exists in `messages`,
   flips it to `isStreaming: true` so `appendTextDelta` won't see a
   non-streaming tail. No-op on the row array in the common case where
   SSE precedes reconcile.

2. `appendTextDelta` — when `messageId` is present (B2/B3 invariant),
   looks up the matching assistant row by id and appends into it
   regardless of tail position. Falls back to tail-based decisioning
   when `messageId` is absent (pre-B2 daemons, until B4's floor bump
   pins them out).

Either change alone fixes the screenshot scenario. Both together cover
the case where the reserved row isn't at the tail when a delta lands
(e.g. mid-turn reconcile completes after a separate user message has
been optimistically appended).

Extracts the per-row append into `appendTextIntoRow` shared by both
branches and stamps `isStreaming: true` on the appended row — needed for
the reconcile-pulled-row case where the snapshot landed without the
flag.

Tests cover the screenshot scenario directly (reserved row pulled in
by reconcile → delta with matching id extends it, no duplicate), the
new-bubble-on-mismatched-id case (LLM call boundary inside a turn), and
the common consecutive-same-id deltas case. `handleAssistantTurnStart`
has tests for the ref stamp, the flip-to-streaming behavior, the no-op
when no row matches, and the no-op when the row is already streaming.

Replaces the old "first-id-wins" test that was guarding a pre-B3 wire
contract — under B3, deltas across LLM calls are separated by
`message_complete`, so the synthetic scenario it covered no longer
matches the wire.

Co-authored-by: vellum-apollo-bot[bot] <242025090+vellum-apollo-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant