B3: pre-allocate assistant row at llm_call_started boundary#32326
Conversation
The wire-protocol additions in B2 (assistant_turn_start + messageId on
streaming events) were emitted but the row they reference didn't exist
yet — handleMessageComplete still INSERTed at end-of-turn, so the client
could not anchor deltas to a stable DB id during the turn. B3 reserves
the row up-front and converts the terminal write to an in-place update.
Production changes:
- assistant/src/agent/loop.ts: add 'llm_call_started' AgentEvent and emit
it inside AgentLoop.run() immediately before provider.sendMessage(),
awaited so the reserve completes before any streaming delta. The event
carries the optional callSite tag so downstream handlers can decide
whether to reserve for this call.
- assistant/src/daemon/conversation-agent-loop-handlers.ts:
- extract buildAssistantChannelMetadata() so the channel/provenance/slackMeta
envelope (previously built inside handleMessageComplete) can be stamped
onto the row at reserve time. All inputs are stable across the LLM call;
Slack channelTs is intentionally absent and back-filled by
deliverReplyViaCallback as today.
- new handleLlmCallStarted(): runs the persistence pipeline with
op:'reserve', stashes the resulting id on state.lastAssistantMessageId,
and emits 'assistant_turn_start' on the wire.
- dispatchAgentEvent routes 'llm_call_started' to the new handler.
- handleMessageComplete flips its assistant persistence call from
op:'add' to op:'updateContent' against state.lastAssistantMessageId.
Throws if no row was reserved — production invariant: every
message_complete is preceded by exactly one llm_call_started.
- every wire emission inside the call stamps messageId:
state.lastAssistantMessageId — assistant_text_delta,
assistant_thinking_delta, tool_use_start, tool_use_preview_start,
tool_output_chunk variants, input_json_delta, tool_result, the
handleMessageComplete buffer flush, and the two dispatchAgentEvent
fallthroughs.
- assistant/src/daemon/wake-target-adapter.ts: translate the new
AgentEvent type as no-op (returns null). Wake-path persistence still
goes through persistTailMessage (an addMessage-shaped call) rather
than the reserve→updateContent pipeline; full wake-path parity is a
B3 follow-up.
Multi-LLM-call agent turns (LLM call → tool execution → LLM call) emit
one llm_call_started per call, so each call reserves its own row.
findDisplayTurnEndIndex already collapses consecutive assistant rows for
the merged history view, matching today's per-call DB layout.
Test updates:
- conversation-agent-loop.test.ts: 60/60. Primed 21 fixtures; mixed
approach — surgical edits for fixtures with unique anchors, a
one-shot python loop for 8 retry/tool-first fixtures with non-unique
anchors. Widened the 'drains queue after completion' fixture's
onEvent param from sync to void | Promise<void> so the prime could
be awaited before message_complete dispatched.
- conversation-agent-loop-overflow.test.ts (6/6),
conversation-abort-tool-results.test.ts (2/2),
conversation-provider-retry-repair.test.ts (9/9),
conversation-slash-unknown.test.ts (2/2),
conversation-workspace-injection.test.ts (10/10),
conversation-workspace-tool-tracking.test.ts (6/6):
added updateMessageContent stub to the conversation-crud mock and
(for overflow) primed fixtures via the prime-fixtures heuristic walker.
- conversation-queue.test.ts (49/49 + 1 todo) and
conversation-slash-queue.test.ts (4/4): widened PendingRun.onEvent and
the AgentLoop.run mock signature to void | Promise<void>, made
resolveRun async with awaited prime/usage/message_complete, and
awaited every resolveRun call site. For five direct (non-helper)
run.onEvent sites in attachment / cancel-semantics tests, prepended
an awaited llm_call_started prime so message_complete sees a
reserved row.
- outbound-slack-persistence.test.ts (7/7) and
persistence-secret-redaction.test.ts (8/8): full direct-handler
recipe. Rewired the reserveMessage mock to push the reserved row
(with id, content:"", and the production metadata envelope) into
the same capture arrays that addMessage uses, so existing assertion
helpers (lastAssistantPersisted etc.) find it. updateMessageContent
finds the row by id and updates it in-place. Imported
handleLlmCallStarted and inserted an awaited call before each
handleMessageComplete invocation.
The unrelated test-pollution failures observed when running the full
suite (schedule-store, skill_load, migration-*, etc.) reproduce
identically on the B2 baseline — 62 pass / 101 fail across the same
11 conversation files — confirming they are pre-existing module-cache /
env-var leak issues independent of this change. Every touched file
passes 100% standalone.
### AGENTS.md compliance
- assistant/AGENTS.md "Test machinery isolation": all test changes
remain inside src/__tests__/; no new imports from src/ into helpers
or preloads; helpers continue to use bun:test + sibling-test scope.
- software-engineering/unit-testing.md rule #3 (mock.module audit): the
new exported handleLlmCallStarted is imported only by tests that
already mock the handlers module explicitly; no existing
mock.module sites needed a stub-list update.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5749e25808
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| op: "add", | ||
| conversationId: deps.ctx.conversationId, | ||
| role: "assistant", | ||
| op: "updateContent", |
There was a problem hiding this comment.
Restore indexing for finalized assistant rows
When a normal assistant reply completes, this now finalizes the reserved row via updateContent; unlike the previous add persistence path, updateMessageContent only updates the SQLite content, while reserveMessage explicitly skipped both indexMessageNow and projectAssistantMessage. That means every newly generated assistant message is no longer indexed for memory/search or projected for attention tracking after this change.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
can we look at this?
There was a problem hiding this comment.
Re-driven in 8d9c6fe78f. handleMessageComplete now explicitly calls indexMessageNow and projectAssistantMessage after updateMessageContent succeeds — both wrapped in their own try/catches so memory hiccups stay non-fatal to the turn (matches the previous addMessage behavior). The projector signal still gates a publishSyncInvalidation on the conversation :metadata tag only when attention state actually changes, so no flood on no-op turns.
Covered by two new tests in the B3 pre-allocation: indexing + cleanup describe block: one for the indexer + projector + metadata-publish happy path, and one verifying no metadata publish when the projector returns false (attention unchanged).
| getMiddlewaresFor("persistence"), | ||
| defaultPersistenceTerminal, | ||
| { | ||
| op: "reserve", |
There was a problem hiding this comment.
Reuse or delete reserved rows on failed LLM attempts
Because the row is reserved before provider.sendMessage, any call that exits through the error/provider-error path before message_complete now leaves this empty assistant row in the transcript. This affects retryable overflow/ordering repairs as well as terminal provider failures; in the latter case the existing synthetic-error branch persists a separate assistant message while state.lastAssistantMessageId still points at the empty reserved row, so later completion/sync events can reference the wrong message id.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Cleaned up in 8d9c6fe78f along two axes:
Provider-error branch (conversation-agent-loop.ts): now deleteMessageById(state.lastAssistantMessageId) before inserting the synthetic error row, then repoints state.lastAssistantMessageId at the error row's id so syncLastAssistantMessageToDisk targets a real persisted message. Direct CRUD delete (not pipeline op: "delete") since the row is uncommitted reservation state — pipeline op: "delete" mocks are widely stubbed as () => {} across the test suite and would TypeError on deleted.segmentIds.
Retry paths (handleLlmCallStarted): new state.assistantRowAwaitingFinalization: boolean invariant joins the existing throw guard. Set true after reserveMessage, cleared in handleMessageComplete after the successful updateContent. If the flag is still true when a subsequent handleLlmCallStarted fires (retry paths — overflow, ordering, image overflow), the previous stranded reservation is deleteMessageById'd before reserving a new row.
Two new tests in the same describe block cover both: one verifies stranded reservation cleanup on retry, one verifies the provider-error path deletes the orphan and that the final syncMessageToDisk call targets the real error-row id, not the orphan.
| // `assistant_turn_start` wire event reaches the client BEFORE the | ||
| // provider starts streaming deltas — the deltas downstream will | ||
| // carry the freshly-reserved id. | ||
| await onEvent({ type: "llm_call_started", callSite }); |
There was a problem hiding this comment.
I would move this down as close to the llm call as possible
There was a problem hiding this comment.
Moved in 8d9c6fe78f. The llm_call_started emit is now right next to provider.sendMessage in agent/loop.ts so the reserve race window stays as tight as possible.
| op: "add", | ||
| conversationId: deps.ctx.conversationId, | ||
| role: "assistant", | ||
| op: "updateContent", |
There was a problem hiding this comment.
can we look at this?
| op: "add", | ||
| conversationId: deps.ctx.conversationId, | ||
| role: "assistant", | ||
| op: "updateContent", |
There was a problem hiding this comment.
Did we add a reasonable debounce on update_content and did this PR start updating content on each chunk?
There was a problem hiding this comment.
No — updateContent fires exactly once at message_complete with the final content blocks. No per-chunk writes were introduced. Streaming deltas go to the wire only.
…d anchor Addresses three review threads from PR #32326 round 1: Codex P1 (indexer/projector drop-out): handleMessageComplete now re-drives indexMessageNow + projectAssistantMessage after updateContent succeeds. The pre-B3 addMessage path ran both as side-effects of insert; reserveMessage + updateMessageContent are CRUD-only so the calls had to come back explicitly. Projector signal still gates a publishSyncInvalidation on the conversation :metadata tag (attention state change only — no flood on no-op turns). Both calls are wrapped in their own try/catch — memory hiccups remain non-fatal to the turn. Codex P2 (synthetic provider-error orphan): Provider-error branch in conversation-agent-loop.ts now deletes the stranded reserved assistant row before inserting the synthetic error message, then repoints state.lastAssistantMessageId at the error row's id so syncLastAssistantMessageToDisk targets a real persisted message. New invariant: state.assistantRowAwaitingFinalization: boolean joins the existing throw guard. Set true in handleLlmCallStarted after reserve, cleared in handleMessageComplete after the successful updateContent. handleLlmCallStarted also uses it to clean up a stranded reservation from a previous LLM call in this run (retry paths — overflow, ordering, image overflow). Reviewer thread (llm_call_started placement): moved emit closer to provider.sendMessage in agent/loop.ts so the reserve race window stays as tight as possible. Tests (conversation-agent-loop.test.ts, new describe block): 4 new tests covering both indexer/projector paths, the strand cleanup, and the provider-error orphan branch. 64/64 pass, typecheck clean.
…-allocation-b3 # Conflicts: # assistant/src/daemon/conversation-agent-loop-handlers.ts
…event sequence CI surfaced one failure on the merged branch: the `emits continuation surface event and exits on max_tokens` test asserted the full event sequence via `events.map((e) => e.type).toEqual([...])` and did not include `llm_call_started`, which B3 now emits as a real event right before `provider.sendMessage` so the pre-allocated assistant row id can be stamped onto downstream streaming events. Sequence updated to add `llm_call_started` at the head — the existing `countExitEvents` + `lastExitEvent` assertions still pass unchanged. 11/11 pass in agent-loop-exit-reason.test.ts.
B3 ([#32326](#32326)) shipped daemon-side `reserveMessage` at `llm_call_started` — the daemon writes `content: "[]"` to SQLite and emits `assistant_turn_start { messageId }` from event zero of each LLM call. The web client wasn't ready: the case in `use-stream-event-handler.ts` was a literal no-op with a TODO promising to wire it, and `appendTextDelta` keyed on tail position rather than the event's `messageId`. When the 5s reconcile poll fired before `message_complete` (any turn that takes >5s — the common case), it fetched the reserved row from `/v1/assistants/{id}/messages/`. That row landed in client state as an empty assistant message with no `isStreaming` flag. The next text delta arrived carrying the matching `messageId` — but `tailIsStreamingAssistant` returned false on the snapshot row, so `createStreamingBubble` pushed a NEW row with the same id. Two siblings, same id, both leaked through the sanitize pipeline (which only dedups *different* ids with matching content). Two changes: 1. `handleAssistantTurnStart` — stamps `currentAssistantMessageIdRef` and, if a row with the event's `messageId` already exists in `messages`, flips it to `isStreaming: true` so `appendTextDelta` won't see a non-streaming tail. No-op on the row array in the common case where SSE precedes reconcile. 2. `appendTextDelta` — when `messageId` is present (B2/B3 invariant), looks up the matching assistant row by id and appends into it regardless of tail position. Falls back to tail-based decisioning when `messageId` is absent (pre-B2 daemons, until B4's floor bump pins them out). Either change alone fixes the screenshot scenario. Both together cover the case where the reserved row isn't at the tail when a delta lands (e.g. mid-turn reconcile completes after a separate user message has been optimistically appended). Extracts the per-row append into `appendTextIntoRow` shared by both branches and stamps `isStreaming: true` on the appended row — needed for the reconcile-pulled-row case where the snapshot landed without the flag. Tests cover the screenshot scenario directly (reserved row pulled in by reconcile → delta with matching id extends it, no duplicate), the new-bubble-on-mismatched-id case (LLM call boundary inside a turn), and the common consecutive-same-id deltas case. `handleAssistantTurnStart` has tests for the ref stamp, the flip-to-streaming behavior, the no-op when no row matches, and the no-op when the row is already streaming. Replaces the old "first-id-wins" test that was guarding a pre-B3 wire contract — under B3, deltas across LLM calls are separated by `message_complete`, so the synthetic scenario it covered no longer matches the wire.
B3 ([#32326](#32326)) shipped daemon-side `reserveMessage` at `llm_call_started` — the daemon writes `content: "[]"` to SQLite and emits `assistant_turn_start { messageId }` from event zero of each LLM call. The web client wasn't ready: the case in `use-stream-event-handler.ts` was a literal no-op with a TODO promising to wire it, and `appendTextDelta` keyed on tail position rather than the event's `messageId`. When the 5s reconcile poll fired before `message_complete` (any turn that takes >5s — the common case), it fetched the reserved row from `/v1/assistants/{id}/messages/`. That row landed in client state as an empty assistant message with no `isStreaming` flag. The next text delta arrived carrying the matching `messageId` — but `tailIsStreamingAssistant` returned false on the snapshot row, so `createStreamingBubble` pushed a NEW row with the same id. Two siblings, same id, both leaked through the sanitize pipeline (which only dedups *different* ids with matching content). Two changes: 1. `handleAssistantTurnStart` — stamps `currentAssistantMessageIdRef` and, if a row with the event's `messageId` already exists in `messages`, flips it to `isStreaming: true` so `appendTextDelta` won't see a non-streaming tail. No-op on the row array in the common case where SSE precedes reconcile. 2. `appendTextDelta` — when `messageId` is present (B2/B3 invariant), looks up the matching assistant row by id and appends into it regardless of tail position. Falls back to tail-based decisioning when `messageId` is absent (pre-B2 daemons, until B4's floor bump pins them out). Either change alone fixes the screenshot scenario. Both together cover the case where the reserved row isn't at the tail when a delta lands (e.g. mid-turn reconcile completes after a separate user message has been optimistically appended). Extracts the per-row append into `appendTextIntoRow` shared by both branches and stamps `isStreaming: true` on the appended row — needed for the reconcile-pulled-row case where the snapshot landed without the flag. Tests cover the screenshot scenario directly (reserved row pulled in by reconcile → delta with matching id extends it, no duplicate), the new-bubble-on-mismatched-id case (LLM call boundary inside a turn), and the common consecutive-same-id deltas case. `handleAssistantTurnStart` has tests for the ref stamp, the flip-to-streaming behavior, the no-op when no row matches, and the no-op when the row is already streaming. Replaces the old "first-id-wins" test that was guarding a pre-B3 wire contract — under B3, deltas across LLM calls are separated by `message_complete`, so the synthetic scenario it covered no longer matches the wire.
#32612) B3 ([#32326](#32326)) shipped daemon-side `reserveMessage` at `llm_call_started` — the daemon writes `content: "[]"` to SQLite and emits `assistant_turn_start { messageId }` from event zero of each LLM call. The web client wasn't ready: the case in `use-stream-event-handler.ts` was a literal no-op with a TODO promising to wire it, and `appendTextDelta` keyed on tail position rather than the event's `messageId`. When the 5s reconcile poll fired before `message_complete` (any turn that takes >5s — the common case), it fetched the reserved row from `/v1/assistants/{id}/messages/`. That row landed in client state as an empty assistant message with no `isStreaming` flag. The next text delta arrived carrying the matching `messageId` — but `tailIsStreamingAssistant` returned false on the snapshot row, so `createStreamingBubble` pushed a NEW row with the same id. Two siblings, same id, both leaked through the sanitize pipeline (which only dedups *different* ids with matching content). Two changes: 1. `handleAssistantTurnStart` — stamps `currentAssistantMessageIdRef` and, if a row with the event's `messageId` already exists in `messages`, flips it to `isStreaming: true` so `appendTextDelta` won't see a non-streaming tail. No-op on the row array in the common case where SSE precedes reconcile. 2. `appendTextDelta` — when `messageId` is present (B2/B3 invariant), looks up the matching assistant row by id and appends into it regardless of tail position. Falls back to tail-based decisioning when `messageId` is absent (pre-B2 daemons, until B4's floor bump pins them out). Either change alone fixes the screenshot scenario. Both together cover the case where the reserved row isn't at the tail when a delta lands (e.g. mid-turn reconcile completes after a separate user message has been optimistically appended). Extracts the per-row append into `appendTextIntoRow` shared by both branches and stamps `isStreaming: true` on the appended row — needed for the reconcile-pulled-row case where the snapshot landed without the flag. Tests cover the screenshot scenario directly (reserved row pulled in by reconcile → delta with matching id extends it, no duplicate), the new-bubble-on-mismatched-id case (LLM call boundary inside a turn), and the common consecutive-same-id deltas case. `handleAssistantTurnStart` has tests for the ref stamp, the flip-to-streaming behavior, the no-op when no row matches, and the no-op when the row is already streaming. Replaces the old "first-id-wins" test that was guarding a pre-B3 wire contract — under B3, deltas across LLM calls are separated by `message_complete`, so the synthetic scenario it covered no longer matches the wire. Co-authored-by: vellum-apollo-bot[bot] <242025090+vellum-apollo-bot[bot]@users.noreply.github.com>
What
Pre-allocate the assistant message row at the
llm_call_startedboundary instead of inserting it atmessage_complete. This is the persistence half of the streaming-id contract: B2 addedassistant_turn_start+messageIdon every wire event, but the row those ids referenced didn't actually exist until end-of-turn. B3 reserves the row up front and converts the terminal write to an in-place update.Why
With the previous shape, the client received a
messageIdon every delta but had no row to anchor against in its local store until the LLM call completed — forcing index-based reconciliation that doesn't survive multi-LLM-call turns, retries, or cross-process resumes. Pre-allocating gives the client a stable DB id from the first byte, lets B4 (daemon floor bump) drop legacy fallbacks, and lets B5 rewrite the frontend reconcile path against ids instead of array indices.Changes
Production
assistant/src/agent/loop.ts— addllm_call_startedtoAgentEventand emit it insideAgentLoop.run()immediately beforeprovider.sendMessage(), awaited. Carries the optionalcallSitetag so downstream handlers can decide whether to reserve for this call.assistant/src/daemon/conversation-agent-loop-handlers.ts:buildAssistantChannelMetadata()so the channel / provenance /slackMetaenvelope (previously built insidehandleMessageComplete) can be stamped onto the row at reserve time. All inputs are stable across the LLM call; SlackchannelTsis intentionally absent here and back-filled bydeliverReplyViaCallbackas today.handleLlmCallStarted(): runs the persistence pipeline withop: "reserve", stashes the resulting id onstate.lastAssistantMessageId, and emitsassistant_turn_starton the wire.dispatchAgentEventroutes the new event type to the new handler.handleMessageCompleteflips its assistant persistence call fromop: "add"toop: "updateContent"againststate.lastAssistantMessageId. Throws if no row was reserved — production invariant: everymessage_completeis preceded by exactly onellm_call_started.messageId: state.lastAssistantMessageId—assistant_text_delta,assistant_thinking_delta,tool_use_start,tool_use_preview_start,tool_output_chunkvariants,input_json_delta,tool_result, thehandleMessageCompletebuffer flush, and the twodispatchAgentEventfallthroughs.assistant/src/daemon/wake-target-adapter.ts— translate the new event type as a no-op (returnsnull). Wake-path persistence still goes throughpersistTailMessagerather than the reserve→updateContent pipeline; full wake-path parity is tracked as a B3 follow-up.Tests
Every touched test file passes 100% standalone. 11 files, 163 tests:
conversation-agent-loopconversation-agent-loop-overflowconversation-abort-tool-resultsconversation-provider-retry-repairconversation-queueconversation-slash-queueconversation-slash-unknownconversation-workspace-injectionconversation-workspace-tool-trackingoutbound-slack-persistencepersistence-secret-redactionThree test-update recipes were used:
Fixture priming (agent-loop + overflow) — inject
await onEvent({ type: "llm_call_started" })before each fixture'smessage_complete. Surgical edits for unique anchors, a one-shot python loop for fixtures with non-unique anchors.Mock stub addition — for tests that don't drive
message_completedirectly but go through the agent loop, addupdateMessageContent: mock(() => {})to theconversation-crudmock alongside the existingreserveMessagestub.Direct-handler recipe (outbound-slack + persistence-secret-redaction) — rewire the
reserveMessagemock to push the reserved row (with id,content: "", and the production metadata envelope) into the same capture arrays thataddMessageuses, so existing assertion helpers find it.updateMessageContentfinds the row by id and updates it in-place. ImporthandleLlmCallStartedand insert an awaited call before eachhandleMessageCompleteinvocation.For the queue tests with helper-driven event emission (
resolveRun), widenedPendingRun.onEventand theAgentLoop.runmock signature tovoid | Promise<void>, maderesolveRunasync with awaited prime + usage + message_complete, and awaited everyresolveRuncall site. For five direct (non-helper)run.onEvent({type:'usage'})→run.onEvent({type:'message_complete'})sites in attachment / cancel-semantics tests, prepended an awaitedllm_call_startedprime.Pre-existing test pollution
Running the full assistant test suite (or any large batch) reproduces the same ~100 unrelated failures (schedule-store, skill_load, migration-*, OAuth, etc.) on both this branch and on
mainat the B2 baseline — confirming the batched failures are pre-existing module-cache / env-var leak issues independent of this change. Numbers match exactly: 62 pass / 101 fail across the same 11 conversation files on both branches when batched, 163 / 0 on this branch when run standalone.Follow-ups (not in this PR)
op: "add"to reserve+update (mirrors what this PR does for assistant rows on the user side).wake-target-adapter.tsreturns null forllm_call_startedtoday; wake-path persistence still goes throughpersistTailMessage.AGENTS.md compliance
assistant/AGENTS.md"Test machinery isolation": all test changes remain insidesrc/__tests__/; no new imports fromsrc/into helpers or preloads; helpers continue to usebun:test+ sibling-test scope.software-engineering/unit-testing.mdrule ci: add terraform apply workflow on platform changes #3 (mock.module audit): the new exportedhandleLlmCallStartedis imported only by tests that already mock the handlers module explicitly; no existingmock.modulesites needed a stub-list update.