chore(wire): drop displayMessageId from message_complete + generation_handoff#32012

Merged
dvargasfuertes merged 1 commit into main from apollo/collapse-assistant-message-id
May 25, 2026

@vellum-apollo-bot (Contributor)

Workstream: Chat Message State Slimming → PR 2b.1 (1 of 3 in the wire/handler split that replaces #31994).

What

The terminal turn events (message_complete, generation_handoff) currently carry two id fields:

```ts
// daemon (today): id fields on message_complete / generation_handoff
{
  messageId: state.lastAssistantMessageId,                                          // most recent persisted row
  displayMessageId: state.firstAssistantMessageId ?? state.lastAssistantMessageId,  // merged-history anchor
}
```

The web client always prefers displayMessageId via the fallback event.displayMessageId ?? event.messageId. That fallback is the contract that keeps the live-stream id aligned with the merged history row the /v1/assistants/{id}/messages/ route returns. Two fields, one effective meaning.

This PR collapses them to one. The daemon now sends messageId = anchor directly; the wire displayMessageId field is removed. The web side drops the fallback.

Why

When a single agent turn makes multiple LLM calls, the daemon persists multiple assistant rows in messages and mergeConsecutiveAssistantMessages collapses them at read time to the first row (the "anchor"). The wire was leaking that pre-merge / post-merge ambiguity to the client. After this PR, messageId on terminal events is unambiguously the id the merged history view returns — same id whether you came from SSE or refresh.
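The read-time merge rule can be sketched as follows. This is a minimal illustration, not the actual `mergeConsecutiveAssistantMessages`. The `MessageRow` shape is hypothetical, and joining content by plain string concatenation is an assumption. The sketch shows why the anchor is the first row's id: later rows in the same run only contribute content.

```typescript
// Hypothetical, simplified row shape; the real messages table has more columns.
interface MessageRow {
  id: string;
  role: "user" | "assistant";
  content: string;
}

// Sketch of a read-time merge: each run of consecutive assistant rows
// collapses into one row that keeps the FIRST row's id (the anchor).
// Concatenating content is an illustrative assumption.
function mergeConsecutiveAssistantMessages(rows: MessageRow[]): MessageRow[] {
  const merged: MessageRow[] = [];
  for (const row of rows) {
    const tail = merged[merged.length - 1];
    if (tail !== undefined && tail.role === "assistant" && row.role === "assistant") {
      // Keep tail.id; only fold in the later row's content.
      merged[merged.length - 1] = { ...tail, content: tail.content + row.content };
    } else {
      merged.push({ ...row });
    }
  }
  return merged;
}

// One agent turn with two LLM calls persists two assistant rows.
const rows: MessageRow[] = [
  { id: "u1", role: "user", content: "Summarize the report." },
  { id: "row-a", role: "assistant", content: "Reading the file. " },
  { id: "row-b", role: "assistant", content: "Here is the summary." },
];
const view = mergeConsecutiveAssistantMessages(rows);
console.log(view.map((r) => r.id)); // [ 'u1', 'row-a' ]
```

A live stream that stamps `row-b` on the bubble would therefore disagree with a refresh, which returns `row-a`.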

Scope discipline

  • No state-shape changes on either side. firstAssistantMessageId / lastAssistantMessageId continue to live on EventHandlerState. Only the choice of which one gets stamped onto outbound events shifts.
  • No new fields anywhere. This is a pure collapse.
  • finalizeMessageComplete's param is still named displayMessageId?: even though it's now fed event.messageId. That rename + the function body refactor (role-based branch) is queued as the next PR in this split. Keeping this diff surgical.

Changes

Daemon — wire types (assistant/src/daemon/message-types/)

  • messages.ts: MessageComplete drops displayMessageId?:. JSDoc on messageId?: updated to call out anchor semantics.
  • conversations.ts: same for GenerationHandoff.

Daemon — emit sites (assistant/src/daemon/conversation-agent-loop.ts:3289-3333)

  • messageId now sourced from clientDisplayMessageId (= getClientDisplayMessageId(state)) for both terminal events. The separate displayMessageId spread blocks are gone.

Web — types + parser (apps/web/src/domains/chat/api/)

  • event-types.ts: MessageCompleteEvent and GenerationHandoffEvent drop displayMessageId?:.
  • event-parser.ts: stops parsing the legacy field.
  • Two new backwards-compat tests prove inbound legacy displayMessageId is silently ignored without breaking the rest of the payload (useful during rollout if a stale daemon is paired with a fresh web).
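The backwards-compat behavior those tests cover can be sketched with a parser that copies only known fields. This is a hypothetical `parseMessageComplete` over a simplified event shape, not the real `event-parser.ts`. The point it illustrates is that an unknown legacy `displayMessageId` from a stale daemon is dropped, while the rest of the payload still parses.

```typescript
// Hypothetical subset of MessageCompleteEvent after this PR: no displayMessageId.
interface MessageCompleteEvent {
  type: "message_complete";
  messageId?: string;
  content?: string;
}

// Sketch of a tolerant parser: copy only known fields, so a legacy field
// such as displayMessageId is silently dropped instead of failing the parse.
function parseMessageComplete(raw: unknown): MessageCompleteEvent | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { type, messageId, content } = raw as Record<string, unknown>;
  if (type !== "message_complete") return null;
  return {
    type: "message_complete",
    ...(typeof messageId === "string" ? { messageId } : {}),
    ...(typeof content === "string" ? { content } : {}),
  };
}

// Payload from a stale daemon that still sends the legacy field.
const legacy = {
  type: "message_complete",
  messageId: "row-b",
  displayMessageId: "row-a",
  content: "Done.",
};
const parsed = parseMessageComplete(legacy);
console.log(parsed); // { type: 'message_complete', messageId: 'row-b', content: 'Done.' }
```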

Web — handler (apps/web/src/domains/chat/utils/stream-handlers/message-handlers.ts)

  • handleMessageComplete and handleGenerationHandoff stop applying event.displayMessageId ?? event.messageId. They now pass event.messageId straight through.
  • Diagnostic field renamed displayMessageId → messageId.

Testing

  • assistant tsc: ✅ clean
  • apps/web tsc: ✅ clean (pre-existing generated-API errors only — identical to main)
  • apps/web tests: ✅ 173/173 across event-parser.test.ts, stream-message-updaters.test.ts, reconcile.test.ts
  • assistant message-complete-display-id.test.ts: ✅ 1/1
  • assistant runtime-events-sse-parity.test.ts: ✅ 14/14
  • assistant annotate-risk-options.test.ts: ✅ 4/4
  • assistant conversation-queue.test.ts: ✅ 49/49 + 1 todo

What's next (not in this PR)

This is part of a 3-PR sequence replacing the original #31994:

  1. This PR: drop displayMessageId from the wire.
  2. PR 2b.2: finalizeMessageComplete simplified to a role-based branch. Tail user → push a new assistant bubble. Tail assistant → append to the existing tail, keep tail.id. The displayMessageId param is renamed to messageId, cleaning up the dissonance introduced here.
  3. PR 2b.3 — Stop calling startReconciliationLoop(epoch) from handleMessageComplete / handleGenerationHandoff. Reconcile only on initial load, conversation switch, and SSE reconnect (industry-consensus pattern). Kills the rerender shake.

Replaces

#31994 — drafted as reference. That PR added state (assistantTurnId, anchorContentWritten) which contradicted the workstream goal of slimming state, and bundled wire + impl + web changes too aggressively. This split lands the same end-state with zero state additions.


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6147541994

```diff
 finalizeMessageComplete(prev, {
   content: event.content,
-  displayMessageId,
+  displayMessageId: event.messageId,
```

P1: Preserve legacy display-message fallback during rollout

Using event.messageId directly here drops compatibility with older daemons that still send messageId as the last persisted row and displayMessageId as the merged-history anchor. In a mixed-version deployment (fresh web, stale daemon), multi-call turns can finalize the live bubble under the wrong id, which then causes reconciliation/refresh to mismatch rows (duplicate or misplaced assistant content/attachments). Keep the fallback (displayMessageId ?? messageId) until daemon and web upgrades are guaranteed atomic.
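The mixed-version hazard can be made concrete with a short sketch. The names `TerminalEvent`, `naiveId`, and `fallbackId` are hypothetical, and the ids are invented for illustration. It compares reading `messageId` alone with the `displayMessageId ?? messageId` fallback that the comment asks to keep, using an event from a stale daemon on the second LLM call of a two-call turn.

```typescript
// Hypothetical wire shape a fresh web client may see during a rollout.
interface TerminalEvent {
  messageId?: string;        // stale daemon: latest persisted row
  displayMessageId?: string; // stale daemon only: merged-history anchor
}

// Reading messageId alone picks up the latest row id from a stale daemon.
const naiveId = (e: TerminalEvent): string | undefined => e.messageId;

// The suggested fallback: prefer the anchor whenever it is present.
const fallbackId = (e: TerminalEvent): string | undefined =>
  e.displayMessageId ?? e.messageId;

// Second LLM call of a two-call turn, emitted by a stale daemon.
const staleEvent: TerminalEvent = { messageId: "row-b", displayMessageId: "row-a" };
const mergedHistoryId = "row-a"; // the id a refresh of the messages route returns

console.log(naiveId(staleEvent) === mergedHistoryId);    // false
console.log(fallbackId(staleEvent) === mergedHistoryId); // true
```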

Contributor:

Codex is right: this is a regression until we remove displayMessageId in finalizeMessageComplete. Let's update this to finalizeMessageComplete(prev, event) and continue with the other improvements we identified in this PR.

Contributor Author:

Addressed in 5e23f80. Daemon reverted to messageId: state.lastAssistantMessageId. Web finalizeMessageComplete(prev, event) is now role-based: tail user → new bubble; tail assistant → keep tail.id. appendTextDelta line 119 flipped to id: last.id ?? messageId so the first id locks in across multi-LLM-call turns — the anchor is preserved client-side by arrival order.

```diff
-const displayMessageId = event.displayMessageId ?? event.messageId;
-ctx.setMessages((prev) => stopStreaming(prev, { displayMessageId }));
+ctx.setMessages((prev) =>
+  stopStreaming(prev, { displayMessageId: event.messageId }),
```

Contributor:

Same thing here: we should do stopStreaming(prev, event) and make sure we're not reading displayMessageId.

Contributor Author:

Reply to Vargas comment 3299172021 (line 125 / stopStreaming): addressed in 5e23f80. stopStreaming(prev, event) accepts the event but never reads displayMessageId (or anything else); it just flips isStreaming: false on the streaming assistant tail and keeps tail.id.

Comment on lines -3300 to -3302
```diff
-...(state.lastAssistantMessageId
-  ? { messageId: state.lastAssistantMessageId }
-  : {}),
```

Contributor:

These should still be state.lastAssistantMessageId; delete clientDisplayMessageId support in this method.

Contributor Author:

Addressed in 5e23f80. Reverted both emit sites to messageId: state.lastAssistantMessageId and dropped clientDisplayMessageId + the getClientDisplayMessageId helper. With that helper gone, firstAssistantMessageId on EventHandlerState became dead state — removed from the type, init, and assignment site (state-slimming win). Anchor preservation now lives entirely on the web side via appendTextDelta's first-id-wins lock + role-based finalizeMessageComplete.
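The reverted emit site can be sketched as a payload builder over the slimmed state. `buildMessageComplete` is a hypothetical name, and both interfaces are simplified stand-ins for the real `EventHandlerState` and `MessageComplete`. The sketch keeps the conditional-spread shape shown in the reviewed diff, so `messageId` is omitted entirely when no assistant row was persisted.

```typescript
// Hypothetical slimmed state: firstAssistantMessageId is gone, only the last row id remains.
interface EventHandlerState {
  lastAssistantMessageId?: string;
}

// Simplified terminal payload; the real MessageComplete carries more fields.
interface MessageComplete {
  type: "message_complete";
  /** Id of the most recently persisted assistant row, when one exists. */
  messageId?: string;
}

// Conditional spread: omit the key rather than sending messageId: undefined.
function buildMessageComplete(state: EventHandlerState): MessageComplete {
  return {
    type: "message_complete",
    ...(state.lastAssistantMessageId
      ? { messageId: state.lastAssistantMessageId }
      : {}),
  };
}

console.log(buildMessageComplete({ lastAssistantMessageId: "row-b" })); // { type: 'message_complete', messageId: 'row-b' }
console.log(buildMessageComplete({}));                                  // { type: 'message_complete' }
```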

@vellum-apollo-bot (Contributor Author)

Acknowledged on all three. The full picture I missed when separating these:

Why the daemon-side anchor is going away: With the role-based finalizeMessageComplete(prev, event) from the next PR, the anchor preservation moves to the client — naturally, by order of arrival. The daemon stops needing to compute it.

Why merging is necessary now (not just nice-to-have): Without the finalizeMessageComplete(prev, event) refactor, reverting the daemon would regress multi-LLM-call turns — bubble.id would stamp the inner row id and mismatch the merged server row. With the refactor, finalizeMessageComplete keeps tail.id when tail is assistant, so the anchor sticks.

One missing piece I want to flag: appendTextDelta:119 today does id: messageId ?? last.id — overwriting bubble.id on every text_delta. In a multi-LLM-call turn this flips bubble.id from row-A to row-B between calls. Combined with "keep tail.id at finalize", that would leave bubble.id = row-B (the inner row), not row-A (the anchor). I'll flip that line to id: last.id ?? messageId (first id wins, never overwritten) so the anchor is preserved by arrival order through the whole turn. Same one-line fix, no scope creep.
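The one-line fix can be illustrated with a simplified `text_delta` reducer. The `(prev, text, messageId)` signature and the `ChatMessage` shape are hypothetical. The sketch also assumes role-based appending, where any assistant tail receives the delta whether or not it is still streaming. Only the `id` expression mirrors the change described above.

```typescript
// Hypothetical, simplified bubble shape.
interface ChatMessage {
  id?: string | undefined;
  role: "user" | "assistant";
  content: string;
  isStreaming: boolean;
}

// Sketch of a text_delta reducer (hypothetical signature).
function appendTextDelta(prev: ChatMessage[], text: string, messageId?: string): ChatMessage[] {
  const last = prev[prev.length - 1];
  if (last !== undefined && last.role === "assistant") {
    const updated: ChatMessage = {
      ...last,
      // Before: id: messageId ?? last.id, so each new row id overwrote the bubble id.
      // After: first id wins and is never overwritten.
      id: last.id ?? messageId,
      content: last.content + text,
      isStreaming: true,
    };
    return [...prev.slice(0, -1), updated];
  }
  // Tail is a user message (or the list is empty): open a new assistant bubble.
  return [...prev, { id: messageId, role: "assistant", content: text, isStreaming: true }];
}

// Two LLM calls in one turn: deltas arrive tagged row-a, then row-b.
let messages: ChatMessage[] = [{ id: "u1", role: "user", content: "hi", isStreaming: false }];
messages = appendTextDelta(messages, "Checking. ", "row-a");
messages = appendTextDelta(messages, "Done.", "row-b");
console.log(messages[1]?.id); // row-a
```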

This PR will then carry:

  1. Revert daemon emit sites to messageId: state.lastAssistantMessageId. Drop clientDisplayMessageId import + local computation.
  2. appendTextDelta line 119: id: last.id ?? messageId (anchor preservation).
  3. finalizeMessageComplete(prev, event) rewritten role-based: tail user → new bubble with event.messageId; tail assistant → finalize tool calls + isStreaming:false, append content/attachments, keep tail.id.
  4. stopStreaming(prev, event) simplified similarly.
  5. handleMessageComplete → finalizeMessageComplete(prev, event). handleGenerationHandoff → stopStreaming(prev, event).

Rolling now.

@vellum-apollo-bot force-pushed the apollo/collapse-assistant-message-id branch from 6147541 to 5e23f80 (May 25, 2026 17:01)

@vellum-apollo-bot (Contributor Author)

Rolled — 5e23f80. Summary of the merged-scope PR:

Daemon revert (per line 3302 feedback):

  • Both emit sites back to messageId: state.lastAssistantMessageId. No anchor logic on the wire anymore.
  • Dropped clientDisplayMessageId local + getClientDisplayMessageId import. With the helper gone, firstAssistantMessageId on EventHandlerState became dead state — removed from type / init / assignment too. Pure state-slimming win.

Web — anchor preservation moves client-side:

  1. appendTextDelta (line 119): flipped id: messageId ?? last.id → id: last.id ?? messageId. First id wins, never overwritten. Multi-LLM-call turns now lock bubble.id to the first text_delta's id naturally.
  2. finalizeMessageComplete(prev, event) — role-based: tail user → new bubble with event.messageId; tail assistant → finalize + keep tail.id. Mirrors the daemon's server-side merge by arrival order.
  3. stopStreaming(prev, event) — accepts the event but reads nothing from it. Just flips isStreaming.
  4. Handlers use the new signatures directly — no id-conversion glue.

Tests: new coverage for the role-based shape, the appendTextDelta anchor lock, multi-LLM-call append, and inbound legacy-displayMessageId tolerance. Obsolete message-complete-display-id daemon test deleted (its subject is gone).

Quality gates: assistant tsc clean. Web tsc clean (only the same pre-existing generated-API errors that exist on main). 43/43 stream-message-updaters tests pass. 71/71 event-parser tests pass. Broader chat-domain suite shows the same 12 fail / 2 errors baseline as main — pre-existing isolation flakiness, not introduced here.

PR queue update: 2b.3 (remove startReconciliationLoop on message_complete) stays separate.

@dvargasfuertes left a review:

Fix failing CI runs.

Comment on lines +120 to +124
```ts
// First id wins — never overwritten. In a multi-LLM-call turn the
// daemon advances its `messageId` between calls (each call persists a
// new assistant row), but the merged history view collapses them to
// the first row's id. Locking bubble.id to the first id keeps the
// client view aligned with that merge.
```

Contributor:

No need for this comment.

Contributor Author:

Dropped in af96713.

Comment on lines +222 to +231
```diff
-  opts?: { displayMessageId?: string },
+  _event?: GenerationHandoffEvent,
```

Contributor:

We can delete this arg now, actually.

Contributor Author:

Done in af96713: stopStreaming(prev) is now arg-free. The handler call site is updated; _event stays on the handleGenerationHandoff signature since the dispatcher passes it in, but it is underscored to mark it unused.
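The end state of this thread can be sketched as follows. `ChatMessage`, `HandlerContext`, and the event shape are hypothetical simplifications of the real types. The sketch has an arg-free `stopStreaming` that only flips the flag on a streaming assistant tail. It also has a handler that keeps the dispatcher-supplied event as an underscored, unused parameter.

```typescript
// Hypothetical, simplified bubble shape.
interface ChatMessage {
  id?: string | undefined;
  role: "user" | "assistant";
  content: string;
  isStreaming: boolean;
}

// Hypothetical subset of the real event type.
interface GenerationHandoffEvent {
  type: "generation_handoff";
  messageId?: string;
}

// Hypothetical context exposing only the setter this handler needs.
interface HandlerContext {
  setMessages: (update: (prev: ChatMessage[]) => ChatMessage[]) => void;
}

// Arg-free: flips isStreaming on a streaming assistant tail and never touches its id.
function stopStreaming(prev: ChatMessage[]): ChatMessage[] {
  const tail = prev[prev.length - 1];
  if (tail === undefined || tail.role !== "assistant" || !tail.isStreaming) return prev;
  return [...prev.slice(0, -1), { ...tail, isStreaming: false }];
}

// The dispatcher still passes the event, so the parameter stays, underscored as unused.
function handleGenerationHandoff(ctx: HandlerContext, _event: GenerationHandoffEvent): void {
  ctx.setMessages((prev) => stopStreaming(prev));
}

// Usage with a plain in-memory store standing in for React state.
let store: ChatMessage[] = [{ id: "row-a", role: "assistant", content: "Partial", isStreaming: true }];
const ctx: HandlerContext = { setMessages: (update) => { store = update(store); } };
handleGenerationHandoff(ctx, { type: "generation_handoff", messageId: "row-b" });
console.log(store[0]?.id, store[0]?.isStreaming); // row-a false
```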

Today the daemon emits two ids on each terminal turn event:
- `messageId` = the most recent persisted assistant row id
- `displayMessageId` = the merged-history anchor (first row id)

The web client always prefers `displayMessageId` via the fallback
`event.displayMessageId ?? event.messageId`. That fallback is the contract
that keeps live-stream ids aligned with the post-merge history rows the
messages route returns. Two fields, one effective meaning.

Collapse to one — and move anchor preservation entirely client-side:

**Wire.** Drop `displayMessageId` from `MessageComplete` + `GenerationHandoff`
on both daemon and web types. The daemon now emits the single
`messageId: state.lastAssistantMessageId` (no anchor logic), shedding
`firstAssistantMessageId` from `EventHandlerState` and the
`getClientDisplayMessageId` helper that was its sole reader.

**appendTextDelta.** Flip `id: messageId ?? last.id` to
`id: last.id ?? messageId` — first id wins, never overwritten. In a
multi-LLM-call turn the daemon advances its row id between calls, but the
bubble's id stays locked to the first id seen. This is the anchor.

**finalizeMessageComplete.** Rewritten role-based — accepts the event
directly (`finalizeMessageComplete(prev, event)`):
- tail user (or empty) → push a new finalized assistant bubble with
  `event.messageId`
- tail assistant → flip `isStreaming: false`, finalize running tool calls,
  merge in content/attachments, **keep `tail.id`**. Subsequent
  `message_complete` events from later LLM calls in the same agent turn
  fold into the same bubble — mirroring the daemon's server-side merge.
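The role-based finalize described above can be sketched as follows. All shapes are hypothetical simplifications, and the sketch is not the merged implementation. It assumes that `event.content` carries only text not already streamed into the bubble, so appending it does not duplicate output. It shows a new bubble when the tail is a user message, and an in-place finalize that keeps `tail.id` when the tail is an assistant bubble.

```typescript
// Hypothetical, simplified shapes for illustration.
interface ToolCall {
  id: string;
  status: "running" | "done";
}

interface ChatMessage {
  id?: string | undefined;
  role: "user" | "assistant";
  content: string;
  attachments: string[];
  toolCalls: ToolCall[];
  isStreaming: boolean;
}

interface MessageCompleteEvent {
  messageId?: string;
  content?: string; // assumed: text NOT already streamed into the bubble
  attachments?: string[];
}

function finalizeMessageComplete(prev: ChatMessage[], event: MessageCompleteEvent): ChatMessage[] {
  const tail = prev[prev.length - 1];
  if (tail === undefined || tail.role === "user") {
    // Tail user (or empty): push a new, already-finalized assistant bubble.
    return [
      ...prev,
      {
        id: event.messageId,
        role: "assistant",
        content: event.content ?? "",
        attachments: event.attachments ?? [],
        toolCalls: [],
        isStreaming: false,
      },
    ];
  }
  // Tail assistant: finalize in place and KEEP tail.id (the anchor).
  const finalized: ChatMessage = {
    ...tail,
    isStreaming: false,
    toolCalls: tail.toolCalls.map((tc): ToolCall => (tc.status === "running" ? { ...tc, status: "done" } : tc)),
    content: tail.content + (event.content ?? ""),
    attachments: [...tail.attachments, ...(event.attachments ?? [])],
  };
  return [...prev.slice(0, -1), finalized];
}

const user: ChatMessage = { id: "u1", role: "user", content: "hi", attachments: [], toolCalls: [], isStreaming: false };
const streaming: ChatMessage = {
  id: "row-a", role: "assistant", content: "Checking. ", attachments: [],
  toolCalls: [{ id: "t1", status: "running" }], isStreaming: true,
};

// First LLM call completes; a later call in the same turn folds into the same bubble.
let msgs = finalizeMessageComplete([user, streaming], { messageId: "row-a" });
msgs = finalizeMessageComplete(msgs, { messageId: "row-b", attachments: ["report.pdf"] });
console.log(msgs.length, msgs[1]?.id, msgs[1]?.toolCalls[0]?.status); // 2 row-a done
```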

**stopStreaming.** Same simplification — accepts the event, flips
`isStreaming: false`, never stamps `event.messageId` onto the bubble.

**Handlers.** `handleMessageComplete` → `finalizeMessageComplete(prev, event)`.
`handleGenerationHandoff` → `stopStreaming(prev, event)`. No id-conversion
glue at the call sites anymore.

Tests cover the new role-based shape, the anchor-lock in `appendTextDelta`,
the multi-LLM-call append case, and inbound legacy-`displayMessageId`
tolerance.

Workstream: Chat Message State Slimming (PR 2b.1, now also folding in 2b.2
per review feedback on the prior commit).
@vellum-apollo-bot force-pushed the apollo/collapse-assistant-message-id branch from 5e23f80 to af96713 (May 25, 2026 18:57)
@dvargasfuertes merged commit 1e3b2ad into main on May 25, 2026
18 checks passed
@dvargasfuertes deleted the apollo/collapse-assistant-message-id branch (May 25, 2026 19:13)
vellum-apollo-bot pushed a commit that referenced this pull request on May 25, 2026
…iant

PR 2b.3 of the chat-state-slimming workstream. With the anchor invariant
landed in 2b.1 (#32012) — assistant emits messageId = state.lastAssistantMessageId,
client preserves the anchor via first-id-wins lock in appendTextDelta and
role-based finalize in finalizeMessageComplete — the post-turn reconciliation
loop that fired on every message_complete and activity_state idle is now
redundant. The live client state is already correct; the refetch existed
only to paper over id drift.

Removes startReconciliationLoop calls from:
- handleAssistantActivityState (idle phase)
- handleMessageComplete

Drops the now-unused epoch parameter from both handler signatures.

The dispatcher still uses epoch for stale-epoch guards on the dispatch
side, and reconciliation still fires on load / switch / SSE reopen /
POST-resolve confirmation. cancelReconciliation calls remain since
those still-spawned loops can be cancelled by text deltas and
generation handoff.
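The resulting reconciliation policy can be sketched as follows. `ReconcileReason`, `StreamEvent`, `ReconciliationController`, and `onStreamEvent` are hypothetical stand-ins for the real loop and dispatcher, and the reason names are illustrative. The sketch models two behaviors. First, message_complete and activity_state no longer start a loop. Second, text deltas and generation handoff still cancel a loop spawned by load, switch, SSE reopen, or POST-resolve confirmation.

```typescript
// Reasons that still start reconciliation after 2b.3 (names are illustrative).
type ReconcileReason = "initial_load" | "conversation_switch" | "sse_reopen" | "post_resolve_confirmation";

// Hypothetical subset of live stream events.
type StreamEvent =
  | { type: "text_delta" }
  | { type: "generation_handoff" }
  | { type: "message_complete" }
  | { type: "activity_state"; phase: string };

// Stand-in for the real loop: only tracks whether one is in flight.
class ReconciliationController {
  private reason: ReconcileReason | null = null;
  start(reason: ReconcileReason): void {
    this.reason = reason;
  }
  cancel(): void {
    this.reason = null;
  }
  get running(): boolean {
    return this.reason !== null;
  }
}

function onStreamEvent(ctrl: ReconciliationController, event: StreamEvent): void {
  switch (event.type) {
    case "text_delta":
    case "generation_handoff":
      // Live activity still cancels any loop spawned by load/switch/reopen/confirm.
      ctrl.cancel();
      break;
    case "message_complete":
    case "activity_state":
      // No longer starts a reconciliation loop: live client state is already correct.
      break;
  }
}

const ctrl = new ReconciliationController();
ctrl.start("sse_reopen");
onStreamEvent(ctrl, { type: "message_complete" });
console.log(ctrl.running); // true
onStreamEvent(ctrl, { type: "text_delta" });
console.log(ctrl.running); // false
```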
dvargasfuertes pushed a commit that referenced this pull request May 25, 2026
…iant (#32024)

Co-authored-by: vellum-apollo-bot[bot] <206299977+vellum-apollo-bot[bot]@users.noreply.github.com>