fix(web): sanitize — repair dangling tool calls on older assistant messages #32018

Merged
dvargasfuertes merged 1 commit into main from apollo/sanitize-dangling-tool-calls
May 25, 2026
@vellum-apollo-bot
Contributor

What

Adds Hack #4, repairDanglingToolCalls, to sanitizeDisplayMessages.

Occasionally a tool_result SSE event is lost between the daemon and the client (network drop, reconnect race, server-side fanout glitch). The tool call stays status: "running" forever in the client's DisplayMessage[], even though the assistant clearly continued — there is a subsequent assistant message in the transcript, which is only possible if the LLM provider received the tool result on the server side.

The render layer shows these stuck calls as a permanent spinner on an older message bubble, which is misleading: the tool did complete; the client just never saw the result.

How

For every assistant message that is not the last assistant in the transcript, any tool call with status === "running" is rewritten via copy-on-write (COW) to:

status:  "error"
isError: true
result:  "Tool call completed on the server, but the result never reached the client. Subsequent assistant activity confirms the tool returned — this is a client-side data loss, not a tool failure."

Predicate (must ALL hold)

  • parent message is role: "assistant",
  • parent message is not the last assistant in the array (the last one could still be streaming — its dangling tools might legitimately resolve),
  • tool call's status === "running" (matches the UI's canonical isRunning predicate from tool-call-chip.tsx:261).
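
A minimal sketch of the rewrite and its predicate, for reviewers who want to see the shape without opening the diff. The `ToolCall` and `DisplayMessage` types, the `toolCalls` field name, and the shortened result string are assumptions made for illustration; the merged code may differ.

```typescript
// Hypothetical shapes: the real DisplayMessage and tool-call types are not shown in this PR.
type ToolStatus = "running" | "completed" | "error";

interface ToolCall {
  id: string;
  status: ToolStatus;
  isError?: boolean;
  result?: string;
}

interface DisplayMessage {
  id: string;
  role: "user" | "assistant";
  toolCalls?: readonly ToolCall[];
}

// Abbreviated stand-in for the synthetic result text quoted above.
const SYNTHETIC_DANGLING_RESULT =
  "Tool call completed on the server, but the result never reached the client (client-side data loss).";

function repairDanglingToolCalls(
  messages: readonly DisplayMessage[],
): DisplayMessage[] {
  // The last assistant may still be streaming, so it is never patched.
  const lastAssistant = messages.map((m) => m.role).lastIndexOf("assistant");

  return messages.map((msg, i) => {
    // Only assistant messages strictly before the last assistant qualify.
    if (msg.role !== "assistant" || i >= lastAssistant) return msg;

    const calls = msg.toolCalls ?? [];
    // Nothing dangling: return the same object so identity is preserved.
    if (!calls.some((c) => c.status === "running")) return msg;

    // Copy-on-write: new message and new tool-call objects, inputs untouched.
    return {
      ...msg,
      toolCalls: calls.map((c) =>
        c.status === "running"
          ? { ...c, status: "error" as const, isError: true, result: SYNTHETIC_DANGLING_RESULT }
          : c,
      ),
    };
  });
}
```

In this sketch a transcript of `[assistant(running), user]` comes back unchanged, because the only assistant is also the last one; appending a second assistant message is what makes the first one eligible for patching.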

Why a subsequent USER message doesn't qualify

Only a later assistant message proves the LLM provider received the result and continued generating. A trailing user message could just be a queued send.

Pipeline placement

Runs after removeDuplicateTrailingAssistant so the dedup filter's pairwise result equality check sees the original (still undefined) values and can correctly identify the duplicate. If both duplicate trailing assistants carry the same dangling tools, dedup drops one and the remaining one becomes the last assistant — at which point this step conservatively skips it.
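
To illustrate why the order matters, here is a sketch under stated assumptions. The body of `removeDuplicateTrailingAssistant` below is a hypothetical stand-in (it drops the final message when the last two assistants have pairwise-equal tool-call results), and `sanitizeTail`, `wrongOrder`, the types, and the result string are likewise invented for this example.

```typescript
// All shapes and function bodies here are hypothetical stand-ins for illustration.
type ToolCall = {
  id: string;
  status: "running" | "completed" | "error";
  isError?: boolean;
  result?: string;
};
type DisplayMessage = {
  id: string;
  role: "user" | "assistant";
  toolCalls?: readonly ToolCall[];
};

// Pairwise comparison of tool-call ids and results, position by position.
function sameToolResults(a: DisplayMessage, b: DisplayMessage): boolean {
  const x = a.toolCalls ?? [];
  const y = b.toolCalls ?? [];
  return (
    x.length === y.length &&
    x.every((c, i) => c.id === y[i]?.id && c.result === y[i]?.result)
  );
}

// Assumed dedup behavior: drop the final assistant if it duplicates the one before it.
function removeDuplicateTrailingAssistant(
  messages: readonly DisplayMessage[],
): DisplayMessage[] {
  const last = messages[messages.length - 1];
  const prev = messages[messages.length - 2];
  const isDuplicate =
    last !== undefined &&
    prev !== undefined &&
    last.role === "assistant" &&
    prev.role === "assistant" &&
    sameToolResults(prev, last);
  return isDuplicate ? messages.slice(0, -1) : [...messages];
}

// Compact version of the repair step: patch running calls on non-last assistants.
function repairDanglingToolCalls(
  messages: readonly DisplayMessage[],
): DisplayMessage[] {
  const lastAssistant = messages.map((m) => m.role).lastIndexOf("assistant");
  return messages.map((m, i) => {
    const calls = m.toolCalls ?? [];
    const eligible = m.role === "assistant" && i < lastAssistant;
    if (!eligible || !calls.some((c) => c.status === "running")) return m;
    return {
      ...m,
      toolCalls: calls.map((c) =>
        c.status === "running"
          ? { ...c, status: "error" as const, isError: true, result: "client-side data loss" }
          : c,
      ),
    };
  });
}

// Order described in this PR: dedup first, then repair.
const sanitizeTail = (m: readonly DisplayMessage[]): DisplayMessage[] =>
  repairDanglingToolCalls(removeDuplicateTrailingAssistant(m));

// Reversed order, shown only to demonstrate the failure mode.
const wrongOrder = (m: readonly DisplayMessage[]): DisplayMessage[] =>
  removeDuplicateTrailingAssistant(repairDanglingToolCalls(m));
```

Given two duplicate trailing assistants that both carry the same running tool call, `sanitizeTail` returns a single message whose tool call is still running (it is now the last assistant, so it is skipped). `wrongOrder` returns both messages: the older copy was patched first, so its result no longer equals the undefined result on the newer copy and the dedup check misses the duplicate.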

Tests

11 new cases under sanitizeDisplayMessages · repair dangling tool calls:

  • happy path (older assistant + later assistant → patched)
  • do not patch the last assistant (could still be streaming)
  • do not patch when only a subsequent USER message exists (no assistant proof)
  • patches across an intervening user message
  • leaves status: "completed" tool calls alone
  • leaves status: "error" tool calls alone
  • sibling tool calls on the same message (only the running one patched)
  • multiple older assistants in a row (all patched)
  • input messages + tool-call objects are not mutated
  • message identity preserved when nothing is dangling (COW guarantee at the element level)
  • empty array returns empty
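
The non-mutation and identity cases can be checked roughly as follows. This is a sketch, not the test file from this PR: the `check` helper, the compact stand-in for `repairDanglingToolCalls`, and the message shapes are assumptions, and the project's actual test runner is not used here.

```typescript
// Hypothetical shapes and a compact stand-in for the function under test.
type ToolCall = {
  id: string;
  status: "running" | "completed" | "error";
  isError?: boolean;
  result?: string;
};
type DisplayMessage = {
  id: string;
  role: "user" | "assistant";
  toolCalls?: readonly ToolCall[];
};

function repairDanglingToolCalls(
  messages: readonly DisplayMessage[],
): DisplayMessage[] {
  const lastAssistant = messages.map((m) => m.role).lastIndexOf("assistant");
  return messages.map((m, i) => {
    const calls = m.toolCalls ?? [];
    const eligible = m.role === "assistant" && i < lastAssistant;
    if (!eligible || !calls.some((c) => c.status === "running")) return m;
    return {
      ...m,
      toolCalls: calls.map((c) =>
        c.status === "running"
          ? { ...c, status: "error" as const, isError: true, result: "client-side data loss" }
          : c,
      ),
    };
  });
}

// Tiny assertion helper so the sketch does not depend on a test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// Case: input messages and tool-call objects are not mutated.
const dangling: DisplayMessage[] = [
  { id: "a1", role: "assistant", toolCalls: [{ id: "t1", status: "running" }] },
  { id: "u1", role: "user" },
  { id: "a2", role: "assistant" },
];
const snapshot = JSON.stringify(dangling);
const patched = repairDanglingToolCalls(dangling);
check(JSON.stringify(dangling) === snapshot, "input must not be mutated");
check(patched[0] !== dangling[0], "patched message must be a new object");
check(patched[1] === dangling[1], "untouched messages keep their identity");

// Case: message identity preserved when nothing is dangling.
const clean: DisplayMessage[] = [
  { id: "a1", role: "assistant", toolCalls: [{ id: "t1", status: "completed", result: "ok" }] },
  { id: "a2", role: "assistant" },
];
const unchanged = repairDanglingToolCalls(clean);
check(unchanged.every((m, i) => m === clean[i]), "every element keeps its identity");
```

Comparing a serialized snapshot before and after the call catches in-place edits to nested tool-call objects, while the `===` checks cover the element-level COW guarantee.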

Integration test strengthened to assert the patched tool's result contains "client-side data loss".

35/35 tests pass · 54 expect() calls · lint clean · audit clean · typecheck matches main baseline (10 pre-existing errors in unrelated files).

SHORT TERM until

The assistant backend reliably delivers tool_result SSE events (or the reconcile pass closes the gap by treating dangling tools as authoritative client-side state to repair against /v1/history).

…ssages

Occasionally a 'tool_result' SSE event is lost between the daemon and
the client (network drop, reconnect race, server-side fanout glitch).
The tool call stays 'status: running' forever in the client's
DisplayMessage[], even though the assistant clearly continued — there
is a subsequent assistant message in the transcript, which is only
possible if the LLM provider received the tool result on the server
side.

The render layer shows these stuck calls as a permanent spinner on an
older message bubble, which is misleading: the tool DID complete, the
client just never saw the result.

This adds 'repairDanglingToolCalls' as Hack #4 in the sanitize
pipeline. For every assistant message that is NOT the last assistant
in the transcript, any tool call with 'status: running' is rewritten
to:

  status:  'error'
  isError: true
  result:  '<SYNTHETIC_DANGLING_RESULT>'  (explains the client-side
                                           data loss for diagnosis)

Pipeline placement: AFTER 'removeDuplicateTrailingAssistant' so the
dedup filter's pairwise 'result' equality check sees the original
(still undefined) values and can correctly identify the duplicate.

Why the last assistant is never patched: its dangling tools might
still resolve via in-flight stream events.

Why a subsequent USER message doesn't qualify as proof: only a later
ASSISTANT message proves the LLM provider received the result and
continued. A trailing user message could just be a queued send.

Tests: 11 new cases under 'sanitizeDisplayMessages · repair dangling
tool calls' covering the happy path, last-assistant guard, user-only
trailing, sibling tool calls, multiple older assistants, identity
preservation, empty input. Integration test extended to assert the
patched tool call's result contains 'client-side data loss'.

35/35 tests pass. Lint + audit clean. Typecheck matches the main
baseline (10 pre-existing errors in unrelated files).

@dvargasfuertes merged commit 6ff557f into main on May 25, 2026
7 checks passed
@dvargasfuertes deleted the apollo/sanitize-dangling-tool-calls branch on May 25, 2026 17:06