fix(web): sanitize — repair dangling tool calls on older assistant messages by vellum-apollo-bot[bot] · Pull Request #32018 · vellum-ai/vellum-assistant

vellum-apollo-bot · 2026-05-25T16:58:24Z

What

Adds Hack #4, repairDanglingToolCalls, to the sanitizeDisplayMessages pipeline.

Occasionally a tool_result SSE event is lost between the daemon and the client (network drop, reconnect race, server-side fanout glitch). The tool call stays status: "running" forever in the client's DisplayMessage[], even though the assistant clearly continued — there is a subsequent assistant message in the transcript, which is only possible if the LLM provider received the tool result on the server side.

The render layer shows these stuck calls as a permanent spinner on an older message bubble, which is misleading: the tool did complete; the client just never saw the result.

How

For every assistant message that is not the last assistant in the transcript, any tool call with status === "running" is rewritten via copy-on-write (COW), so the input messages and tool-call objects are never mutated, to:

status:  "error"
isError: true
result:  "Tool call completed on the server, but the result never reached the client. Subsequent assistant activity confirms the tool returned — this is a client-side data loss, not a tool failure."

Predicate (must ALL hold)

parent message is role: "assistant",
parent message is not the last assistant in the array (the last one could still be streaming — its dangling tools might legitimately resolve),
tool call's status === "running" (matches the UI's canonical isRunning predicate from tool-call-chip.tsx:261).
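
The rewrite and its three-part predicate can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the DisplayMessage and ToolCall shapes (including the toolCalls field name) are assumptions, since the PR description does not show the real types, and the synthetic result string is abbreviated.

```typescript
// Hypothetical shapes; the real DisplayMessage / tool-call types are not shown in this PR.
type ToolCallStatus = "running" | "completed" | "error";

interface ToolCall {
  id: string;
  status: ToolCallStatus;
  isError?: boolean;
  result?: string;
}

interface DisplayMessage {
  id: string;
  role: "user" | "assistant";
  toolCalls?: readonly ToolCall[];
}

// Abbreviated stand-in for the full message quoted in the How section.
const SYNTHETIC_DANGLING_RESULT =
  "Tool call completed on the server, but the result never reached the client (client-side data loss).";

function repairDanglingToolCalls(messages: readonly DisplayMessage[]): DisplayMessage[] {
  // Find the last assistant message; it may still be streaming, so it is never patched.
  let lastAssistant = -1;
  messages.forEach((m, i) => {
    if (m.role === "assistant") lastAssistant = i;
  });

  return messages.map((msg, i) => {
    // Predicate parts 1 and 2: an assistant message that is not the last assistant.
    if (msg.role !== "assistant" || i >= lastAssistant) return msg;
    const calls = msg.toolCalls;
    // Predicate part 3: at least one tool call is still "running".
    // Returning msg itself keeps element identity when nothing is dangling.
    if (!calls || !calls.some((tc) => tc.status === "running")) return msg;
    // Copy-on-write: new message and new tool-call objects, inputs untouched.
    return {
      ...msg,
      toolCalls: calls.map((tc) =>
        tc.status === "running"
          ? { ...tc, status: "error" as const, isError: true, result: SYNTHETIC_DANGLING_RESULT }
          : tc,
      ),
    };
  });
}

const transcript: DisplayMessage[] = [
  { id: "a1", role: "assistant", toolCalls: [{ id: "t1", status: "running" }] },
  { id: "u1", role: "user" },
  { id: "a2", role: "assistant", toolCalls: [{ id: "t2", status: "running" }] },
];
const repaired = repairDanglingToolCalls(transcript);
```

In this sketch, t1 on the older assistant is patched to an error, while t2 on the last assistant stays "running" because it could still resolve.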

Why a subsequent USER message doesn't qualify

Only a later assistant message proves the LLM provider received the result and continued generating. A trailing user message could just be a queued send.
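
To make the distinction concrete, here is a small sketch of the "later assistant" check on two transcripts. The Msg type and the hasLaterAssistant helper are hypothetical names introduced for illustration only; they do not appear in the PR.

```typescript
// Hypothetical minimal message shape for illustration.
interface Msg {
  id: string;
  role: "user" | "assistant";
}

// True only if some message after `index` was produced by the assistant.
function hasLaterAssistant(messages: readonly Msg[], index: number): boolean {
  return messages.slice(index + 1).some((m) => m.role === "assistant");
}

// A trailing user message may just be a queued send: no proof the tool returned.
const queuedSend: Msg[] = [
  { id: "a1", role: "assistant" },
  { id: "u1", role: "user" },
];

// A later assistant message shows the provider received the result and continued.
const continued: Msg[] = [
  { id: "a1", role: "assistant" },
  { id: "u1", role: "user" },
  { id: "a2", role: "assistant" },
];

const queuedSendQualifies = hasLaterAssistant(queuedSend, 0); // false
const continuedQualifies = hasLaterAssistant(continued, 0); // true
```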

Pipeline placement

Runs after removeDuplicateTrailingAssistant so the dedup filter's pairwise result equality check sees the original (still undefined) values and can correctly identify the duplicate. If both duplicate trailing assistants carry the same dangling tools, dedup drops one and the remaining one becomes the last assistant — at which point this step conservatively skips it.
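
The ordering constraint can be illustrated with a toy pipeline. Both functions below are simplified stand-ins written for this example: dedupTrailing only approximates what removeDuplicateTrailingAssistant is described as doing (a pairwise result comparison), and repair omits the identity-preservation detail. The message shape is also an assumption.

```typescript
interface ToolCall {
  id: string;
  status: "running" | "completed" | "error";
  isError?: boolean;
  result?: string;
}

interface DisplayMessage {
  role: "user" | "assistant";
  text: string;
  toolCalls: ToolCall[];
}

// Hypothetical dedup: drop the final assistant if the previous message is an
// assistant with the same text and pairwise-equal tool-call results.
function dedupTrailing(messages: DisplayMessage[]): DisplayMessage[] {
  const last = messages[messages.length - 1];
  const prev = messages[messages.length - 2];
  if (!last || !prev || last.role !== "assistant" || prev.role !== "assistant") return messages;
  const same =
    last.text === prev.text &&
    last.toolCalls.length === prev.toolCalls.length &&
    last.toolCalls.every((tc, i) => tc.result === prev.toolCalls[i]?.result);
  return same ? messages.slice(0, -1) : messages;
}

// Simplified repair: patch running tool calls on every assistant except the last.
function repair(messages: DisplayMessage[]): DisplayMessage[] {
  let lastAssistant = -1;
  messages.forEach((m, i) => {
    if (m.role === "assistant") lastAssistant = i;
  });
  return messages.map((m, i) =>
    m.role === "assistant" && i < lastAssistant
      ? {
          ...m,
          toolCalls: m.toolCalls.map((tc) =>
            tc.status === "running"
              ? { ...tc, status: "error" as const, isError: true, result: "synthetic" }
              : tc,
          ),
        }
      : m,
  );
}

const dup = (): DisplayMessage => ({
  role: "assistant",
  text: "Searching",
  toolCalls: [{ id: "t1", status: "running" }],
});
const input: DisplayMessage[] = [{ role: "user", text: "hi", toolCalls: [] }, dup(), dup()];

// Order used by the PR: both results are still undefined, so the duplicate is dropped,
// and the survivor is now the last assistant, which repair leaves alone.
const dedupThenRepair = repair(dedupTrailing(input));

// Reversed order: repair patches only the older copy, the results no longer match,
// and the duplicate survives.
const repairThenDedup = dedupTrailing(repair(input));
```

Under these assumptions, dedupThenRepair has two messages with the tool call still "running", while repairThenDedup keeps all three.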

Tests

11 new cases under sanitizeDisplayMessages · repair dangling tool calls:

happy path (older assistant + later assistant → patched)
do not patch the last assistant (could still be streaming)
do not patch when only a subsequent USER message exists (no assistant proof)
patches across an intervening user message
leaves status: "completed" tool calls alone
leaves status: "error" tool calls alone
sibling tool calls on the same message (only the running one patched)
multiple older assistants in a row (all patched)
input messages + tool-call objects are not mutated
message identity preserved when nothing is dangling (COW guarantee at the element level)
empty array returns empty
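
A few of these cases (sibling tool calls, non-mutation, identity preservation, empty input) could be checked along the following lines. This is a framework-free sketch: the compact repairDanglingToolCalls stand-in, the type shapes, and the deepFreeze helper are assumptions for illustration, and the actual test file is not shown in the PR.

```typescript
type Status = "running" | "completed" | "error";
interface ToolCall { id: string; status: Status; isError?: boolean; result?: string }
interface DisplayMessage { id: string; role: "user" | "assistant"; toolCalls?: readonly ToolCall[] }

// Compact stand-in for the function under test (assumed behaviour, not the merged code).
function repairDanglingToolCalls(messages: readonly DisplayMessage[]): DisplayMessage[] {
  let lastAssistant = -1;
  messages.forEach((m, i) => { if (m.role === "assistant") lastAssistant = i; });
  return messages.map((m, i) => {
    if (m.role !== "assistant" || i >= lastAssistant) return m;
    if (!m.toolCalls || !m.toolCalls.some((tc) => tc.status === "running")) return m;
    return {
      ...m,
      toolCalls: m.toolCalls.map((tc) =>
        tc.status === "running"
          ? { ...tc, status: "error" as const, isError: true, result: "client-side data loss" }
          : tc,
      ),
    };
  });
}

// Deep-freeze the fixture so an accidental in-place write cannot go unnoticed
// (it throws in strict mode and is ignored otherwise).
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null) {
    Object.values(value).forEach(deepFreeze);
    Object.freeze(value);
  }
  return value;
}

const input = deepFreeze<DisplayMessage[]>([
  {
    id: "a1",
    role: "assistant",
    toolCalls: [
      { id: "t1", status: "running" },
      { id: "t2", status: "completed", result: "ok" },
    ],
  },
  { id: "a2", role: "assistant", toolCalls: [{ id: "t3", status: "completed", result: "ok" }] },
  { id: "a3", role: "assistant" },
]);
const output = repairDanglingToolCalls(input);

// Sibling tool calls: only the running one is patched.
const statuses = output[0]?.toolCalls?.map((tc) => tc.status);
const siblingKept = output[0]?.toolCalls?.[1] === input[0]?.toolCalls?.[1];
// Identity preserved when nothing is dangling on a message.
const identityKept = output[1] === input[1];
// Input is not mutated.
const inputIntact = input[0]?.toolCalls?.[0]?.status === "running";
// Empty array returns empty.
const emptyOk = repairDanglingToolCalls([]).length === 0;
```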

Integration test strengthened to assert the patched tool's result contains "client-side data loss".

35/35 tests pass · 54 expect() calls · lint clean · audit clean · typecheck matches main baseline (10 pre-existing errors in unrelated files).

SHORT TERM until

The assistant backend reliably delivers tool_result SSE events (or the reconcile pass closes the gap by treating dangling tools as authoritative client-side state to repair against /v1/history).

…ssages

Occasionally a 'tool_result' SSE event is lost between the daemon and the client (network drop, reconnect race, server-side fanout glitch). The tool call stays 'status: running' forever in the client's DisplayMessage[], even though the assistant clearly continued — there is a subsequent assistant message in the transcript, which is only possible if the LLM provider received the tool result on the server side.

The render layer shows these stuck calls as a permanent spinner on an older message bubble, which is misleading: the tool DID complete, the client just never saw the result.

This adds 'repairDanglingToolCalls' as Hack #4 in the sanitize pipeline. For every assistant message that is NOT the last assistant in the transcript, any tool call with 'status: running' is rewritten to:

status: 'error'
isError: true
result: '<SYNTHETIC_DANGLING_RESULT>' (explains the client-side data loss for diagnosis)

Pipeline placement: AFTER 'removeDuplicateTrailingAssistant' so the dedup filter's pairwise 'result' equality check sees the original (still undefined) values and can correctly identify the duplicate.

Why the last assistant is never patched: its dangling tools might still resolve via in-flight stream events.

Why a subsequent USER message doesn't qualify as proof: only a later ASSISTANT message proves the LLM provider received the result and continued. A trailing user message could just be a queued send.

Tests: 11 new cases under 'sanitizeDisplayMessages · repair dangling tool calls' covering the happy path, last-assistant guard, user-only trailing, sibling tool calls, multiple older assistants, identity preservation, empty input. Integration test extended to assert the patched tool call's result contains 'client-side data loss'.

35/35 tests pass. Lint + audit clean. Typecheck matches the main baseline (10 pre-existing errors in unrelated files).

dvargasfuertes approved these changes May 25, 2026

dvargasfuertes merged commit 6ff557f into main May 25, 2026
7 checks passed

dvargasfuertes deleted the apollo/sanitize-dangling-tool-calls branch May 25, 2026 17:06

dvargasfuertes merged 1 commit into main from apollo/sanitize-dangling-tool-calls
