fix(web): fail stale tool calls when the assistant restarts mid-execution by vellum-apollo-bot[bot] · Pull Request #32599 · vellum-ai/vellum-assistant

vellum-apollo-bot · 2026-05-29T19:21:50Z

Summary

Catch the case where the assistant daemon restarts (or silently dies) mid-tool-execution and never delivers the tool_result SSE for the call it was running. The web client's tool-call bubble would otherwise spin forever — Vargas's screenshot from May 29: a bash chip stuck at "Verifying no stale refs + running typecheck" across reload because the daemon went away before the tool result hit the wire.

This adds Hack #5 — fail stale tool calls to sanitizeDisplayMessages. Predicate:

status === "running"
startedAt is set
pendingConfirmation is null/undefined (a tool awaiting user approval is correctly stalled — its execution clock hasn't started)
now - max(startedAt, lastProgressAt ?? 0) > effectiveTimeoutMs + STALE_GRACE_MS

effectiveTimeoutMs prefers progressTimeoutSec (the daemon-advertised per-tool value), falling back to DEFAULT_TOOL_EXECUTION_TIMEOUT_SEC. That fallback constant is now exposed from @vellumai/assistant-api so backend enforcement and frontend detection reference the same wire-contract default.

STALE_GRACE_MS = 30_000 absorbs daemon-side delivery lag between the server-side timeout firing and the synthetic error result actually crossing the SSE.

How this differs from Hack #4

Hack #4 (repair dangling tool calls) only fires when a subsequent assistant message proves the tool completed server-side. Hack #5 is the timeout-based fallback for the case where no such proof ever arrives.

Stale-tool ticker

useMemo([messages]) only re-evaluates when messages change, so the silent-daemon-death case (no new messages) would never trigger the predicate. chat-route-content now bumps a staleToolNowMs state every 15s only while at least one tool call is running — outside that window the interval never starts, so idle conversations have zero extra renders.

Changes

assistant/src/api/constants/tool-execution.ts (new): canonical DEFAULT_TOOL_EXECUTION_TIMEOUT_SEC = 120.
assistant/src/api/index.ts: re-export.
assistant/src/config/schemas/timeouts.ts + assistant/src/tools/execution-timeout.ts: consume the constant in place of inline 120 literals.
apps/web/src/domains/chat/utils/sanitize-display-messages.ts: Hack Create top level setup.sh script #5 implementation. Optional nowMs parameter on sanitizeDisplayMessages for deterministic tests.
apps/web/src/domains/chat/utils/sanitize-display-messages.test.ts: 11 new tests.
apps/web/src/domains/chat/components/chat-route-content.tsx: 15-second stale-tool ticker, scoped to "while a tool is running".

Test plan

cd apps/web && bun test src/domains/chat/utils/sanitize-display-messages.test.ts — 46 pass (35 prior + 11 new).
cd assistant && bun test src/__tests__/tool-execute-pipeline.test.ts src/__tests__/recording-handler.test.ts src/__tests__/tool-execution-pipeline.benchmark.test.ts — all pass; default-timeout consumers still see 120s.
cd apps/web && bun run typecheck && bun run lint — clean.
cd assistant && bun run typecheck — clean.

Follow-up

The current synthetic message is purely client-side decoration. Server-side, the daemon should ideally persist an "in-flight tool" record so the next process can emit a synthetic tool_result on boot — at which point Hack #5 (and Hack #4) become removable. Tracked separately.

Expose DEFAULT_TOOL_EXECUTION_TIMEOUT_SEC from @vellumai/assistant-api and consume it in sanitizeDisplayMessages as a new Hack #5: any running tool call whose elapsed time exceeds the configured execution timeout (plus a small grace window) is rewritten to a synthetic client-side error. The case this catches: the assistant restarts mid-tool-execution and never delivers the tool_result SSE for the call it was running. The client retains status: "running" forever, leaving a permanent spinner on the last bubble across reloads. Hack #4 (repair dangling tool calls) only fires when a SUBSEQUENT assistant message proves the tool completed server-side. Hack #5 is the timeout-based fallback for the case where no such proof ever arrives. - assistant/src/api/constants/tool-execution.ts (new): canonical default constant. - assistant/src/api/index.ts: re-export. - assistant/src/config/schemas/timeouts.ts + assistant/src/tools/execution-timeout.ts: consume the constant in place of inline 120 literals. - apps/web/src/domains/chat/utils/sanitize-display-messages.ts: Hack #5 — failStaleToolCalls. Predicate gates on running status, startedAt presence, no pendingConfirmation, and now - max(startedAt, lastProgressAt) > effectiveTimeoutMs + STALE_GRACE_MS. effectiveTimeoutMs prefers progressTimeoutSec (the daemon-advertised per-tool value) and falls back to the canonical default. - Tests: 11 new cases covering threshold semantics, progress-based liveness, pending-confirmation safety, missing-startedAt safety, last-assistant inclusion, sibling preservation, and COW identity.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfebba4c2c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T19:23:28Z

+    tc.progressTimeoutSec && tc.progressTimeoutSec > 0
+      ? tc.progressTimeoutSec
+      : DEFAULT_TOOL_EXECUTION_TIMEOUT_SEC;


Use the effective timeout before failing tools

For any deployment with timeouts.toolExecutionTimeoutSec raised above 120s, or for bash/host_bash calls that pass timeout_seconds above 120s (the executor allows shell calls up to shellMaxTimeoutSec, default 600s, in assistant/src/tools/executor.ts), this fallback still marks the chip as failed after only the hard-coded default plus grace. I only found tool_progress parsed/handled on the web side, not emitted by the assistant, so many legitimate long-running calls will have no progressTimeoutSec and will be shown as a client-side timeout while the assistant can still be executing and may later return a real result.

Useful? React with 👍 / 👎.

Confirmed the analysis — searched both directions: grep -rn '"tool_progress"' assistant/ returns nothing; only the web side parses the event. So today the progressTimeoutSec branch is dead code in the deployed system and the 120s+grace fallback is the only path.

Impact:

Default-config deployments (toolExecutionTimeoutSec = 120, bash without an explicit timeout_seconds): predicate is correct.

Raised toolExecutionTimeoutSec server-side: false positives.

bash / host_bash with input.timeout_seconds > 120s (executor clamps to shellMaxTimeoutSec, default 600s — see executor.ts:508-516): false positives.

Proposed follow-up (separate PR so this lands clean): add effectiveTimeoutSec?: number to ToolUseStartEvent, populated on the daemon side from computePerToolTimeoutMs. The web stream-message-updater stamps it onto the tool call alongside startedAt. The sanitizer then prefers this value over the hard-coded fallback, and the predicate becomes correct regardless of server config or per-call timeout_seconds.

The current PR is still strictly better than the status quo (infinite spinner → user reload) for the default-config case, which is what Vargas's screenshot was. Holding the wire-contract change for the follow-up unless he wants it folded in here.

dvargasfuertes · 2026-05-29T19:35:22Z

+  const hasRunningTool = useMemo(
+    () =>
+      messages.some(
+        (m) => m.toolCalls?.some((tc) => tc.status === "running") ?? false,
+      ),
+    [messages],
+  );
+  const [staleToolNowMs, setStaleToolNowMs] = useState(() => Date.now());
+  useEffect(() => {
+    if (!hasRunningTool) return;
+    const STALE_TOOL_TICK_MS = 15_000;
+    const id = setInterval(
+      () => setStaleToolNowMs(Date.now()),
+      STALE_TOOL_TICK_MS,
+    );
+    return () => clearInterval(id);
+  }, [hasRunningTool]);


I don't expect this to be here, I'm fine with this being just in sanitizeDisplayMessages. I recognize we don't catch the case of the exact moment where we go from not timeout to timeout

Done — force-pushed c7bfd87082 and the ticker block is gone. The sanitizer will re-evaluate on the next messages change (next tool call, next assistant text, page reload), which covers the practical case of the user coming back to a stuck conversation. Accepting the at-transition miss as documented.

dvargasfuertes · 2026-05-29T19:52:07Z

+
 export function sanitizeDisplayMessages(
  messages: DisplayMessage[],
+  options: SanitizeDisplayMessagesOptions = {},


delete these options here

Done in 855d0fc45e — dropped the SanitizeDisplayMessagesOptions interface and the second parameter. sanitizeDisplayMessages is back to a one-arg signature reading Date.now() inline. Tests pin the clock via a mockNow(...) helper backed by spyOn(Date, "now"), restored in afterEach. All 46 tests pass.

dvargasfuertes · 2026-05-29T19:52:18Z

    removeInvalidMessages,
    removeDuplicateTrailingAssistant,
    repairDanglingToolCalls,
+    (msgs) => failStaleToolCalls(msgs, nowMs),


and here, can call the now() within this method

Same commit — Date.now() is now called directly inside sanitizeDisplayMessages at the pipeline boundary, no options pass-through.

…tests Per review feedback: drop the SanitizeDisplayMessagesOptions interface and the optional second parameter on sanitizeDisplayMessages. Read Date.now() directly at the pipeline boundary instead. Tests now pin the clock via bun:test's `spyOn(Date, "now")` through a local `mockNow(...)` helper and restore in afterEach. Behavior is identical to the previous param-injection approach but the production signature stays one-arg.

dvargasfuertes · 2026-05-29T20:43:19Z

+
+function failStaleToolCalls(
+  messages: DisplayMessage[],
+  nowMs: number,


you didn't remove this arg and inline'd it

You're right, I half-did it last round. Fixed in 1577edfa29 — nowMs is gone from failStaleToolCalls' signature, Date.now() is called at the top of that function, and the pipeline now references failStaleToolCalls directly instead of wrapping it in a closure. sanitizeDisplayMessages no longer touches the wall clock at all.

Round 2 of the same feedback: don't thread `nowMs` from the pipeline. `failStaleToolCalls` now reads `Date.now()` at its own top, the pipeline can use `failStaleToolCalls` as a plain reference instead of a closure wrapper, and `sanitizeDisplayMessages` doesn't know or care about the wall clock.

vellum-apollo-bot Bot requested a review from siddseethepalli as a code owner May 29, 2026 19:21

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

dvargasfuertes requested changes May 29, 2026

View reviewed changes

vellum-apollo-bot Bot force-pushed the apollo/fail-stale-tool-calls branch from bfebba4 to c7bfd87 Compare May 29, 2026 19:43

dvargasfuertes requested changes May 29, 2026

View reviewed changes

dvargasfuertes reviewed May 29, 2026

View reviewed changes

dvargasfuertes approved these changes May 29, 2026

View reviewed changes

dvargasfuertes merged commit 381dcb3 into main May 29, 2026
18 checks passed

dvargasfuertes deleted the apollo/fail-stale-tool-calls branch May 29, 2026 21:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(web): fail stale tool calls when the assistant restarts mid-execution#32599

fix(web): fail stale tool calls when the assistant restarts mid-execution#32599
dvargasfuertes merged 3 commits into
mainfrom
apollo/fail-stale-tool-calls

vellum-apollo-bot Bot commented May 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Uh oh!

dvargasfuertes May 29, 2026

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Uh oh!

dvargasfuertes May 29, 2026

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Uh oh!

dvargasfuertes May 29, 2026

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Uh oh!

dvargasfuertes May 29, 2026

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vellum-apollo-bot Bot commented May 29, 2026

Summary

How this differs from Hack #4

Stale-tool ticker

Changes

Test plan

Follow-up

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dvargasfuertes May 29, 2026

Choose a reason for hiding this comment

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dvargasfuertes May 29, 2026

Choose a reason for hiding this comment

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dvargasfuertes May 29, 2026

Choose a reason for hiding this comment

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dvargasfuertes May 29, 2026

Choose a reason for hiding this comment

Uh oh!

vellum-apollo-bot Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant