fix(assistant): include messageId on canned-response message_complete events (LUM-1902)#32143
Conversation
| broadcastMessage({ | ||
| type: "message_complete", | ||
| conversationId, | ||
| messageId: persistedAssistant.id, |
There was a problem hiding this comment.
🚩 Same missing-messageId pattern exists in other code paths
The PR fixes the missing messageId on message_complete for the canned greeting path, but the same pattern exists in at least three other code paths in this file:
- Slash command path (
conversation-routes.ts:1782-1785):broadcastMessage({ type: "message_complete", conversationId })— nomessageId, andconversation.processingis stilltruewhen it fires (set tofalseat line 1787). TheaddMessagereturn at line 1727 is also not captured. - Compact command path (
conversation-routes.ts:1863): same pattern, processing istrueuntil line 1878. - Clean command path (
conversation-routes.ts:1943): same pattern.
Per the PR's own comment, the macOS client filters message_complete events lacking a messageId while a turn is in flight. These paths all have conversation.processing = true when the event fires, so they could exhibit the same stuck-indicator behavior on macOS. Worth a follow-up to apply the same fix consistently.
Was this helpful? React with 👍 or 👎 to provide feedback.
… (LUM-1902) The canned first-greeting path persisted the assistant row via addMessage() but discarded the returned id, so the message_complete broadcast lacked messageId. The macOS client filter at ChatActionHandler.swift:507 treats message_complete events lacking messageId as aux-style notifications while a turn is in flight and early-returns, so isSending never cleared and the 3-dot loading indicator stayed visible until the 60s watchdog kicked in. Capture the persisted assistant row id and pass it through. Architectural follow-up (other emission sites with the same bug shape, macOS filter cleanup) tracked in LUM-1904.
…-1902) Same bug shape as the canned greeting: the assistant row is persisted via addMessage() but the returned id is discarded, so the message_complete broadcast goes out without messageId and the macOS guard at ChatActionHandler.swift:507 drops the event whenever the streaming-buffer 50ms flush has fired between the delta and the complete — leaving the user stuck for the full 60s watchdog. Patches the same five paths #31994 will eventually subsume: - inline approval reply (conversation-routes.ts:422) - canned first greeting (conversation-routes.ts:1451) - slash command output (:1774) - /compact (:1855) - /clean (:1935) Centralized into a single `emitCannedMessageComplete` helper so the temporary fix is one helper + five one-line callers, easy to grep and inline-then-delete when #31994 lands. Helper carries the full LUM-1902 context comment so individual call sites stay tidy. The wake-target adapter (wake-target-adapter.ts:130) has the same bug shape but isn't a quick fix — AgentEvent.message_complete carries no messageId at the point of relay, so it needs the pre-allocated anchor treatment matching #31994's approach. Tracked in LUM-1904.
Linear ticket ids are internal references and don't help open-source contributors reading this file. The PR reference (#31994) stays since it's discoverable from the repo.
8f06e1b to
106f983
Compare
There was a problem hiding this comment.
✦ APPROVE
Value: Fixes LUM-1902 — the 3-dot stuck indicator that every new user hits on their first hatch. The race is real and the fix covers all five affected paths in one pass.
What this does: Five message_complete events emitted by canned-response paths (bypass the agent loop, so they never got an assistantTurnId) were going out without messageId. The macOS ChatActionHandler.swift:507 filter drops events that arrive after the 50ms streaming-buffer flush window — when daemon onboarding work between events crosses that threshold, isSending latches for the full 60s watchdog. Adding messageId: persistedAssistant.id gives the filter what it needs to resolve the turn.
Code review
Helper design is solid — one function, N one-line callers, grep anchor for when #31994 lands. The comment is thorough: explains the root cause, names the five sites, and gives a concrete removal instruction. Exactly the right shape for a temporary patch.
Each call site correctly captures addMessage's return value:
tryConsumeCanonicalGuardianReply(:422) ✅- canned first greeting (
:1451) ✅ - slash command output (
:1774) ✅ /compact(:1855) ✅/clean(:1855) ✅
Type Check ✅ confirms persistedAssistant.id is type-safe across all five sites — addMessage returns a typed row with .id: string, and each capture is inside the success path (the catch block handles any throw from addMessage itself). No null-dereference risk.
No anti-pattern hits — daemon TypeScript, no React/Zustand/hooks patterns in scope.
CI: Type Check ✅, FlexFrame ✅, OpenAPI ✅. Test + Lint still running at review time — no red flags in the diff that would cause either to fail.
Devin's review is on pre-rebase commit e42419cb63 (no longer in the branch history) — not applicable to HEAD. Consider triggering @devin review this PR + @codex review on the current HEAD for a clean second approval.
Merge gate: CI finishing + second bot approval needed.
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
* fix(macos/billing): tolerate tiered Pro plan in catalog decode (#32114) PlanCatalogEntry.price_cents was a required Int, but the platform's /v1/organizations/billing/plans/ Pro entry no longer emits price_cents (it moved to base_price_cents + machine_tiers/storage_tiers). The whole catalog decode hard-failed on the Pro entry, was swallowed by try? in SettingsBillingTab.loadSummary, and surfaced as a perpetual 'Unable to load plan information.' on the Plan card. Make price_cents optional (PlanCard only reads id/name/included_features, so no display impact) and update the wire-protocol test to the real tiered Pro payload so this drift can't recur silently. * fix(skills): remove broken document-writer skill, enhance document-editor (#32151) * fix(skills): remove broken document-writer skill, enhance document-editor The managed document-writer skill had no TOOLS.json and a broken include ("document" instead of "document-editor"), so skill_execute could never find document_create in the registry — causing "Unknown tool" errors on staging. Delete document-writer and fold its useful anti-pattern guidance into the bundled document-editor skill which already owns the TOOLS.json manifest. Closes JARVIS-961 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(skills): remove document-writer from catalog.json Stale catalog entry would cause autoInstallFromCatalog to try reinstalling the deleted skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate catalog.json with correct meet-join timestamp Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(assistant): include messageId on canned-response message_complete events (LUM-1902) (#32143) * fix(assistant): include messageId on canned-greeting message_complete (LUM-1902) The canned first-greeting path persisted the assistant row via addMessage() but discarded the returned id, so the message_complete broadcast lacked messageId. The macOS client filter at ChatActionHandler.swift:507 treats message_complete events lacking messageId as aux-style notifications while a turn is in flight and early-returns, so isSending never cleared and the 3-dot loading indicator stayed visible until the 60s watchdog kicked in. Capture the persisted assistant row id and pass it through. Architectural follow-up (other emission sites with the same bug shape, macOS filter cleanup) tracked in LUM-1904. * fix(assistant): apply messageId fix to all canned-response paths (LUM-1902) Same bug shape as the canned greeting: the assistant row is persisted via addMessage() but the returned id is discarded, so the message_complete broadcast goes out without messageId and the macOS guard at ChatActionHandler.swift:507 drops the event whenever the streaming-buffer 50ms flush has fired between the delta and the complete — leaving the user stuck for the full 60s watchdog. Patches the same five paths #31994 will eventually subsume: - inline approval reply (conversation-routes.ts:422) - canned first greeting (conversation-routes.ts:1451) - slash command output (:1774) - /compact (:1855) - /clean (:1935) Centralized into a single `emitCannedMessageComplete` helper so the temporary fix is one helper + five one-line callers, easy to grep and inline-then-delete when #31994 lands. Helper carries the full LUM-1902 context comment so individual call sites stay tidy. The wake-target adapter (wake-target-adapter.ts:130) has the same bug shape but isn't a quick fix — AgentEvent.message_complete carries no messageId at the point of relay, so it needs the pre-allocated anchor treatment matching #31994's approach. Tracked in LUM-1904. * chore(assistant): scrub internal ticket reference from helper comment Linear ticket ids are internal references and don't help open-source contributors reading this file. The PR reference (#31994) stays since it's discoverable from the repo. --------- Co-authored-by: Claude <noreply@anthropic.com> * perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup (#32095) * perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup Replace linear-scan firstIndex(where:) with a pre-built [String: Int] dictionary in handleConversationListResponse and appendConversations. With ~1800 conversations (post-pagination PR #31924), the old O(n²) pattern performed ~3.24M string comparisons on @mainactor, blocking the main thread for ~1.6s and triggering AppHang reports (LUM-1901). The dictionary reduces this to ~3600 hash lookups — effectively O(n). Also removes dead code: a redundant snapshot.first(where:) that searched for a conversation already proven absent by the preceding firstIndex check. Closes LUM-1901 Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai> * fix: keep dictionary in sync when appending new conversations The old firstIndex(where:) searched the mutated snapshot (including just-appended items), so duplicate IDs in a single response would match and update in-place. The dictionary must be kept in sync after each append to preserve this behavior. Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix(assistant): clear processing flag if /clean user persist fails (#32115) Cherry-pick of a535818 — wraps the user-message persist inside the outer try/finally so a throw from addMessage still clears processing and drains the queue. Co-authored-by: siddseethepalli <siddseethepalli@gmail.com> Co-authored-by: Vellum Assistant <assistant@vellum.ai> * fix(assistant): clear processing flag if /compact initial persist fails (#32128) The /compact slash branch set conversation.processing = true and then awaited the initial user-message addMessage OUTSIDE any guard. The fire-and-forget compaction IIFE owns the try/finally that resets the flag, but a throw from that initial persist (transient SQLite/disk error) never reaches it, leaving the conversation stuck in queued mode indefinitely. This is the same class of bug fixed for /clean in #32115. An outer try/finally (as used by /clean) is wrong here because compact returns 202 immediately and runs async, so it would clear the flag before compaction finished. Instead, guard just the synchronous pre-202 persist: on failure reset processing, drain the queue, and rethrow so the caller still surfaces the error. Co-authored-by: Vellum Assistant <assistant@vellum.ai> --------- Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com> Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ashlee Radka <ashlee@vellum.ai> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: siddseethepalli <siddseethepalli@gmail.com> Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* Release v0.8.5 * Cherry-pick fixes onto release/v0.8.5 (#32159) * fix(macos/billing): tolerate tiered Pro plan in catalog decode (#32114) PlanCatalogEntry.price_cents was a required Int, but the platform's /v1/organizations/billing/plans/ Pro entry no longer emits price_cents (it moved to base_price_cents + machine_tiers/storage_tiers). The whole catalog decode hard-failed on the Pro entry, was swallowed by try? in SettingsBillingTab.loadSummary, and surfaced as a perpetual 'Unable to load plan information.' on the Plan card. Make price_cents optional (PlanCard only reads id/name/included_features, so no display impact) and update the wire-protocol test to the real tiered Pro payload so this drift can't recur silently. * fix(skills): remove broken document-writer skill, enhance document-editor (#32151) * fix(skills): remove broken document-writer skill, enhance document-editor The managed document-writer skill had no TOOLS.json and a broken include ("document" instead of "document-editor"), so skill_execute could never find document_create in the registry — causing "Unknown tool" errors on staging. Delete document-writer and fold its useful anti-pattern guidance into the bundled document-editor skill which already owns the TOOLS.json manifest. Closes JARVIS-961 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(skills): remove document-writer from catalog.json Stale catalog entry would cause autoInstallFromCatalog to try reinstalling the deleted skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate catalog.json with correct meet-join timestamp Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(assistant): include messageId on canned-response message_complete events (LUM-1902) (#32143) * fix(assistant): include messageId on canned-greeting message_complete (LUM-1902) The canned first-greeting path persisted the assistant row via addMessage() but discarded the returned id, so the message_complete broadcast lacked messageId. The macOS client filter at ChatActionHandler.swift:507 treats message_complete events lacking messageId as aux-style notifications while a turn is in flight and early-returns, so isSending never cleared and the 3-dot loading indicator stayed visible until the 60s watchdog kicked in. Capture the persisted assistant row id and pass it through. Architectural follow-up (other emission sites with the same bug shape, macOS filter cleanup) tracked in LUM-1904. * fix(assistant): apply messageId fix to all canned-response paths (LUM-1902) Same bug shape as the canned greeting: the assistant row is persisted via addMessage() but the returned id is discarded, so the message_complete broadcast goes out without messageId and the macOS guard at ChatActionHandler.swift:507 drops the event whenever the streaming-buffer 50ms flush has fired between the delta and the complete — leaving the user stuck for the full 60s watchdog. Patches the same five paths #31994 will eventually subsume: - inline approval reply (conversation-routes.ts:422) - canned first greeting (conversation-routes.ts:1451) - slash command output (:1774) - /compact (:1855) - /clean (:1935) Centralized into a single `emitCannedMessageComplete` helper so the temporary fix is one helper + five one-line callers, easy to grep and inline-then-delete when #31994 lands. Helper carries the full LUM-1902 context comment so individual call sites stay tidy. The wake-target adapter (wake-target-adapter.ts:130) has the same bug shape but isn't a quick fix — AgentEvent.message_complete carries no messageId at the point of relay, so it needs the pre-allocated anchor treatment matching #31994's approach. Tracked in LUM-1904. * chore(assistant): scrub internal ticket reference from helper comment Linear ticket ids are internal references and don't help open-source contributors reading this file. The PR reference (#31994) stays since it's discoverable from the repo. --------- Co-authored-by: Claude <noreply@anthropic.com> * perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup (#32095) * perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup Replace linear-scan firstIndex(where:) with a pre-built [String: Int] dictionary in handleConversationListResponse and appendConversations. With ~1800 conversations (post-pagination PR #31924), the old O(n²) pattern performed ~3.24M string comparisons on @mainactor, blocking the main thread for ~1.6s and triggering AppHang reports (LUM-1901). The dictionary reduces this to ~3600 hash lookups — effectively O(n). Also removes dead code: a redundant snapshot.first(where:) that searched for a conversation already proven absent by the preceding firstIndex check. Closes LUM-1901 Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai> * fix: keep dictionary in sync when appending new conversations The old firstIndex(where:) searched the mutated snapshot (including just-appended items), so duplicate IDs in a single response would match and update in-place. The dictionary must be kept in sync after each append to preserve this behavior. Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix(assistant): clear processing flag if /clean user persist fails (#32115) Cherry-pick of a535818 — wraps the user-message persist inside the outer try/finally so a throw from addMessage still clears processing and drains the queue. Co-authored-by: siddseethepalli <siddseethepalli@gmail.com> Co-authored-by: Vellum Assistant <assistant@vellum.ai> * fix(assistant): clear processing flag if /compact initial persist fails (#32128) The /compact slash branch set conversation.processing = true and then awaited the initial user-message addMessage OUTSIDE any guard. The fire-and-forget compaction IIFE owns the try/finally that resets the flag, but a throw from that initial persist (transient SQLite/disk error) never reaches it, leaving the conversation stuck in queued mode indefinitely. This is the same class of bug fixed for /clean in #32115. An outer try/finally (as used by /clean) is wrong here because compact returns 202 immediately and runs async, so it would clear the flag before compaction finished. Instead, guard just the synchronous pre-202 persist: on failure reset processing, drain the queue, and rethrow so the caller still surfaces the error. Co-authored-by: Vellum Assistant <assistant@vellum.ai> --------- Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com> Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ashlee Radka <ashlee@vellum.ai> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: siddseethepalli <siddseethepalli@gmail.com> Co-authored-by: Vellum Assistant <assistant@vellum.ai> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Noa Flaherty <noa@vellum.ai> Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com> Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ashlee Radka <ashlee@vellum.ai> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: siddseethepalli <siddseethepalli@gmail.com> Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Summary
Quick fix for LUM-1902. The reported symptom (3-dot loading indicator stuck after initial hatch) is one instance of a broader class — five paths in
conversation-routes.tsthat bypass the agent loop and emitmessage_completewithoutmessageId, hitting a race with the macOS streaming-buffer flush::1451) — the reported case, hit by every new user on hatch:422):1774)/compact(:1855)/clean(:1855)All five capture the persisted assistant row id and pass
messageIdto the broadcast. Centralized into a singleemitCannedMessageCompletehelper so the temporary patch is one helper + five one-line call sites, easy to grep and remove when #31994 lands (see "Relationship to #31994" below).Why it's stuck — actual causal chain (race, not deterministic)
The bug only manifests when SSE events are delivered far enough apart for a 50 ms timer to fire between them — heavier daemon work between events (onboarding artifact persistence, telemetry,
forceCompact(),forceClean(), etc.) reliably hits that window.isSending = true→ 3-dot indicator renders.202, then broadcastsuser_message_echo→assistant_text_delta→message_complete.handleAssistantTextDelta(ChatActionHandler.swift:463):isThinking = false(line 484)streamingDeltaBufferand callsscheduleStreamingFlush()— a 50 ms timer (ChatViewModel.swift:916)currentAssistantMessageIdstays nil until the flush fireshandleMessageCompletearrives. Filter atChatActionHandler.swift:507:currentAssistantMessageId == nilandisThinking == false→ guard's second clause is false → guard does not fire → handler proceeds. ✓currentAssistantMessageIdis set → guard fires → terminal cleanup is skipped →isSendingstays true until the 60-s watchdog atChatViewModel.swift:328. ✗No recovery on the canned paths: the normal agent loop emits
assistant_activity_state{phase:idle}which runs its own cleanup and schedules a 5-sscheduleIdleFallbackCleanup()(ChatActionHandler.swift:1583). The canned paths emit neither, so the 60-s watchdog is the only safety net.Populating
messageIdflips the guard's first clause to false regardless of timing, the guard no longer fires,isSendingclears.Scope decisions
Why include the other four sites: Same bug class, same trivial 2-line shape, and
/compact's "everything froze right after I compacted a long chat" UX is particularly bad. Inline approval, slash,/cleanare lower-frequency but identical-cost fixes. All five live in the same file, so we end up with one helper + five callers.Why a single helper: Per [reviewer guidance], temporary fixes should be confined to easy-to-rip-out surfaces.
emitCannedMessageCompletecarries the full LUM-1902 context comment in one place; each call site is a one-liner. When #31994 lands,grep emitCannedMessageCompletefinds every site, and the inline-then-delete is mechanical.Why not the wake-target adapter: Same bug shape but not a quick fix.
AgentEvent.message_complete(agent/loop.ts:121-124) carries{ type, message }— there's no persisted id to forward at the point of relay. Needs the pre-allocated anchor pattern (same approach #31994 took for the loop path, explicitly deferred for wakes viaTODO(PR-2b-followup)). Tracked in LUM-1904.Relationship to PR #31994 (apollo workstream)
#31994 (draft) "assistant,web: pre-allocate assistant turn anchor id + stamp SSE events (PR 2b)" pre-allocates an
assistantTurnIdat agent-loop entry and stamps every assistant-channel event (includingmessage_complete) with that anchor id. Its scope explicitly includes the same fiveconversation-routes.tsemit sites.When #31994 lands it subsumes this PR entirely. The diff will conflict at the five call sites; resolution is mechanical (delete the helper + each
emitCannedMessageComplete(...)call, take their stampedstate.assistantTurnIdversion). I posted a heads-up on #31994.This PR exists because #31994 is still in draft and LUM-1902 is hitting every new user on hatch — we need something we can ship on the next daemon roll.
Web impact: none
The web client's
handleMessageComplete(apps/web/src/domains/chat/utils/stream-handlers/message-handlers.ts:86) has nomessageId-based filter — it always callscompleteTurn(). AddingmessageIdto these events is a no-op for web:finalizeMessageComplete(stream-message-updaters.ts:173-210) keepstail.idwhen an assistant bubble is already streaming, soevent.messageIdis unused on these paths.Recent area history checked
fix(web): isSending honors activeTurnId so thinking indicator survives stale terminal events— adjacent web turn-store fix, doesn't touch server or macOS.ChatActionHandler.swift; carried the filter forward unchanged.isSendingtocurrentAssistantMessageId; fix: extend message_complete guard to cover thinking phase #23145 extended it to cover the thinking phase; fix(macos): filter aux message_complete from streaming guard, not just sound gate #25539 added thesource == "aux"clause. ThemessageId == nilheuristic is a load-bearing artifact from that evolution.Risk
MessageComplete.messageId: String?) and TS schemas already modelmessageId?. Zero compatibility risk.Out of scope
wake-target-adapter.ts:130) — needs anchor-id pattern, not a quick fix. Tracked in LUM-1904.messageId == nilheuristic from the macOS Swift filter — tracked in LUM-1904 (becomes moot post Electron cutover feat(macos): scaffold Electron desktop wrapper for apps/web #32081).Test plan
/compacton a long chat → indicator clears immediately on completion.bunx tsc --noEmitpasses forassistant/.https://claude.ai/code/session_01TtNewUqyRWwaE9eRo1QMZk