fix(assistant): always clear processing flag if /clean user persist fails (address Codex review on #31613) by siddseethepalli · Pull Request #32115 · vellum-ai/vellum-assistant

siddseethepalli · 2026-05-26T18:00:56Z

Summary

The /clean slash-command handler in conversation-routes.ts set conversation.processing = true and then persisted the user message via addMessage(...) outside the try/finally that clears the flag and drains the queue. If that persist threw (transient DB/disk failure), the function exited early with processing still true, leaving every subsequent send for that conversation stuck in queued mode indefinitely.

This wraps the entire /clean branch body in an outer try/finally so conversation.processing is always reset and the queue always drained on every failure path — including a throw from the initial user-message persist. The inner try/catch that broadcasts forceClean errors is preserved unchanged. Behavior on the success path is identical; an initial-persist throw still propagates (consistent with the regular send path) but no longer leaks the flag.
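The restructured /clean branch can be sketched roughly as follows. This is a minimal illustration under assumed shapes, not the code in conversation-routes.ts: only conversation.processing, addMessage, and forceClean are named in this PR, while Conversation, CleanDeps, handleClean, broadcastError, and drainQueue are hypothetical stand-ins.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface Conversation {
  processing: boolean;
}

interface CleanDeps {
  // May throw on a transient DB/disk failure.
  addMessage: (conversationId: string, content: string) => Promise<void>;
  // May throw; its errors are broadcast rather than rethrown.
  forceClean: () => Promise<void>;
  broadcastError: (message: string) => void;
  drainQueue: (conversation: Conversation) => void;
}

async function handleClean(
  conversationId: string,
  conversation: Conversation,
  deps: CleanDeps,
): Promise<void> {
  conversation.processing = true;
  try {
    // The initial persist now sits inside the outer guard, so a throw here
    // still reaches the finally block below before propagating.
    await deps.addMessage(conversationId, "/clean");

    // Inner try/catch kept as before: forceClean errors are broadcast.
    try {
      await deps.forceClean();
    } catch (err) {
      deps.broadcastError(err instanceof Error ? err.message : String(err));
    }
  } finally {
    // Runs on every exit path: success, forceClean failure, persist failure.
    conversation.processing = false;
    deps.drainQueue(conversation);
  }
}
```

Because the cleanup lives in finally rather than catch, the persist error is not swallowed: the caller still sees the rejection, but the conversation is no longer left in queued mode.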

Addresses Codex review on #31613.

Test plan

bunx tsc --noEmit clean
bun test src/__tests__/conversation-routes-slash-commands.test.ts — 7 pass

…ls (#32128) The /compact slash branch set conversation.processing = true and then awaited the initial user-message addMessage OUTSIDE any guard. The fire-and-forget compaction IIFE owns the try/finally that resets the flag, but a throw from that initial persist (transient SQLite/disk error) never reaches it, leaving the conversation stuck in queued mode indefinitely. This is the same class of bug fixed for /clean in #32115. An outer try/finally (as used by /clean) is wrong here because compact returns 202 immediately and runs async, so it would clear the flag before compaction finished. Instead, guard just the synchronous pre-202 persist: on failure reset processing, drain the queue, and rethrow so the caller still surfaces the error. Co-authored-by: Vellum Assistant <assistant@vellum.ai>
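The narrower guard described for /compact can be sketched like this. It is an illustration under assumptions, not the actual handler: Conversation, CompactDeps, handleCompact, compact, and drainQueue are hypothetical names, and the returned `done` promise exists only so the background work can be observed in a test.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface Conversation {
  processing: boolean;
}

interface CompactDeps {
  // May throw on a transient SQLite/disk error.
  addMessage: (conversationId: string, content: string) => Promise<void>;
  compact: () => Promise<void>;
  drainQueue: (conversation: Conversation) => void;
}

async function handleCompact(
  conversationId: string,
  conversation: Conversation,
  deps: CompactDeps,
): Promise<{ status: number; done: Promise<void> }> {
  conversation.processing = true;

  // Guard only the synchronous pre-202 persist. An outer try/finally would
  // clear the flag as soon as the 202 is returned, before compaction ends.
  try {
    await deps.addMessage(conversationId, "/compact");
  } catch (err) {
    conversation.processing = false;
    deps.drainQueue(conversation);
    throw err; // the caller still surfaces the error
  }

  // Fire-and-forget work owns its own try/finally for the flag.
  const done = (async () => {
    try {
      await deps.compact();
    } catch {
      // Error reporting is omitted in this sketch.
    } finally {
      conversation.processing = false;
      deps.drainQueue(conversation);
    }
  })();

  return { status: 202, done };
}
```

The flag therefore has exactly one owner at a time: the catch block before the 202 is sent, and the background task afterwards.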


* fix(macos/billing): tolerate tiered Pro plan in catalog decode (#32114)

  PlanCatalogEntry.price_cents was a required Int, but the platform's /v1/organizations/billing/plans/ Pro entry no longer emits price_cents (it moved to base_price_cents + machine_tiers/storage_tiers). The whole catalog decode hard-failed on the Pro entry, was swallowed by try? in SettingsBillingTab.loadSummary, and surfaced as a perpetual 'Unable to load plan information.' on the Plan card.

  Make price_cents optional (PlanCard only reads id/name/included_features, so no display impact) and update the wire-protocol test to the real tiered Pro payload so this drift can't recur silently.

* fix(skills): remove broken document-writer skill, enhance document-editor (#32151)

  * fix(skills): remove broken document-writer skill, enhance document-editor

    The managed document-writer skill had no TOOLS.json and a broken include ("document" instead of "document-editor"), so skill_execute could never find document_create in the registry, causing "Unknown tool" errors on staging.

    Delete document-writer and fold its useful anti-pattern guidance into the bundled document-editor skill which already owns the TOOLS.json manifest.

    Closes JARVIS-961

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  * fix(skills): remove document-writer from catalog.json

    Stale catalog entry would cause autoInstallFromCatalog to try reinstalling the deleted skill.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  * chore: regenerate catalog.json with correct meet-join timestamp

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(assistant): include messageId on canned-response message_complete events (LUM-1902) (#32143)

  * fix(assistant): include messageId on canned-greeting message_complete (LUM-1902)

    The canned first-greeting path persisted the assistant row via addMessage() but discarded the returned id, so the message_complete broadcast lacked messageId. The macOS client filter at ChatActionHandler.swift:507 treats message_complete events lacking messageId as aux-style notifications while a turn is in flight and early-returns, so isSending never cleared and the 3-dot loading indicator stayed visible until the 60s watchdog kicked in.

    Capture the persisted assistant row id and pass it through.

    Architectural follow-up (other emission sites with the same bug shape, macOS filter cleanup) tracked in LUM-1904.

  * fix(assistant): apply messageId fix to all canned-response paths (LUM-1902)

    Same bug shape as the canned greeting: the assistant row is persisted via addMessage() but the returned id is discarded, so the message_complete broadcast goes out without messageId and the macOS guard at ChatActionHandler.swift:507 drops the event whenever the streaming-buffer 50ms flush has fired between the delta and the complete, leaving the user stuck for the full 60s watchdog.

    Patches the same five paths #31994 will eventually subsume:

    - inline approval reply (conversation-routes.ts:422)
    - canned first greeting (conversation-routes.ts:1451)
    - slash command output (:1774)
    - /compact (:1855)
    - /clean (:1935)

    Centralized into a single `emitCannedMessageComplete` helper so the temporary fix is one helper + five one-line callers, easy to grep and inline-then-delete when #31994 lands. Helper carries the full LUM-1902 context comment so individual call sites stay tidy.

    The wake-target adapter (wake-target-adapter.ts:130) has the same bug shape but isn't a quick fix: AgentEvent.message_complete carries no messageId at the point of relay, so it needs the pre-allocated anchor treatment matching #31994's approach. Tracked in LUM-1904.

  * chore(assistant): scrub internal ticket reference from helper comment

    Linear ticket ids are internal references and don't help open-source contributors reading this file. The PR reference (#31994) stays since it's discoverable from the repo.

  ---------

  Co-authored-by: Claude <noreply@anthropic.com>

* perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup (#32095)

  * perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup

    Replace linear-scan firstIndex(where:) with a pre-built [String: Int] dictionary in handleConversationListResponse and appendConversations.

    With ~1800 conversations (post-pagination PR #31924), the old O(n²) pattern performed ~3.24M string comparisons on @MainActor, blocking the main thread for ~1.6s and triggering AppHang reports (LUM-1901). The dictionary reduces this to ~3600 hash lookups, effectively O(n).

    Also removes dead code: a redundant snapshot.first(where:) that searched for a conversation already proven absent by the preceding firstIndex check.

    Closes LUM-1901

    Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

  * fix: keep dictionary in sync when appending new conversations

    The old firstIndex(where:) searched the mutated snapshot (including just-appended items), so duplicate IDs in a single response would match and update in-place. The dictionary must be kept in sync after each append to preserve this behavior.

    Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

  ---------

  Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(assistant): clear processing flag if /clean user persist fails (#32115)

  Cherry-pick of a535818: wraps the user-message persist inside the outer try/finally so a throw from addMessage still clears processing and drains the queue.

  Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(assistant): clear processing flag if /compact initial persist fails (#32128)

  The /compact slash branch set conversation.processing = true and then awaited the initial user-message addMessage OUTSIDE any guard. The fire-and-forget compaction IIFE owns the try/finally that resets the flag, but a throw from that initial persist (transient SQLite/disk error) never reaches it, leaving the conversation stuck in queued mode indefinitely. This is the same class of bug fixed for /clean in #32115.

  An outer try/finally (as used by /clean) is wrong here because compact returns 202 immediately and runs async, so it would clear the flag before compaction finished. Instead, guard just the synchronous pre-202 persist: on failure reset processing, drain the queue, and rethrow so the caller still surfaces the error.

  Co-authored-by: Vellum Assistant <assistant@vellum.ai>

---------

Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashlee Radka <ashlee@vellum.ai>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
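The index-based merge described in #32095 is implemented in Swift; the same idea can be sketched in TypeScript, the language of this PR. This is only an illustration under assumptions: ConversationSummary and mergeConversations are hypothetical names, and a Map stands in for the [String: Int] dictionary.

```typescript
// Hypothetical item shape for illustration.
interface ConversationSummary {
  id: string;
  title: string;
}

function mergeConversations(
  snapshot: ConversationSummary[],
  incoming: ConversationSummary[],
): ConversationSummary[] {
  const merged = snapshot.slice();

  // Build the id -> index map once instead of scanning per incoming item.
  // Keeping the first index seen mirrors a first-match linear search.
  const indexById = new Map<string, number>();
  merged.forEach((conversation, index) => {
    if (!indexById.has(conversation.id)) {
      indexById.set(conversation.id, index);
    }
  });

  for (const item of incoming) {
    const existing = indexById.get(item.id);
    if (existing !== undefined) {
      merged[existing] = item; // update in place
    } else {
      // Record the new index before appending so that a duplicate id later
      // in the same response updates this entry instead of appending again.
      indexById.set(item.id, merged.length);
      merged.push(item);
    }
  }
  return merged;
}
```

Each incoming item costs one hash lookup, so the merge is linear in the combined list sizes rather than quadratic.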

* Release v0.8.5

* Cherry-pick fixes onto release/v0.8.5 (#32159)

  * fix(macos/billing): tolerate tiered Pro plan in catalog decode (#32114)
  * fix(skills): remove broken document-writer skill, enhance document-editor (#32151)
  * fix(assistant): include messageId on canned-response message_complete events (LUM-1902) (#32143)
  * perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup (#32095)
  * fix(assistant): clear processing flag if /clean user persist fails (#32115)
  * fix(assistant): clear processing flag if /compact initial persist fails (#32128)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashlee Radka <ashlee@vellum.ai>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>

fix(assistant): clear processing flag if /clean user persist fails

ad6d645

siddseethepalli self-assigned this May 26, 2026

siddseethepalli merged commit a535818 into main May 26, 2026

siddseethepalli deleted the swarm/f3a9/task-14 branch May 26, 2026 18:01

This was referenced May 26, 2026

feat(daemon): add /clean slash command (strip injections, preserve history) #31613

Merged

fix(assistant): clear processing flag if /compact initial persist fails #32128

Merged

noanflaherty mentioned this pull request May 26, 2026

Cherry-pick fixes onto release/v0.8.5 #32159

Merged

5 tasks


siddseethepalli merged 1 commit into main from swarm/f3a9/task-14
