Skip to content

fix(assistant): always clear processing flag if /clean user persist fails (address Codex review on #31613)#32115

Merged
siddseethepalli merged 1 commit into
mainfrom
swarm/f3a9/task-14
May 26, 2026
Merged

fix(assistant): always clear processing flag if /clean user persist fails (address Codex review on #31613)#32115
siddseethepalli merged 1 commit into
mainfrom
swarm/f3a9/task-14

Conversation

@siddseethepalli
Copy link
Copy Markdown
Contributor

Summary

The /clean slash-command handler in conversation-routes.ts set conversation.processing = true and then persisted the user message via addMessage(...) outside the try/finally that clears the flag and drains the queue. If that persist threw (transient DB/disk failure), the function exited early with processing still true, leaving every subsequent send for that conversation stuck in queued mode indefinitely.

This wraps the entire /clean branch body in an outer try/finally so conversation.processing is always reset and the queue always drained on every failure path — including a throw from the initial user-message persist. The inner try/catch that broadcasts forceClean errors is preserved unchanged. Behavior on the success path is identical; an initial-persist throw still propagates (consistent with the regular send path) but no longer leaks the flag.

Addresses Codex review on #31613.

Test plan

  • bunx tsc --noEmit clean
  • bun test src/__tests__/conversation-routes-slash-commands.test.ts — 7 pass

@siddseethepalli siddseethepalli self-assigned this May 26, 2026
@siddseethepalli siddseethepalli merged commit a535818 into main May 26, 2026
@siddseethepalli siddseethepalli deleted the swarm/f3a9/task-14 branch May 26, 2026 18:01
siddseethepalli added a commit that referenced this pull request May 26, 2026
…ls (#32128)

The /compact slash branch set conversation.processing = true and then
awaited the initial user-message addMessage OUTSIDE any guard. The
fire-and-forget compaction IIFE owns the try/finally that resets the
flag, but a throw from that initial persist (transient SQLite/disk
error) never reaches it, leaving the conversation stuck in queued mode
indefinitely. This is the same class of bug fixed for /clean in #32115.

An outer try/finally (as used by /clean) is wrong here because compact
returns 202 immediately and runs async, so it would clear the flag
before compaction finished. Instead, guard just the synchronous pre-202
persist: on failure reset processing, drain the queue, and rethrow so
the caller still surfaces the error.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
noanflaherty added a commit that referenced this pull request May 26, 2026
* fix(macos/billing): tolerate tiered Pro plan in catalog decode (#32114)

PlanCatalogEntry.price_cents was a required Int, but the platform's
/v1/organizations/billing/plans/ Pro entry no longer emits price_cents
(it moved to base_price_cents + machine_tiers/storage_tiers). The whole
catalog decode hard-failed on the Pro entry, was swallowed by try? in
SettingsBillingTab.loadSummary, and surfaced as a perpetual 'Unable to
load plan information.' on the Plan card.

Make price_cents optional (PlanCard only reads id/name/included_features,
so no display impact) and update the wire-protocol test to the real
tiered Pro payload so this drift can't recur silently.

* fix(skills): remove broken document-writer skill, enhance document-editor (#32151)

* fix(skills): remove broken document-writer skill, enhance document-editor

The managed document-writer skill had no TOOLS.json and a broken include
("document" instead of "document-editor"), so skill_execute could never
find document_create in the registry — causing "Unknown tool" errors on
staging.

Delete document-writer and fold its useful anti-pattern guidance into the
bundled document-editor skill which already owns the TOOLS.json manifest.

Closes JARVIS-961

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(skills): remove document-writer from catalog.json

Stale catalog entry would cause autoInstallFromCatalog to try
reinstalling the deleted skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate catalog.json with correct meet-join timestamp

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(assistant): include messageId on canned-response message_complete events (LUM-1902) (#32143)

* fix(assistant): include messageId on canned-greeting message_complete (LUM-1902)

The canned first-greeting path persisted the assistant row via addMessage()
but discarded the returned id, so the message_complete broadcast lacked
messageId. The macOS client filter at ChatActionHandler.swift:507 treats
message_complete events lacking messageId as aux-style notifications while
a turn is in flight and early-returns, so isSending never cleared and the
3-dot loading indicator stayed visible until the 60s watchdog kicked in.

Capture the persisted assistant row id and pass it through. Architectural
follow-up (other emission sites with the same bug shape, macOS filter
cleanup) tracked in LUM-1904.

* fix(assistant): apply messageId fix to all canned-response paths (LUM-1902)

Same bug shape as the canned greeting: the assistant row is persisted via
addMessage() but the returned id is discarded, so the message_complete
broadcast goes out without messageId and the macOS guard at
ChatActionHandler.swift:507 drops the event whenever the streaming-buffer
50ms flush has fired between the delta and the complete — leaving the user
stuck for the full 60s watchdog.

Patches the same five paths #31994 will eventually subsume:

  - inline approval reply (conversation-routes.ts:422)
  - canned first greeting (conversation-routes.ts:1451)
  - slash command output (:1774)
  - /compact (:1855)
  - /clean (:1935)

Centralized into a single `emitCannedMessageComplete` helper so the
temporary fix is one helper + five one-line callers, easy to grep and
inline-then-delete when #31994 lands. Helper carries the full LUM-1902
context comment so individual call sites stay tidy.

The wake-target adapter (wake-target-adapter.ts:130) has the same bug
shape but isn't a quick fix — AgentEvent.message_complete carries no
messageId at the point of relay, so it needs the pre-allocated anchor
treatment matching #31994's approach. Tracked in LUM-1904.

* chore(assistant): scrub internal ticket reference from helper comment

Linear ticket ids are internal references and don't help open-source
contributors reading this file. The PR reference (#31994) stays since
it's discoverable from the repo.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup (#32095)

* perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup

Replace linear-scan firstIndex(where:) with a pre-built [String: Int]
dictionary in handleConversationListResponse and appendConversations.

With ~1800 conversations (post-pagination PR #31924), the old O(n²)
pattern performed ~3.24M string comparisons on @mainactor, blocking the
main thread for ~1.6s and triggering AppHang reports (LUM-1901).

The dictionary reduces this to ~3600 hash lookups — effectively O(n).

Also removes dead code: a redundant snapshot.first(where:) that
searched for a conversation already proven absent by the preceding
firstIndex check.

Closes LUM-1901

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

* fix: keep dictionary in sync when appending new conversations

The old firstIndex(where:) searched the mutated snapshot (including
just-appended items), so duplicate IDs in a single response would match
and update in-place. The dictionary must be kept in sync after each
append to preserve this behavior.

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(assistant): clear processing flag if /clean user persist fails (#32115)

Cherry-pick of a535818 — wraps the user-message persist inside the
outer try/finally so a throw from addMessage still clears processing
and drains the queue.

Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(assistant): clear processing flag if /compact initial persist fails (#32128)

The /compact slash branch set conversation.processing = true and then
awaited the initial user-message addMessage OUTSIDE any guard. The
fire-and-forget compaction IIFE owns the try/finally that resets the
flag, but a throw from that initial persist (transient SQLite/disk
error) never reaches it, leaving the conversation stuck in queued mode
indefinitely. This is the same class of bug fixed for /clean in #32115.

An outer try/finally (as used by /clean) is wrong here because compact
returns 202 immediately and runs async, so it would clear the flag
before compaction finished. Instead, guard just the synchronous pre-202
persist: on failure reset processing, drain the queue, and rethrow so
the caller still surfaces the error.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

---------

Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashlee Radka <ashlee@vellum.ai>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
vellum-automation Bot added a commit that referenced this pull request May 26, 2026
* Release v0.8.5

* Cherry-pick fixes onto release/v0.8.5 (#32159)

* fix(macos/billing): tolerate tiered Pro plan in catalog decode (#32114)

PlanCatalogEntry.price_cents was a required Int, but the platform's
/v1/organizations/billing/plans/ Pro entry no longer emits price_cents
(it moved to base_price_cents + machine_tiers/storage_tiers). The whole
catalog decode hard-failed on the Pro entry, was swallowed by try? in
SettingsBillingTab.loadSummary, and surfaced as a perpetual 'Unable to
load plan information.' on the Plan card.

Make price_cents optional (PlanCard only reads id/name/included_features,
so no display impact) and update the wire-protocol test to the real
tiered Pro payload so this drift can't recur silently.

* fix(skills): remove broken document-writer skill, enhance document-editor (#32151)

* fix(skills): remove broken document-writer skill, enhance document-editor

The managed document-writer skill had no TOOLS.json and a broken include
("document" instead of "document-editor"), so skill_execute could never
find document_create in the registry — causing "Unknown tool" errors on
staging.

Delete document-writer and fold its useful anti-pattern guidance into the
bundled document-editor skill which already owns the TOOLS.json manifest.

Closes JARVIS-961

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(skills): remove document-writer from catalog.json

Stale catalog entry would cause autoInstallFromCatalog to try
reinstalling the deleted skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate catalog.json with correct meet-join timestamp

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(assistant): include messageId on canned-response message_complete events (LUM-1902) (#32143)

* fix(assistant): include messageId on canned-greeting message_complete (LUM-1902)

The canned first-greeting path persisted the assistant row via addMessage()
but discarded the returned id, so the message_complete broadcast lacked
messageId. The macOS client filter at ChatActionHandler.swift:507 treats
message_complete events lacking messageId as aux-style notifications while
a turn is in flight and early-returns, so isSending never cleared and the
3-dot loading indicator stayed visible until the 60s watchdog kicked in.

Capture the persisted assistant row id and pass it through. Architectural
follow-up (other emission sites with the same bug shape, macOS filter
cleanup) tracked in LUM-1904.

* fix(assistant): apply messageId fix to all canned-response paths (LUM-1902)

Same bug shape as the canned greeting: the assistant row is persisted via
addMessage() but the returned id is discarded, so the message_complete
broadcast goes out without messageId and the macOS guard at
ChatActionHandler.swift:507 drops the event whenever the streaming-buffer
50ms flush has fired between the delta and the complete — leaving the user
stuck for the full 60s watchdog.

Patches the same five paths #31994 will eventually subsume:

  - inline approval reply (conversation-routes.ts:422)
  - canned first greeting (conversation-routes.ts:1451)
  - slash command output (:1774)
  - /compact (:1855)
  - /clean (:1935)

Centralized into a single `emitCannedMessageComplete` helper so the
temporary fix is one helper + five one-line callers, easy to grep and
inline-then-delete when #31994 lands. Helper carries the full LUM-1902
context comment so individual call sites stay tidy.

The wake-target adapter (wake-target-adapter.ts:130) has the same bug
shape but isn't a quick fix — AgentEvent.message_complete carries no
messageId at the point of relay, so it needs the pre-allocated anchor
treatment matching #31994's approach. Tracked in LUM-1904.

* chore(assistant): scrub internal ticket reference from helper comment

Linear ticket ids are internal references and don't help open-source
contributors reading this file. The PR reference (#31994) stays since
it's discoverable from the repo.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup (#32095)

* perf(macos): replace O(n²) conversation merge with O(1) dictionary lookup

Replace linear-scan firstIndex(where:) with a pre-built [String: Int]
dictionary in handleConversationListResponse and appendConversations.

With ~1800 conversations (post-pagination PR #31924), the old O(n²)
pattern performed ~3.24M string comparisons on @mainactor, blocking the
main thread for ~1.6s and triggering AppHang reports (LUM-1901).

The dictionary reduces this to ~3600 hash lookups — effectively O(n).

Also removes dead code: a redundant snapshot.first(where:) that
searched for a conversation already proven absent by the preceding
firstIndex check.

Closes LUM-1901

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

* fix: keep dictionary in sync when appending new conversations

The old firstIndex(where:) searched the mutated snapshot (including
just-appended items), so duplicate IDs in a single response would match
and update in-place. The dictionary must be kept in sync after each
append to preserve this behavior.

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(assistant): clear processing flag if /clean user persist fails (#32115)

Cherry-pick of a535818 — wraps the user-message persist inside the
outer try/finally so a throw from addMessage still clears processing
and drains the queue.

Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(assistant): clear processing flag if /compact initial persist fails (#32128)

The /compact slash branch set conversation.processing = true and then
awaited the initial user-message addMessage OUTSIDE any guard. The
fire-and-forget compaction IIFE owns the try/finally that resets the
flag, but a throw from that initial persist (transient SQLite/disk
error) never reaches it, leaving the conversation stuck in queued mode
indefinitely. This is the same class of bug fixed for /clean in #32115.

An outer try/finally (as used by /clean) is wrong here because compact
returns 202 immediately and runs async, so it would clear the flag
before compaction finished. Instead, guard just the synchronous pre-202
persist: on failure reset processing, drain the queue, and rethrow so
the caller still surfaces the error.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

---------

Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashlee Radka <ashlee@vellum.ai>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Alex Nork <48630278+alex-nork@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashlee Radka <ashlee@vellum.ai>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant