Skip to content

fix: voice mode bugs — permission bypass, shutdown race, tap leak, prefix pollution, barge-in conflict#6099

Merged
AnitaKirkovska merged 1 commit into
mainfrom
anita/fix-voice-mode-bugs
Feb 21, 2026
Merged

fix: voice mode bugs — permission bypass, shutdown race, tap leak, prefix pollution, barge-in conflict#6099
AnitaKirkovska merged 1 commit into
mainfrom
anita/fix-voice-mode-bugs

Conversation

@AnitaKirkovska
Copy link
Copy Markdown
Contributor

@AnitaKirkovska AnitaKirkovska commented Feb 21, 2026

Summary

  • Fix permission bypass where denial phrases containing affirmative substrings (e.g. "No, don't do it" contains "do it") were incorrectly approved — now treats ambiguous as denial
  • Fix shutdown race condition where stopSpeaking() during deactivate() re-entered startListening() via synchronous ttsOnComplete callback
  • Fix audio tap leak when audioEngine.start() fails — clean up installed tap to prevent crash on next recording
  • Fix voice prefix ([Voice conversation — keep spoken responses brief...]) polluting chat messages and thread titles — now only used for IPC to daemon
  • Fix barge-in monitor conflict by reordering to start before finishTextStream() which may complete synchronously

Original prompt

fix: voice mode bugs — permission bypass, shutdown race, tap leak, prefix pollution, barge-in conflict

🤖 Generated with Claude Code


Open with Devin

1. Permission bypass: 'No, don't do it' was approved because 'do it' matched
   before 'no'/'don't'. Now treats ambiguous (both match) as denial for safety.
2. Shutdown race: stopSpeaking() during deactivate() fired ttsOnComplete which
   called startListening(), leaving service in inconsistent state. Now sets
   state=.off before shutdown.
3. Audio tap leak: if audioEngine.start() failed, the installed tap was never
   removed, causing a crash on the next startRecording() call.
4. Voice prefix pollution: the voice system instruction prefix was stored in
   chat messages and used for thread title generation. Now only sent via IPC.
5. Barge-in monitor conflict: startBargeInMonitor() ran after finishTextStream()
   which could complete synchronously (no ElevenLabs), causing double tap install.
   Reordered to start barge-in monitor before finishTextStream.

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 038145891f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// whole transcript. For queued messages this is set in messageDequeued.
if !willBeQueued {
currentTurnUserText = text
currentTurnUserText = rawText
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve voice prefix for secret-blocked retries

Store currentTurnUserText as the transport text (or keep a separate transport copy) in voice mode; assigning rawText here means later recovery flows resend a different prompt than the one originally sent to the daemon. In the secret_blocked path, ChatViewModel+MessageHandling snapshots currentTurnUserText into secretBlockedMessageText, and sendAnyway() sends that value directly, so a voice-mode “Send anyway” retry drops the voice instruction prefix and changes response behavior (e.g., spoken brevity/tooling constraints no longer apply).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Voice prefix pollution fix missed refinementMessagePreview, which is displayed in UI

The PR systematically changes textrawText for all user-facing strings (chat messages at lines 275 and 307, thread titles at line 258, and currentTurnUserText at lines 289 and 344) to prevent the voice instruction prefix from polluting the UI. However, refinementMessagePreview at line 315 still uses text (which includes the voice prefix).

Root Cause and Impact

When voice mode is active and a workspace surface is open (activeSurfaceId != nil), messages are routed as workspace refinements. Line 315 sets refinementMessagePreview = text, where text contains the full voice prefix:

[Voice conversation — keep spoken responses brief...]

user's actual message

This refinementMessagePreview is displayed directly in the workspace activity feed UI (clients/macos/vellum-assistant/Features/Surfaces/WorkspaceActivityFeed.swift:17) via Text(preview). Users would see the raw system instruction prefix in the workspace overlay bubble instead of just their spoken message.

Every other UI-facing usage of the message text in sendMessage() was correctly changed to rawText by this PR, but this one instance was missed.

(Refers to line 315)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant