Make dictation engine start non-blocking with audio route resilience #23811

Merged
tkheyfets merged 5 commits into main from devin/1775501077-dictation-snappy-fix on Apr 6, 2026
@devin-ai-integration devin-ai-integration Bot commented Apr 6, 2026

Eliminates the 2+ second main-thread stall during push-to-talk (PTT) dictation by starting the audio engine asynchronously (withCheckedContinuation) and moving DictationContextCapture to a detached Task, so the overlay and chime appear instantly and key-up events are never blocked by hardware init or Accessibility (AX) timeouts.




- Add installTapAndStartAsync to AudioEngineController for non-blocking
  engine start using Swift concurrency (withCheckedContinuation)
- Extract installTapAndStartImpl to share logic between sync/async paths
- Listen for AVAudioEngineConfigurationChange to re-prewarm inputNode
  after Bluetooth device connect/disconnect and AirPods mode switches
- Restructure VoiceInputManager.beginRecording() to show recording UI
  and play activation chime immediately, then start engine async via Task
- Move DictationContextCapture off the critical path: engine starts
  concurrently on its audio queue while context capture runs on main
- Add SFSpeechRecognizer transient unavailability retry (recreate if
  isAvailable returns false after sleep/wake or heavy use)
- Handle edge case where PTT is released before async engine start
  completes (stopRecordingForDictation cleans up directly)

Co-Authored-By: tkheyfets <timur@vellum.ai>
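
The first two bullets can be illustrated with a minimal sketch. It assumes a serial dispatch queue owns the engine and that hardware start-up is simulated with a sleep; `EngineControllerSketch`, `simulatedLatency`, and `startImpl` are hypothetical names, not the PR's actual code. The point is only that the async variant suspends the caller through `withCheckedContinuation` instead of blocking its thread, while both paths share one implementation.

```swift
import Foundation

/// Hypothetical stand-in for the PR's AudioEngineController.
final class EngineControllerSketch: @unchecked Sendable {
    private let audioQueue = DispatchQueue(label: "sketch.audio-engine")
    private var running = false   // only touched on audioQueue

    /// Synchronous path: blocks the calling thread until the start finishes.
    func installTapAndStart(simulatedLatency: TimeInterval) -> Bool {
        audioQueue.sync { startImpl(simulatedLatency: simulatedLatency) }
    }

    /// Async path: the caller suspends; the work still runs on audioQueue.
    func installTapAndStartAsync(simulatedLatency: TimeInterval) async -> Bool {
        await withCheckedContinuation { (continuation: CheckedContinuation<Bool, Never>) in
            audioQueue.async {
                let ok = self.startImpl(simulatedLatency: simulatedLatency)
                continuation.resume(returning: ok)   // resumed exactly once
            }
        }
    }

    /// Shared implementation; the sleep simulates slow hardware init.
    private func startImpl(simulatedLatency: TimeInterval) -> Bool {
        Thread.sleep(forTimeInterval: simulatedLatency)
        running = true
        return true
    }

    var isRunning: Bool { audioQueue.sync { running } }
}
```

Because every start goes through the same serial queue, overlapping starts are executed one after another, which is the ordering the later review comments rely on.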
@tkheyfets tkheyfets marked this pull request as ready for review April 6, 2026 18:56
@tkheyfets

@codex review

chatgpt-codex-connector[bot]

This comment was marked as resolved.

When PTT is released before installTapAndStartAsync completes, the
isRecording guard now stops and removes the tap if the engine started
successfully, preventing the mic path from staying alive with no
active recording session.

Co-Authored-By: tkheyfets <timur@vellum.ai>
@tkheyfets

@codex review

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@tkheyfets

@codex review

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@tkheyfets

@codex review

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@tkheyfets

@codex review

@tkheyfets tkheyfets merged commit 7d8f70a into main Apr 6, 2026
6 checks passed
@tkheyfets tkheyfets deleted the devin/1775501077-dictation-snappy-fix branch April 6, 2026 20:00
noanflaherty pushed a commit that referenced this pull request Apr 6, 2026
…23811)

* Make dictation engine start non-blocking and improve audio resilience


* Tear down engine when async startup outlives recording session


* Add recording generation token and gate context capture on start success

Co-Authored-By: tkheyfets <timur@vellum.ai>

* Guard stale teardown against active sessions and gate rewarm on mic auth

Co-Authored-By: tkheyfets <timur@vellum.ai>

* Move context capture to Task.detached to avoid blocking main actor

Co-Authored-By: tkheyfets <timur@vellum.ai>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: tkheyfets <timur@vellum.ai>

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 4 new potential issues.

View 8 additional findings in Devin Review.


🟡 stopContinuousRecording() missing recognitionTask == nil guard — can leave isRecording stuck and violate endAudio contract

stopRecordingForDictation() was correctly updated (lines 894-906) with a guard for recognitionTask == nil to handle the new async engine start flow where the recognition task hasn't been created yet. However, the structurally identical stopContinuousRecording() was not updated with the same guard.

If stopContinuousRecording() is called before the async installTapAndStartAsync completes: (1) recognitionTask is nil so no callback will deliver isFinal, (2) isRecording stays true permanently, (3) endAudio() is called on the request. When the async task eventually completes, it passes the generation check (same session), creates a recognition task with a request that already had endAudio() called, and the tap starts appending buffers after endAudio() — violating SFSpeechAudioBufferRecognitionRequest's contract.

Comparison with the fix in stopRecordingForDictation

stopRecordingForDictation() handles this at lines 894-906:

guard recognitionTask != nil else {
    recognitionRequest = nil
    isRecording = false
    // ... full cleanup
    return
}

stopContinuousRecording() at lines 260-277 has no such guard.

(Refers to lines 260-277)

Prompt for agents
The stopContinuousRecording() method at line 260 needs the same recognitionTask == nil guard that was added to stopRecordingForDictation() at lines 894-906. Without it, if stopContinuousRecording() is called while the async engine start is still in progress, isRecording remains true forever and endAudio() is called on a request before its recognition task exists.

After the hasInstalledTap = false line (line 272), add a guard checking recognitionTask != nil. If nil, perform direct cleanup: set recognitionRequest = nil, isRecording = false, reset amplitude state, and return early. The cleanup should mirror what stopRecordingForDictation does in its equivalent guard block (lines 894-906), but adapted for the continuous recording context (no overlay dismiss, no dictation context reset).
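
A reduced sketch of the guard described above, assuming the recording state can be modelled with plain booleans. `RecorderSketch`, `engineDidStart`, and `endAudioCalls` are hypothetical; the real method also resets amplitude state and works with Speech framework objects, and whether the early-exit branch should skip `endAudio()` is an assumption here.

```swift
/// Hypothetical, heavily reduced model of the recording state discussed above.
final class RecorderSketch {
    var isRecording = false
    var hasInstalledTap = false
    var hasRecognitionRequest = false
    var hasRecognitionTask = false      // stays false until the async engine start completes
    private(set) var endAudioCalls = 0  // counts simulated endAudio() calls

    func beginRecording() {
        isRecording = true
        hasRecognitionRequest = true    // the request exists before the engine is up
    }

    /// Called once the async engine start has finished for the current session.
    func engineDidStart() {
        hasInstalledTap = true
        hasRecognitionTask = true
    }

    func stopContinuousRecording() {
        guard isRecording else { return }
        hasInstalledTap = false

        // Proposed guard: with no task, no isFinal callback will ever arrive,
        // so clean up directly instead of waiting for one.
        guard hasRecognitionTask else {
            hasRecognitionRequest = false
            isRecording = false
            return
        }

        // Normal path: signal end of audio; the task's final callback
        // (not modelled here) is what resets isRecording.
        endAudioCalls += 1
    }
}
```

Stopping before `engineDidStart()` leaves `isRecording == false` and never reaches the simulated `endAudio()`, which is the behaviour the review asks for.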

Comment on lines +732 to +742
guard self.isRecording, self.recordingGeneration == generation else {
    // Only tear down if no session is currently active. When a newer
    // session is running (isRecording true, generation mismatch),
    // it owns the engine — tearing down here would remove its tap.
    if success, !self.isRecording {
        self.engineController.stopAndRemoveTap()
        log.info("Engine started for stale generation \(generation) — tore down (no active session)")
    } else if success {
        log.info("Stale generation \(generation) completed — skipping teardown, session \(self.recordingGeneration) owns engine")
    }
    return

🟡 Stale session's tap appends audio to request after endAudio() in rapid stop/start

In a rapid stop→start scenario, a stale session's audio tap can feed buffers to a SFSpeechAudioBufferRecognitionRequest after endAudio() has been called — violating the API contract explicitly noted at AudioEngineController.swift:188-189.

Detailed sequence
  1. Session A (gen=1): beginRecording() sets isRecording=true, creates request_A, launches async engine start
  2. User stops: stopRecording() → tearDownAudioState() calls request_A.endAudio(), sets recognitionRequest=nil. But hasInstalledTap is false so the engine is NOT stopped.
  3. Session B (gen=2): beginRecording() creates request_B, launches another async engine start
  4. Audio queue executes A's installTapAndStartImpl: installs tap_A (closure captures request_A), starts engine
  5. tap_A fires on the audio thread, calling request_A.append(buffer) — after endAudio() was already called
  6. A's continuation resumes: stale check (gen=1≠2) → skips teardown because isRecording=true ("session B owns engine")
  7. Audio queue executes B's impl: removes tap_A, installs tap_B — but audio was already appended to the ended request

The old synchronous installTapAndStart maintained the invariant (remove tap → then endAudio) because by the time beginRecording() returned, the tap was installed and hasInstalledTap=true, so tearDownAudioState() always removed the tap first. The new async path breaks this invariant. The window is narrow (between engine start and the next serial queue item removing the tap), but with the 2+ second Bluetooth latency the PR is designed to handle, the user can easily trigger a stop/start within that window.
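
The stale-generation check quoted in this comment can be modelled as a pure decision function, which makes step 6 of the sequence easy to see. This is a sketch of the logic as the excerpt shows it; `StaleStartAction` and `actionAfterEngineStart` are hypothetical names, and using `UInt64` for the generation counter is an assumption.

```swift
/// What the completion of an async engine start should do (hypothetical model).
enum StaleStartAction: Equatable {
    case proceed              // completion belongs to the live session
    case tearDown             // engine started but nobody is recording
    case leaveToNewerSession  // a newer session is recording and owns the engine
    case nothingToDo          // stale and the engine never started
}

func actionAfterEngineStart(success: Bool,
                            isRecording: Bool,
                            currentGeneration: UInt64,
                            startedGeneration: UInt64) -> StaleStartAction {
    // Live session: the success/failure handling happens in a later guard.
    if isRecording && currentGeneration == startedGeneration { return .proceed }
    // Stale completion with a failed start: there is nothing to undo.
    guard success else { return .nothingToDo }
    // Stale completion with a running engine: tear down only if idle.
    return isRecording ? .leaveToNewerSession : .tearDown
}
```

For step 6 (started generation 1, current generation 2, `isRecording` true, start succeeded) the function yields `.leaveToNewerSession`, so tap_A stays installed until session B's start replaces it.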

Prompt for agents
The stale generation handler at VoiceInputManager.swift:732-742 correctly detects stale sessions but doesn't prevent the stale tap from feeding audio to a request that had endAudio() called. The root cause is that tearDownAudioState() (line 237-246) calls recognitionRequest?.endAudio() when hasInstalledTap is false (async start pending), but the stale session's tap will later install and start feeding audio to the captured request.

Possible approaches:
1. In tearDownAudioState(), when hasInstalledTap is false but a request exists, avoid calling endAudio() — let the stale completion handler clean up instead. The stale handler would need to call endAudio on the captured request after stopAndRemoveTap.
2. Track whether endAudio has been called on the current request (e.g. a flag) and have the tap block check it before appending.
3. In the stale generation handler, when success is true and isRecording is true (newer session active), still call stopAndRemoveTap() to remove the stale tap immediately, rather than relying on the newer session's installTapAndStartAsync to eventually replace it. The newer session's impl will re-install its own tap when it runs on the serial queue.
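
Approach 2 could look roughly like the sketch below. It assumes the tap block appends through a small wrapper rather than calling the recognition request directly; `GuardedRequestSketch` is hypothetical and audio buffers are modelled as `Int` values so the example stays self-contained. A lock is used because the tap fires on an audio thread while `endAudio()` is called from elsewhere.

```swift
import Foundation

/// Hypothetical wrapper around a recognition request; buffers are modelled as Int.
final class GuardedRequestSketch: @unchecked Sendable {
    private let lock = NSLock()
    private var ended = false
    private var accepted: [Int] = []

    /// Called from the tap block. Returns false and drops the buffer
    /// once endAudio() has been called.
    @discardableResult
    func append(_ buffer: Int) -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard !ended else { return false }
        accepted.append(buffer)
        return true
    }

    /// Marks the request as finished; later appends are ignored.
    func endAudio() {
        lock.lock(); defer { lock.unlock() }
        ended = true
    }

    var acceptedBuffers: [Int] {
        lock.lock(); defer { lock.unlock() }
        return accepted
    }
}
```

With this in place a stale tap that fires after `endAudio()` becomes a no-op, at the cost of one lock acquisition per buffer.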

Comment on lines +744 to +752
guard success else {
    log.error("Audio engine failed to start — invalid format or engine error")
    self.isRecording = false
    self.onRecordingStateChanged?(false)
    self.currentDictationContext = nil
    self.recognitionRequest = nil
    self.overlayWindow.dismiss()
    self.resetAudioEngine()
    return

🟡 Engine start failure path doesn't play deactivation chime after activation chime was already played

The activation chime is now played immediately at VoiceInputManager.swift:672 (before the async engine start) for instant user feedback. But when the engine subsequently fails to start, the failure handler at lines 735-743 dismisses the overlay and resets state without playing the deactivation chime. The user hears the activation chime indicating recording has started, but gets no audible signal that it failed. In the pre-PR code, the chime was played after the engine was confirmed started, so engine failures never had an orphaned activation chime.

Suggested change
- guard success else {
-     log.error("Audio engine failed to start — invalid format or engine error")
-     self.isRecording = false
-     self.onRecordingStateChanged?(false)
-     self.currentDictationContext = nil
-     self.recognitionRequest = nil
-     self.overlayWindow.dismiss()
-     self.resetAudioEngine()
-     return
+ guard success else {
+     log.error("Audio engine failed to start — invalid format or engine error")
+     self.isRecording = false
+     self.onRecordingStateChanged?(false)
+     self.currentDictationContext = nil
+     self.recognitionRequest = nil
+     self.overlayWindow.dismiss()
+     self.resetAudioEngine()
+     VoiceFeedback.playDeactivationChime()
+     return
+ }

recordingGeneration &+= 1
let generation = recordingGeneration
isRecording = true
onRecordingStateChanged?(true)

🚩 Activation chime now plays before engine is ready — verify no audio conflict

The activation chime (VoiceFeedback.playActivationChime() at line 672) now plays immediately when isRecording is set, BEFORE the audio engine starts asynchronously. Previously it played AFTER the engine was running (old line 738). This achieves the stated goal of instant feedback. However, if VoiceFeedback.playActivationChime() uses AVAudioPlayer or a system sound API that interacts with the audio session, it could potentially conflict with the audio engine starting shortly after. This is worth verifying but likely fine since system sounds typically use a separate audio path from AVAudioEngine's input node.


noanflaherty added a commit that referenced this pull request Apr 6, 2026
* revert: disable Teleport feature flag by default (#23744) (#23815)

* fix: replace auxWhite-on-primaryBase with VButton across the app (#23802)

* fix: use VButton for inline surface action buttons

Replace raw Button with manual color functions in InlineSurfaceRouter
with the design system VButton component. The manual buttonForeground
used VColor.auxWhite (always #FFFFFF) against VColor.primaryBase which
resolves to #FDFDFC in dark mode, producing invisible white-on-white
text.

Closes LUM-730

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

* fix: replace auxWhite-on-primaryBase with VButton in additional locations

FileUploadSurfaceView: Upload/Cancel buttons used raw Button with
VColor.auxWhite on VColor.primaryBase — white-on-white in dark mode.
Replaced with VButton(.primary) and VButton(.outlined).

JITPermissionView: Permission buttons used the same auxWhite pattern.
Replaced with VButton(.primary/.outlined, isFullWidth: true).

ImproveExperienceStepView: ToS checkbox checkmark used auxWhite on
primaryBase fill. Changed to VColor.contentInset which adapts per
color scheme.

ChatGallerySection: Gallery demo of surface action pills mirrored
the old buggy pattern. Updated to use VButton so the gallery
accurately represents production rendering.

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

* Make dictation engine start non-blocking with audio route resilience (#23811)


* [LUM-681] Fix audio tap format mismatch by resetting engine before installTap (#23766)

After audio-route changes (Bluetooth, USB mic, AirPods mode switch), the
format cached inside AVAudioInputNode diverges from the engine's actual
hardware format. Both outputFormat(forBus:) and a nil format argument to
installTap resolve to this stale value, causing:

  'Failed to create tap due to format mismatch,
   <AVAudioFormat: 2 ch, 44100 Hz, Float32, deinterleaved>'

Fix: call audioEngine.reset() before re-querying the format, then pass it
explicitly to installTap. This forces the engine to discard its cached graph
state and re-read the hardware, so the tap, node, and engine all agree.
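
The fix described in this commit message might be sketched as follows. This is not the repository's code: `installTapAfterReset` and `isUsableFormat` are hypothetical helpers, bus 0 and a 1024-frame buffer are assumptions, and the AVFoundation part only compiles on Apple platforms. The sketch follows the stated order: reset the engine, re-query the input format, then pass that format explicitly to installTap.

```swift
import Foundation

/// Hypothetical sanity check applied to the re-queried format.
func isUsableFormat(sampleRate: Double, channelCount: UInt32) -> Bool {
    sampleRate > 0 && channelCount > 0
}

#if canImport(AVFoundation)
import AVFoundation

/// Sketch only: reset, re-read the input format, install the tap with it, start.
func installTapAfterReset(on engine: AVAudioEngine,
                          bufferSize: AVAudioFrameCount = 1024,
                          block: @escaping AVAudioNodeTapBlock) -> Bool {
    let input = engine.inputNode
    input.removeTap(onBus: 0)                      // drop any tap from a previous session
    engine.reset()                                 // per the commit message: discard cached state
    let format = input.outputFormat(forBus: 0)     // re-query after the reset
    guard isUsableFormat(sampleRate: format.sampleRate,
                         channelCount: format.channelCount) else { return false }
    input.installTap(onBus: 0, bufferSize: bufferSize, format: format, block: block)
    engine.prepare()
    do {
        try engine.start()
        return true
    } catch {
        input.removeTap(onBus: 0)                  // leave no tap behind on failure
        return false
    }
}
#endif
```

The format check is a defensive addition of this sketch; the commit message itself only describes the reset and the explicit format argument.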

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: tkheyfets <timur@vellum.ai>

* fix: pass transport hints through HTTP message endpoint for managed-mode conversations (#23824)

* fix: pass transport metadata through POST /v1/messages to enable host environment hints

The HTTP message handler auto-creates conversations without transport
metadata, so applyTransportMetadata() returns early and host environment
hints (hostHomeDir, hostUsername) are never injected into the LLM context.
This causes the assistant to hallucinate the user's home directory path
from their display name instead of using the actual macOS username.

Thread transport metadata from the message request body through
SendMessageDeps.getOrCreateConversation() to the daemon, and send
hostHomeDir/hostUsername from the macOS client in every message request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace dynamic imports with static type imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dvargasfuertes pushed a commit that referenced this pull request Apr 6, 2026
dvargasfuertes pushed a commit that referenced this pull request Apr 6, 2026
* Release v0.6.1

* Cherry-pick fixes for v0.6.1 (#23785)

* Increase teleport import timeout from 2 to 5 minutes (#23749)

* increase teleport import timeout from 2 to 5 minutes

* fix: update platform import timeout error message to say 5 minutes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update billing tab copy: referral subtitle, remove earning cap note, move credit info to card subtitle (#23751)

* fix(macos): always collapse thinking blocks by default (#23750)

Thinking blocks were auto-expanding during streaming, showing a wall of
text. Remove the auto-expand logic so blocks always start collapsed.
Users can still manually expand them. The header already shows
"Thinking..." vs "Thought process" as a streaming indicator.

Closes LUM-729

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

* [LUM-684/LUM-726] Fix dictation crash: pass nil format to installTap (#23754)

* Fix dictation crash: pass nil format to installTap, consolidate audio engine calls

Pass nil for the format parameter in AVAudioNode.installTap(onBus:bufferSize:format:block:)
so AVAudioEngine uses its own internal hardware format, which is always self-consistent.
This prevents NSInternalInconsistencyException crashes caused by format.sampleRate != hwFormat.sampleRate
when the cached format from outputFormat(forBus:) diverges from the engine's internal hardware
format after audio route changes (Bluetooth, USB mic, AirPods mode switch).

AudioEngineController.swift:
- installTapAndStart() now passes nil instead of explicit format to installTap
- Removed 6 now-unused methods: inputNodeFormat(), installTap(bufferSize:format:block:),
  removeTap(), prepare(), start(), prepareAndStart()

OpenAIVoiceService.swift:
- startRecording(): replaced separate inputNodeFormat/installTap/prepare/start chain
  with single installTapAndStart() call
- startBargeInMonitor(): same migration to installTapAndStart()
- Removed error-path removeTap() call (handled internally by installTapAndStart)

Resolves: LUM-684, LUM-726
Co-Authored-By: tkheyfets <timur@vellum.ai>

* fix: use explicit block: parameter in guard statements for installTapAndStart

Swift doesn't support trailing closure syntax with guard statements,
causing compilation errors. Use explicit block: parameter label instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace contrast buttons with primary style (#23753)

Remove all production usages of .contrast button style in favor of .primary.
Fixes white-on-white button visibility issues in chat composer.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Inject host environment via transport hints (#23779)

* refactor: discriminated union for transport metadata, remove iOS proxy setup (#23776)

* feat: inject interface ID and macOS host environment into transport hints (#23777)

* feat: send hostHomeDir and hostUsername from macOS client (#23778)

* fix: remove iOS from proxy restoration in conversation-process.ts (#23782)

---------

Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Tirman Sidhu <tirmansidhu@gmail.com>

* [skip ci] Cherry-pick fixes for v0.6.1 (#23820)

* revert: disable Teleport feature flag by default (#23744) (#23815)

* fix: replace auxWhite-on-primaryBase with VButton across the app (#23802)


* Make dictation engine start non-blocking with audio route resilience (#23811)

* Make dictation engine start non-blocking and improve audio resilience

- Add installTapAndStartAsync to AudioEngineController for non-blocking
  engine start using Swift concurrency (withCheckedContinuation)
- Extract installTapAndStartImpl to share logic between sync/async paths
- Listen for AVAudioEngineConfigurationChange to re-prewarm inputNode
  after Bluetooth device connect/disconnect and AirPods mode switches
- Restructure VoiceInputManager.beginRecording() to show recording UI
  and play activation chime immediately, then start engine async via Task
- Move DictationContextCapture off the critical path: engine starts
  concurrently on its audio queue while context capture runs on main
- Add SFSpeechRecognizer transient unavailability retry (recreate if
  isAvailable returns false after sleep/wake or heavy use)
- Handle edge case where PTT is released before async engine start
  completes (stopRecordingForDictation cleans up directly)

Co-Authored-By: tkheyfets <timur@vellum.ai>
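
A minimal sketch of the non-blocking start described in the first bullet above, assuming the blocking hardware start can be wrapped in a closure. `AsyncStarter`, its queue label, and the `work` closure are hypothetical names for illustration; the PR's actual `installTapAndStartAsync` is not shown in this message and may differ.

```swift
import Foundation

/// Hypothetical helper (not the PR's code): runs a blocking start routine on
/// a private serial queue and exposes the result to async callers.
struct AsyncStarter {
    // Assumed queue; the commit mentions an audio queue but not its definition.
    let queue = DispatchQueue(label: "example.audio-engine")

    /// Suspends the caller, without blocking its thread, until `work` finishes.
    func start(_ work: @escaping @Sendable () -> Bool) async -> Bool {
        await withCheckedContinuation { (continuation: CheckedContinuation<Bool, Never>) in
            queue.async {
                // Resume exactly once with whatever the blocking call returned.
                continuation.resume(returning: work())
            }
        }
    }
}
```

A caller on the main actor could show the recording UI and play the chime first, then run `await starter.start { ... }` inside a `Task`, so a slow hardware init delays only the start result and not the UI.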

* Tear down engine when async startup outlives recording session

When PTT is released before installTapAndStartAsync completes, the
isRecording guard now stops the engine and removes the tap if the engine
started successfully, so the mic path does not stay alive with no
active recording session.

Co-Authored-By: tkheyfets <timur@vellum.ai>

* Add recording generation token and gate context capture on start success

Co-Authored-By: tkheyfets <timur@vellum.ai>

* Guard stale teardown against active sessions and gate rewarm on mic auth

Co-Authored-By: tkheyfets <timur@vellum.ai>
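
The three commits above (teardown on late start, generation token, stale-teardown guard) can be read as one decision made when the async start completes. The following is a sketch under that reading, not the PR's implementation: `RecordingState`, `StartCompletionAction`, and the rule for a failed start are all assumptions.

```swift
/// Hypothetical action to take when an async engine start finishes.
enum StartCompletionAction: Equatable {
    case proceed   // session still active: safe to capture context and continue
    case tearDown  // session ended, nothing replaced it: stop engine, remove tap
    case ignore    // start failed, or a newer session now owns the engine
}

/// Hypothetical bookkeeping for push-to-talk sessions.
struct RecordingState {
    private(set) var generation = 0
    private(set) var isRecording = false

    /// Begins a session and returns the token its async start should carry.
    mutating func begin() -> Int {
        generation += 1
        isRecording = true
        return generation
    }

    /// Ends the current session (for example, on PTT release).
    mutating func end() {
        isRecording = false
    }

    /// Decides what a completed start may do, given the token it was issued.
    func action(forToken token: Int, engineStarted: Bool) -> StartCompletionAction {
        guard engineStarted else { return .ignore }
        if isRecording && token == generation { return .proceed }
        // A newer session is recording: a stale completion must not stop its engine.
        if isRecording { return .ignore }
        return .tearDown
    }
}
```

Comparing the token as well as `isRecording` matters when the key is released and pressed again before the first start completes: `isRecording` alone would be true, but the completion belongs to the earlier session.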

* Move context capture to Task.detached to avoid blocking main actor

Co-Authored-By: tkheyfets <timur@vellum.ai>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: tkheyfets <timur@vellum.ai>

* [LUM-681] Fix audio tap format mismatch by resetting engine before installTap (#23766)

After audio-route changes (Bluetooth, USB mic, AirPods mode switch), the
format cached inside AVAudioInputNode diverges from the engine's actual
hardware format. Both outputFormat(forBus:) and a nil format argument to
installTap resolve to this stale value, causing:

  'Failed to create tap due to format mismatch,
   <AVAudioFormat: 2 ch, 44100 Hz, Float32, deinterleaved>'

Fix: call audioEngine.reset() before re-querying the format, then pass the
re-queried format explicitly to installTap. This forces the engine to discard
its cached graph state and re-read the hardware, so the tap, node, and engine
all agree.

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: tkheyfets <timur@vellum.ai>
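
A sketch of the reset-then-install order described in the fix above, using only public AVFoundation calls. The function name, the error type, the buffer size, the removal of a leftover tap, and the format sanity check are assumptions added for illustration and are not taken from the PR.

```swift
import AVFoundation

/// Hypothetical error for this sketch only.
enum TapInstallError: Error {
    case invalidInputFormat
}

/// Pure check, kept separate so it can be exercised without audio hardware.
func isUsableFormat(sampleRate: Double, channelCount: UInt32) -> Bool {
    sampleRate > 0 && channelCount > 0
}

/// Resets the engine, re-queries the input format, and installs a tap with it.
func resetAndInstallTap(on engine: AVAudioEngine,
                        bufferSize: AVAudioFrameCount = 1024,
                        block: @escaping AVAudioNodeTapBlock) throws {
    let input = engine.inputNode

    // Defensive assumption: drop any tap left over from a previous session.
    input.removeTap(onBus: 0)

    // Reset first, so the format query below happens after the reset.
    engine.reset()

    let format = input.outputFormat(forBus: 0)
    guard isUsableFormat(sampleRate: format.sampleRate,
                         channelCount: format.channelCount) else {
        throw TapInstallError.invalidInputFormat
    }

    // Pass the re-queried format explicitly instead of nil.
    input.installTap(onBus: 0, bufferSize: bufferSize, format: format, block: block)
    engine.prepare()
    try engine.start()
}
```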

* fix: pass transport hints through HTTP message endpoint for managed-mode conversations (#23824)

* fix: pass transport metadata through POST /v1/messages to enable host environment hints

The HTTP message handler auto-creates conversations without transport
metadata, so applyTransportMetadata() returns early and host environment
hints (hostHomeDir, hostUsername) are never injected into the LLM context.
This causes the assistant to hallucinate the user's home directory path
from their display name instead of using the actual macOS username.

Thread transport metadata from the message request body through
SendMessageDeps.getOrCreateConversation() to the daemon, and send
hostHomeDir/hostUsername from the macOS client in every message request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace dynamic imports with static type imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: reset non-version-bump files to match main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Tirman Sidhu <tirmansidhu@gmail.com>
Co-authored-by: David Vargas Fuertes <vargasvellum@Davids-MacBook-Pro.local>