Make dictation engine start non-blocking with audio route resilience #23811
- Add installTapAndStartAsync to AudioEngineController for non-blocking engine start using Swift concurrency (withCheckedContinuation)
- Extract installTapAndStartImpl to share logic between sync/async paths
- Listen for AVAudioEngineConfigurationChange to re-prewarm inputNode after Bluetooth device connect/disconnect and AirPods mode switches
- Restructure VoiceInputManager.beginRecording() to show recording UI and play activation chime immediately, then start engine async via Task
- Move DictationContextCapture off the critical path: engine starts concurrently on its audio queue while context capture runs on main
- Add SFSpeechRecognizer transient unavailability retry (recreate if isAvailable returns false after sleep/wake or heavy use)
- Handle edge case where PTT is released before async engine start completes (stopRecordingForDictation cleans up directly)

Co-Authored-By: tkheyfets <timur@vellum.ai>
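The sync/async split in the first two bullets can be pictured with a minimal sketch, assuming a serial dispatch queue owns the engine. `EngineControllerSketch`, `audioQueue`, and the `running` flag are hypothetical stand-ins introduced for illustration; the real AudioEngineController and its AVAudioEngine calls are not reproduced here.

```swift
import Foundation

/// Hypothetical stand-in for AudioEngineController (no AVAudioEngine involved).
final class EngineControllerSketch: @unchecked Sendable {
    // Assumption: all engine work is serialized on one queue.
    private let audioQueue = DispatchQueue(label: "sketch.audio-engine")
    private var running = false   // only read or written on audioQueue

    /// Shared logic for both entry points; must run on audioQueue.
    /// A real implementation would install the tap and start the engine here.
    private func installTapAndStartImpl() -> Bool {
        running = true
        return true
    }

    /// Blocking path: the caller waits until the queue has run the work.
    func installTapAndStart() -> Bool {
        audioQueue.sync { installTapAndStartImpl() }
    }

    /// Non-blocking path: the caller suspends instead of blocking its thread.
    func installTapAndStartAsync() async -> Bool {
        await withCheckedContinuation { (continuation: CheckedContinuation<Bool, Never>) in
            audioQueue.async {
                // Resume exactly once with the result of the shared impl.
                continuation.resume(returning: self.installTapAndStartImpl())
            }
        }
    }

    var isRunning: Bool { audioQueue.sync { running } }
}
```

A caller on the main actor would typically wrap the async variant in a `Task` and check state again after the `await`, since the recording session may have ended while the start was pending.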
When PTT is released before installTapAndStartAsync completes, the isRecording guard now stops and removes the tap if the engine started successfully, preventing the mic path from staying alive with no active recording session. Co-Authored-By: tkheyfets <timur@vellum.ai>
…23811)

* Make dictation engine start non-blocking and improve audio resilience

  - Add installTapAndStartAsync to AudioEngineController for non-blocking engine start using Swift concurrency (withCheckedContinuation)
  - Extract installTapAndStartImpl to share logic between sync/async paths
  - Listen for AVAudioEngineConfigurationChange to re-prewarm inputNode after Bluetooth device connect/disconnect and AirPods mode switches
  - Restructure VoiceInputManager.beginRecording() to show recording UI and play activation chime immediately, then start engine async via Task
  - Move DictationContextCapture off the critical path: engine starts concurrently on its audio queue while context capture runs on main
  - Add SFSpeechRecognizer transient unavailability retry (recreate if isAvailable returns false after sleep/wake or heavy use)
  - Handle edge case where PTT is released before async engine start completes (stopRecordingForDictation cleans up directly)

  Co-Authored-By: tkheyfets <timur@vellum.ai>

* Tear down engine when async startup outlives recording session

  When PTT is released before installTapAndStartAsync completes, the isRecording guard now stops and removes the tap if the engine started successfully, preventing the mic path from staying alive with no active recording session.

  Co-Authored-By: tkheyfets <timur@vellum.ai>

* Add recording generation token and gate context capture on start success

  Co-Authored-By: tkheyfets <timur@vellum.ai>

* Guard stale teardown against active sessions and gate rewarm on mic auth

  Co-Authored-By: tkheyfets <timur@vellum.ai>

* Move context capture to Task.detached to avoid blocking main actor

  Co-Authored-By: tkheyfets <timur@vellum.ai>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: tkheyfets <timur@vellum.ai>
🟡 stopContinuousRecording() missing recognitionTask == nil guard — can leave isRecording stuck and violate endAudio contract
stopRecordingForDictation() was correctly updated (lines 894-906) with a guard for recognitionTask == nil to handle the new async engine start flow where the recognition task hasn't been created yet. However, the structurally identical stopContinuousRecording() was not updated with the same guard.
If stopContinuousRecording() is called before the async installTapAndStartAsync completes: (1) recognitionTask is nil so no callback will deliver isFinal, (2) isRecording stays true permanently, (3) endAudio() is called on the request. When the async task eventually completes, it passes the generation check (same session), creates a recognition task with a request that already had endAudio() called, and the tap starts appending buffers after endAudio() — violating SFSpeechAudioBufferRecognitionRequest's contract.
Comparison with the fix in stopRecordingForDictation
stopRecordingForDictation() handles this at lines 894-906:
    guard recognitionTask != nil else {
        recognitionRequest = nil
        isRecording = false
        // ... full cleanup
        return
    }

stopContinuousRecording() at lines 260-277 has no such guard.
(Refers to lines 260-277)
Prompt for agents
The stopContinuousRecording() method at line 260 needs the same recognitionTask == nil guard that was added to stopRecordingForDictation() at lines 894-906. Without it, if stopContinuousRecording() is called while the async engine start is still in progress, isRecording remains true forever and endAudio() is called on a request before its recognition task exists.
After the hasInstalledTap = false line (line 272), add a guard checking recognitionTask != nil. If nil, perform direct cleanup: set recognitionRequest = nil, isRecording = false, reset amplitude state, and return early. The cleanup should mirror what stopRecordingForDictation does in its equivalent guard block (lines 894-906), but adapted for the continuous recording context (no overlay dismiss, no dictation context reset).
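The requested guard can be sketched in isolation. This is a minimal model, not the project's code: `ContinuousRecorderSketch` and its counters are hypothetical, `hasRecognitionTask` stands in for `recognitionTask != nil`, and `endAudioCalls` stands in for calling `endAudio()` on the real request.

```swift
/// Hypothetical model of the stop path; no Speech framework types are used.
final class ContinuousRecorderSketch {
    private(set) var isRecording = false
    private(set) var hasRequest = false
    private(set) var endAudioCalls = 0
    var hasRecognitionTask = false   // stands in for recognitionTask != nil

    /// Models beginRecording(): the request exists, the async start is pending.
    func begin() {
        isRecording = true
        hasRequest = true
    }

    func stopContinuousRecording() {
        guard isRecording else { return }

        // No recognition task yet: nothing will ever deliver isFinal,
        // so clean up directly instead of waiting for a callback.
        guard hasRecognitionTask else {
            hasRequest = false
            isRecording = false
            return
        }

        // Normal path: signal end of audio and let the task finish.
        endAudioCalls += 1
    }
}
```

With the guard in place, stopping before the task exists leaves `isRecording` false and never touches the request, which is the behavior the comment asks for.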
    guard self.isRecording, self.recordingGeneration == generation else {
        // Only tear down if no session is currently active. When a newer
        // session is running (isRecording true, generation mismatch),
        // it owns the engine — tearing down here would remove its tap.
        if success, !self.isRecording {
            self.engineController.stopAndRemoveTap()
            log.info("Engine started for stale generation \(generation) — tore down (no active session)")
        } else if success {
            log.info("Stale generation \(generation) completed — skipping teardown, session \(self.recordingGeneration) owns engine")
        }
        return
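The branches in this hunk reduce to a small decision table. The sketch below restates them as a pure function so each case is easy to see; `StartCompletionAction` and `actionAfterAsyncStart` are hypothetical names, and the `UInt64` generation type is an assumption since the declaration is not shown.

```swift
/// Hypothetical summary of what the completion handler does in each case.
enum StartCompletionAction: Equatable {
    case continueStartup            // same session still recording
    case tearDownStaleEngine        // engine started, but nobody is recording
    case leaveEngineToNewerSession  // engine started, a newer session owns it
    case nothingToCleanUp           // stale and the start failed anyway
}

func actionAfterAsyncStart(
    success: Bool,
    isRecording: Bool,
    currentGeneration: UInt64,
    startedGeneration: UInt64
) -> StartCompletionAction {
    // Mirrors: guard self.isRecording, self.recordingGeneration == generation
    if isRecording && currentGeneration == startedGeneration {
        return .continueStartup
    }
    if success && !isRecording {
        return .tearDownStaleEngine
    }
    if success {
        return .leaveEngineToNewerSession
    }
    return .nothingToCleanUp
}
```

For example, a start launched at generation 1 that completes while generation 2 is recording maps to `.leaveEngineToNewerSession`, which is the case the review comment that follows examines.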
🟡 Stale session's tap appends audio to request after endAudio() in rapid stop/start
In a rapid stop→start scenario, a stale session's audio tap can feed buffers to a SFSpeechAudioBufferRecognitionRequest after endAudio() has been called — violating the API contract explicitly noted at AudioEngineController.swift:188-189.
Detailed sequence
- Session A (gen=1): beginRecording() sets isRecording=true, creates request_A, launches async engine start
- User stops: stopRecording() → tearDownAudioState() calls request_A.endAudio(), sets recognitionRequest=nil. But hasInstalledTap is false so the engine is NOT stopped.
- Session B (gen=2): beginRecording() creates request_B, launches another async engine start
- Audio queue executes A's installTapAndStartImpl: installs tap_A (closure captures request_A), starts engine
- tap_A fires on the audio thread, calling request_A.append(buffer) after endAudio() was already called
- A's continuation resumes: stale check (gen=1≠2) → skips teardown because isRecording=true ("session B owns engine")
- Audio queue executes B's impl: removes tap_A, installs tap_B, but audio was already appended to the ended request
The old synchronous installTapAndStart maintained the invariant (remove tap → then endAudio) because by the time beginRecording() returned, the tap was installed and hasInstalledTap=true, so tearDownAudioState() always removed the tap first. The new async path breaks this invariant. The window is narrow (between engine start and the next serial queue item removing the tap), but with the 2+ second Bluetooth latency the PR is designed to handle, the user can easily trigger a stop/start within that window.
Prompt for agents
The stale generation handler at VoiceInputManager.swift:732-742 correctly detects stale sessions but doesn't prevent the stale tap from feeding audio to a request that had endAudio() called. The root cause is that tearDownAudioState() (line 237-246) calls recognitionRequest?.endAudio() when hasInstalledTap is false (async start pending), but the stale session's tap will later install and start feeding audio to the captured request.
Possible approaches:
1. In tearDownAudioState(), when hasInstalledTap is false but a request exists, avoid calling endAudio() — let the stale completion handler clean up instead. The stale handler would need to call endAudio on the captured request after stopAndRemoveTap.
2. Track whether endAudio has been called on the current request (e.g. a flag) and have the tap block check it before appending.
3. In the stale generation handler, when success is true and isRecording is true (newer session active), still call stopAndRemoveTap() to remove the stale tap immediately, rather than relying on the newer session's installTapAndStartAsync to eventually replace it. The newer session's impl will re-install its own tap when it runs on the serial queue.
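Approach 2 could look roughly like the following sketch: a small wrapper records that audio has ended and drops any later buffers. `GuardedRequestSketch` is hypothetical, buffers are modeled as `[Float]` rather than `AVAudioPCMBuffer`, and an `NSLock` is assumed because the tap block runs on an audio thread while `endAudio()` is called elsewhere.

```swift
import Foundation

/// Hypothetical wrapper around a recognition request (modeled as a counter).
final class GuardedRequestSketch: @unchecked Sendable {
    private let lock = NSLock()
    private var ended = false
    private var appendedBuffers = 0

    /// Called from the tap block. Returns false when the buffer was dropped.
    @discardableResult
    func append(_ buffer: [Float]) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard !ended else { return false }   // never append after endAudio()
        appendedBuffers += 1                 // real code would forward the buffer
        return true
    }

    /// Called from the teardown path.
    func endAudio() {
        lock.lock()
        defer { lock.unlock() }
        ended = true                         // real code would also end the request
    }

    var bufferCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return appendedBuffers
    }
}
```

The flag only protects the request's contract. A stale tap would still run until it is removed, so this approach complements removing the tap and does not replace it.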
    guard success else {
        log.error("Audio engine failed to start — invalid format or engine error")
        self.isRecording = false
        self.onRecordingStateChanged?(false)
        self.currentDictationContext = nil
        self.recognitionRequest = nil
        self.overlayWindow.dismiss()
        self.resetAudioEngine()
        return
🟡 Engine start failure path doesn't play deactivation chime after activation chime was already played
The activation chime is now played immediately at VoiceInputManager.swift:672 (before the async engine start) for instant user feedback. But when the engine subsequently fails to start, the failure handler at lines 735-743 dismisses the overlay and resets state without playing the deactivation chime. The user hears the activation chime indicating recording has started, but gets no audible signal that it failed. In the pre-PR code, the chime was played after the engine was confirmed started, so engine failures never had an orphaned activation chime.
Suggested change:

    guard success else {
        log.error("Audio engine failed to start — invalid format or engine error")
        self.isRecording = false
        self.onRecordingStateChanged?(false)
        self.currentDictationContext = nil
        self.recognitionRequest = nil
        self.overlayWindow.dismiss()
        self.resetAudioEngine()
        VoiceFeedback.playDeactivationChime()
        return
    }
    recordingGeneration &+= 1
    let generation = recordingGeneration
    isRecording = true
    onRecordingStateChanged?(true)
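The `&+=` in this hunk is Swift's overflow addition assignment: it wraps around on overflow, where plain `+=` would trap at runtime. A tiny sketch of that behavior follows; `nextGeneration` is a hypothetical helper, and `UInt8` is used only to make the wrap easy to see, since the declared type of `recordingGeneration` is not shown.

```swift
/// Hypothetical helper showing how a wrapping generation counter advances.
func nextGeneration(after current: UInt8) -> UInt8 {
    var next = current
    next &+= 1   // wraps to 0 at UInt8.max instead of crashing
    return next
}
```

Because the token is only compared for equality against the captured value, wrapping is harmless as long as two live sessions cannot be a full cycle apart.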
🚩 Activation chime now plays before engine is ready — verify no audio conflict
The activation chime (VoiceFeedback.playActivationChime() at line 672) now plays immediately when isRecording is set, BEFORE the audio engine starts asynchronously. Previously it played AFTER the engine was running (old line 738). This achieves the stated goal of instant feedback. However, if VoiceFeedback.playActivationChime() uses AVAudioPlayer or a system sound API that interacts with the audio session, it could potentially conflict with the audio engine starting shortly after. This is worth verifying but likely fine since system sounds typically use a separate audio path from AVAudioEngine's input node.
* revert: disable Teleport feature flag by default (#23744) (#23815)

* fix: replace auxWhite-on-primaryBase with VButton across the app (#23802)

  * fix: use VButton for inline surface action buttons

    Replace raw Button with manual color functions in InlineSurfaceRouter with the design system VButton component. The manual buttonForeground used VColor.auxWhite (always #FFFFFF) against VColor.primaryBase which resolves to #FDFDFC in dark mode, producing invisible white-on-white text.

    Closes LUM-730

    Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

  * fix: replace auxWhite-on-primaryBase with VButton in additional locations

    FileUploadSurfaceView: Upload/Cancel buttons used raw Button with VColor.auxWhite on VColor.primaryBase — white-on-white in dark mode. Replaced with VButton(.primary) and VButton(.outlined).

    JITPermissionView: Permission buttons used the same auxWhite pattern. Replaced with VButton(.primary/.outlined, isFullWidth: true).

    ImproveExperienceStepView: ToS checkbox checkmark used auxWhite on primaryBase fill. Changed to VColor.contentInset which adapts per color scheme.

    ChatGallerySection: Gallery demo of surface action pills mirrored the old buggy pattern. Updated to use VButton so the gallery accurately represents production rendering.

    Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

  ---------

  Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

* Make dictation engine start non-blocking with audio route resilience (#23811)

  * Make dictation engine start non-blocking and improve audio resilience

    - Add installTapAndStartAsync to AudioEngineController for non-blocking engine start using Swift concurrency (withCheckedContinuation)
    - Extract installTapAndStartImpl to share logic between sync/async paths
    - Listen for AVAudioEngineConfigurationChange to re-prewarm inputNode after Bluetooth device connect/disconnect and AirPods mode switches
    - Restructure VoiceInputManager.beginRecording() to show recording UI and play activation chime immediately, then start engine async via Task
    - Move DictationContextCapture off the critical path: engine starts concurrently on its audio queue while context capture runs on main
    - Add SFSpeechRecognizer transient unavailability retry (recreate if isAvailable returns false after sleep/wake or heavy use)
    - Handle edge case where PTT is released before async engine start completes (stopRecordingForDictation cleans up directly)

    Co-Authored-By: tkheyfets <timur@vellum.ai>

  * Tear down engine when async startup outlives recording session

    When PTT is released before installTapAndStartAsync completes, the isRecording guard now stops and removes the tap if the engine started successfully, preventing the mic path from staying alive with no active recording session.

    Co-Authored-By: tkheyfets <timur@vellum.ai>

  * Add recording generation token and gate context capture on start success

    Co-Authored-By: tkheyfets <timur@vellum.ai>

  * Guard stale teardown against active sessions and gate rewarm on mic auth

    Co-Authored-By: tkheyfets <timur@vellum.ai>

  * Move context capture to Task.detached to avoid blocking main actor

    Co-Authored-By: tkheyfets <timur@vellum.ai>

  ---------

  Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: tkheyfets <timur@vellum.ai>

* [LUM-681] Fix audio tap format mismatch by resetting engine before installTap (#23766)

  After audio-route changes (Bluetooth, USB mic, AirPods mode switch), the format cached inside AVAudioInputNode diverges from the engine's actual hardware format. Both outputFormat(forBus:) and a nil format argument to installTap resolve to this stale value, causing:

  'Failed to create tap due to format mismatch, <AVAudioFormat: 2 ch, 44100 Hz, Float32, deinterleaved>'

  Fix: call audioEngine.reset() before re-querying the format, then pass it explicitly to installTap. This forces the engine to discard its cached graph state and re-read the hardware, so the tap, node, and engine all agree.

  Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: tkheyfets <timur@vellum.ai>

* fix: pass transport hints through HTTP message endpoint for managed-mode conversations (#23824)

  * fix: pass transport metadata through POST /v1/messages to enable host environment hints

    The HTTP message handler auto-creates conversations without transport metadata, so applyTransportMetadata() returns early and host environment hints (hostHomeDir, hostUsername) are never injected into the LLM context. This causes the assistant to hallucinate the user's home directory path from their display name instead of using the actual macOS username.

    Thread transport metadata from the message request body through SendMessageDeps.getOrCreateConversation() to the daemon, and send hostHomeDir/hostUsername from the macOS client in every message request.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  * refactor: replace dynamic imports with static type imports

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Release v0.6.1

* Cherry-pick fixes for v0.6.1 (#23785)

  * Increase teleport import timeout from 2 to 5 minutes (#23749)

    * increase teleport import timeout from 2 to 5 minutes

    * fix: update platform import timeout error message to say 5 minutes

      Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

    ---------

    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  * Update billing tab copy: referral subtitle, remove earning cap note, move credit info to card subtitle (#23751)

  * fix(macos): always collapse thinking blocks by default (#23750)

    Thinking blocks were auto-expanding during streaming, showing a wall of text. Remove the auto-expand logic so blocks always start collapsed. Users can still manually expand them. The header already shows "Thinking..." vs "Thought process" as a streaming indicator.

    Closes LUM-729

    Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
    Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

  * [LUM-684/LUM-726] Fix dictation crash: pass nil format to installTap (#23754)

    * Fix dictation crash: pass nil format to installTap, consolidate audio engine calls

      Pass nil for the format parameter in AVAudioNode.installTap(onBus:bufferSize:format:block:) so AVAudioEngine uses its own internal hardware format, which is always self-consistent. This prevents NSInternalInconsistencyException crashes caused by format.sampleRate != hwFormat.sampleRate when the cached format from outputFormat(forBus:) diverges from the engine's internal hardware format after audio route changes (Bluetooth, USB mic, AirPods mode switch).

      AudioEngineController.swift:
      - installTapAndStart() now passes nil instead of explicit format to installTap
      - Removed 6 now-unused methods: inputNodeFormat(), installTap(bufferSize:format:block:), removeTap(), prepare(), start(), prepareAndStart()

      OpenAIVoiceService.swift:
      - startRecording(): replaced separate inputNodeFormat/installTap/prepare/start chain with single installTapAndStart() call
      - startBargeInMonitor(): same migration to installTapAndStart()
      - Removed error-path removeTap() call (handled internally by installTapAndStart)

      Resolves: LUM-684, LUM-726

      Co-Authored-By: tkheyfets <timur@vellum.ai>

    * fix: use explicit block: parameter in guard statements for installTapAndStart

      Swift doesn't support trailing closure syntax with guard statements, causing compilation errors. Use explicit block: parameter label instead.

      Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

    ---------

    Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
    Co-authored-by: tkheyfets <timur@vellum.ai>
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  * fix: replace contrast buttons with primary style (#23753)

    Remove all production usages of .contrast button style in favor of .primary. Fixes white-on-white button visibility issues in chat composer.

    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  * Inject host environment via transport hints (#23779)

    * refactor: discriminated union for transport metadata, remove iOS proxy setup (#23776)
    * feat: inject interface ID and macOS host environment into transport hints (#23777)
    * feat: send hostHomeDir and hostUsername from macOS client (#23778)
    * fix: remove iOS from proxy restoration in conversation-process.ts (#23782)

  ---------

  Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
  Co-authored-by: tkheyfets <timur@vellum.ai>
  Co-authored-by: Tirman Sidhu <tirmansidhu@gmail.com>

* [skip ci] Cherry-pick fixes for v0.6.1 (#23820)

  * revert: disable Teleport feature flag by default (#23744) (#23815)

  * fix: replace auxWhite-on-primaryBase with VButton across the app (#23802)

    * fix: use VButton for inline surface action buttons

      Replace raw Button with manual color functions in InlineSurfaceRouter with the design system VButton component. The manual buttonForeground used VColor.auxWhite (always #FFFFFF) against VColor.primaryBase which resolves to #FDFDFC in dark mode, producing invisible white-on-white text.

      Closes LUM-730

      Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

    * fix: replace auxWhite-on-primaryBase with VButton in additional locations

      FileUploadSurfaceView: Upload/Cancel buttons used raw Button with VColor.auxWhite on VColor.primaryBase — white-on-white in dark mode. Replaced with VButton(.primary) and VButton(.outlined).

      JITPermissionView: Permission buttons used the same auxWhite pattern. Replaced with VButton(.primary/.outlined, isFullWidth: true).

      ImproveExperienceStepView: ToS checkbox checkmark used auxWhite on primaryBase fill. Changed to VColor.contentInset which adapts per color scheme.

      ChatGallerySection: Gallery demo of surface action pills mirrored the old buggy pattern. Updated to use VButton so the gallery accurately represents production rendering.

      Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>

    ---------

    Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
    Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

  * Make dictation engine start non-blocking with audio route resilience (#23811)

    * Make dictation engine start non-blocking and improve audio resilience

      - Add installTapAndStartAsync to AudioEngineController for non-blocking engine start using Swift concurrency (withCheckedContinuation)
      - Extract installTapAndStartImpl to share logic between sync/async paths
      - Listen for AVAudioEngineConfigurationChange to re-prewarm inputNode after Bluetooth device connect/disconnect and AirPods mode switches
      - Restructure VoiceInputManager.beginRecording() to show recording UI and play activation chime immediately, then start engine async via Task
      - Move DictationContextCapture off the critical path: engine starts concurrently on its audio queue while context capture runs on main
      - Add SFSpeechRecognizer transient unavailability retry (recreate if isAvailable returns false after sleep/wake or heavy use)
      - Handle edge case where PTT is released before async engine start completes (stopRecordingForDictation cleans up directly)

      Co-Authored-By: tkheyfets <timur@vellum.ai>

    * Tear down engine when async startup outlives recording session

      When PTT is released before installTapAndStartAsync completes, the isRecording guard now stops and removes the tap if the engine started successfully, preventing the mic path from staying alive with no active recording session.

      Co-Authored-By: tkheyfets <timur@vellum.ai>

    * Add recording generation token and gate context capture on start success

      Co-Authored-By: tkheyfets <timur@vellum.ai>

    * Guard stale teardown against active sessions and gate rewarm on mic auth

      Co-Authored-By: tkheyfets <timur@vellum.ai>

    * Move context capture to Task.detached to avoid blocking main actor

      Co-Authored-By: tkheyfets <timur@vellum.ai>

    ---------

    Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
    Co-authored-by: tkheyfets <timur@vellum.ai>

  * [LUM-681] Fix audio tap format mismatch by resetting engine before installTap (#23766)

    After audio-route changes (Bluetooth, USB mic, AirPods mode switch), the format cached inside AVAudioInputNode diverges from the engine's actual hardware format. Both outputFormat(forBus:) and a nil format argument to installTap resolve to this stale value, causing:

    'Failed to create tap due to format mismatch, <AVAudioFormat: 2 ch, 44100 Hz, Float32, deinterleaved>'

    Fix: call audioEngine.reset() before re-querying the format, then pass it explicitly to installTap. This forces the engine to discard its cached graph state and re-read the hardware, so the tap, node, and engine all agree.

    Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
    Co-authored-by: tkheyfets <timur@vellum.ai>

  * fix: pass transport hints through HTTP message endpoint for managed-mode conversations (#23824)

    * fix: pass transport metadata through POST /v1/messages to enable host environment hints

      The HTTP message handler auto-creates conversations without transport metadata, so applyTransportMetadata() returns early and host environment hints (hostHomeDir, hostUsername) are never injected into the LLM context. This causes the assistant to hallucinate the user's home directory path from their display name instead of using the actual macOS username.

      Thread transport metadata from the message request body through SendMessageDeps.getOrCreateConversation() to the daemon, and send hostHomeDir/hostUsername from the macOS client in every message request.

      Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

    * refactor: replace dynamic imports with static type imports

      Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

    ---------

    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
  Co-authored-by: tkheyfets <timur@vellum.ai>
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: reset non-version-bump files to match main

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Carson Shaar <carson.s.shaar@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Co-authored-by: tkheyfets <timur@vellum.ai>
Co-authored-by: Tirman Sidhu <tirmansidhu@gmail.com>
Co-authored-by: David Vargas Fuertes <vargasvellum@Davids-MacBook-Pro.local>
Eliminates the 2+ second main-thread stall during PTT dictation by making the audio engine start asynchronously (withCheckedContinuation) and moving DictationContextCapture to a detached Task, so the overlay/chime appear instantly and key-up events are never blocked by hardware init or AX timeouts.