
feat(desktop): add voice commands with wake word detection and mic permission UX #1055

Closed
saddlepaddle wants to merge 18 commits into main from wake-word-test

Conversation


@saddlepaddle saddlepaddle commented Jan 29, 2026

Summary

  • Adds a Python-based voice sidecar that listens for a wake word ("Hey Jarvis"), captures speech, transcribes it via Whisper, and routes commands back to the desktop app
  • Gates the feature behind a "Voice Commands" toggle in Settings > Features (default: off)
  • The voice sidecar process auto-starts when the setting is enabled and auto-stops when disabled, using the tRPC subscription subscriber-count lifecycle
  • Gates voice toggle on macOS microphone permission — prompts for access on first enable, shows guidance when denied, re-checks on window focus
  • Downloads openwakeword models during sidecar build — models are no longer shipped in the pip package (>=0.6.0), so the build script fetches them from GitHub releases
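The stdout contract between sidecar and desktop app isn't spelled out above; a minimal sketch of how the desktop side might type and parse the newline-delimited JSON events (event names like `audio_captured`, `audioB64`, and `durationS` come from the walkthrough below — the exact payload shapes are assumptions):

```typescript
// Hypothetical shape of the JSON lines the Python sidecar writes to stdout.
// Event names are taken from the PR walkthrough; field names are illustrative.
type VoiceEvent =
	| { type: "ready" }
	| { type: "recording" }
	| { type: "audio_captured"; audioB64: string; durationS: number }
	| { type: "error"; message: string };

// Parse one stdout line; returns null for non-JSON noise (e.g. startup
// logs from Python) instead of throwing, so the reader loop can skip it.
function parseVoiceEventLine(line: string): VoiceEvent | null {
	try {
		const raw = JSON.parse(line) as { type?: unknown };
		if (typeof raw.type !== "string") return null;
		return raw as VoiceEvent;
	} catch {
		return null;
	}
}
```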

Key Changes

  • Schema: New voice_commands_enabled boolean column in local SQLite settings table (migration 0016)
  • tRPC settings: getVoiceCommandsEnabled / setVoiceCommandsEnabled procedures with optimistic UI updates
  • Voice tRPC router: Subscription-based voice event stream with subscriber-count auto-start/stop for the Python sidecar process; getMicPermission query and requestMicPermission mutation for macOS permission flow
  • Voice sidecar: Python process using openwakeword for wake word detection and sounddevice for audio capture, streaming events as JSON lines over stdout
  • API route: /api/voice endpoint for Whisper transcription and LLM tool routing
  • UI: VoiceListener component gated on both voiceEnabled and micPermission === "granted"; RecordingIndicator and ResponsePanel toast components; permission-aware settings toggle with denied-state warning and "Open System Settings" link
  • Build: Sidecar bundling script downloads required ONNX models, bundles via PyInstaller with manual copy fallback, and verifies hey_jarvis model is present
  • Entitlements: com.apple.security.device.audio-input added to macOS entitlements plist; NSMicrophoneUsageDescription in Info.plist
  • Canary config: Fixed extendInfo spread to preserve base config keys (NSMicrophoneUsageDescription)

Microphone Permission UX

  • First time: Toggle ON → macOS "would like to access the microphone" prompt → grant → sidecar starts
  • Denied: Toggle stays OFF, warning with "Open System Settings" link appears, switch disabled
  • Revoked in System Settings: VoiceListener re-checks on window focus, disconnects subscription
  • Non-macOS: No permission UI, toggle works directly
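The gating rules above reduce to a small decision table. A sketch of that logic as a pure function — the permission values mirror what Electron's `systemPreferences.getMediaAccessStatus()` returns, but the function itself and its return shape are illustrative assumptions, not the PR's code:

```typescript
type MicPermission = "not-determined" | "granted" | "denied" | "restricted";
type Platform = "darwin" | "win32" | "linux";

// Decision table for the settings toggle and VoiceListener, per the
// bullets above: non-macOS skips permission UI entirely; on macOS the
// listener runs only when the setting is on AND permission is granted.
function voiceToggleState(args: {
	platform: Platform;
	permission: MicPermission;
	settingEnabled: boolean;
}): { switchDisabled: boolean; showDeniedWarning: boolean; listenerActive: boolean } {
	if (args.platform !== "darwin") {
		// Non-macOS: no permission UI, toggle works directly
		return {
			switchDisabled: false,
			showDeniedWarning: false,
			listenerActive: args.settingEnabled,
		};
	}
	const denied =
		args.permission === "denied" || args.permission === "restricted";
	return {
		switchDisabled: denied,
		showDeniedWarning: denied,
		listenerActive: args.settingEnabled && args.permission === "granted",
	};
}
```

Re-running this check on window focus is what lets the switch re-enable after the user grants access in System Settings and returns to the app.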

Test plan

  • Start app fresh — voice process should NOT start (default off)
  • Go to Settings > Features — "Voice Commands" toggle visible and off
  • Toggle ON — macOS microphone permission prompt appears
  • Grant permission — voice process starts (check logs for [voice-process])
  • Say "Hey Jarvis" — recording indicator toast appears, then response panel
  • Toggle OFF — voice process stops, lingering toasts dismiss
  • Deny permission (fresh install or TCC reset) — toggle stays OFF, warning with "Open System Settings" link
  • Click "Open System Settings" — navigates to Privacy > Microphone
  • Grant in System Settings, return to app — switch re-enables on window focus
  • Restart app — setting persists as last set
  • Search "voice" in settings — toggle is discoverable

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced voice commands allowing you to say "Hey Jarvis" to create tasks, ask questions, and manage work by voice
    • Added voice recording interface with real-time feedback during audio capture and processing
    • Added Voice Commands setting in behavior preferences to enable or disable voice interaction
    • Implemented microphone permission management with system-level permission requests


coderabbitai Bot commented Jan 29, 2026

📝 Walkthrough

Walkthrough

This PR introduces a complete voice command system for the desktop application. It integrates OpenAI Whisper transcription and Claude AI responses on the backend, implements a Python-based voice sidecar for wake word detection and audio capture, adds TypeScript process management and TRPC routers, creates React UI components for voice interaction feedback, and extends the database schema with voice command settings.

Changes

  • Backend Voice API (apps/api/package.json, apps/api/src/env.ts, apps/api/src/app/api/voice/route.ts, apps/api/src/app/api/voice/voice-service.ts): Adds the OpenAI dependency and OPENAI_API_KEY env var, and implements a POST /api/voice endpoint with SSE streaming. The voice pipeline transcribes audio via Whisper, processes it with Claude AI, executes tool calls for up to 5 rounds, and streams results back.
  • Desktop Python Voice Sidecar (apps/desktop/src/main/lib/voice/python/..., apps/desktop/scripts/build-voice-sidecar.sh): Implements a Python-based voice sidecar with wake word detection via OpenWakeWord ("Hey Jarvis"), audio capture with silence detection, and JSON event streaming to the parent process. The build script creates a standalone PyInstaller binary with models and dependencies.
  • Desktop Voice Process Management (apps/desktop/src/main/lib/voice/voice-process-paths.ts, apps/desktop/src/main/lib/voice/voice-process.ts, apps/desktop/src/main/index.ts): Adds process-spawning configuration with environment-aware logic (production/development/preview), an event emitter for the voice lifecycle, graceful shutdown with timeout, and cleanup on app quit.
  • Desktop tRPC Voice Router (apps/desktop/src/lib/trpc/routers/voice/index.ts, apps/desktop/src/lib/trpc/routers/settings/index.ts): Implements the voice subscription with auto-start/stop, microphone permission queries and requests, and voice-command enable/disable settings with database persistence.
  • Desktop React UI Components (apps/desktop/src/renderer/components/Voice/..., apps/desktop/src/renderer/routes/_authenticated/layout.tsx, apps/desktop/src/renderer/routes/_authenticated/settings/behavior/...): Adds a VoiceListener component that subscribes to voice events, displays RecordingIndicator during capture, renders ResponsePanel with SSE streaming, handles audio processing via the useVoicePipeline hook, and integrates the Voice Commands toggle in settings with the permission flow.
  • Database Schema and Configuration (packages/local-db/drizzle/0016_add_voice_commands_enabled.sql, packages/local-db/drizzle/meta/..., packages/local-db/src/schema/schema.ts): Adds the voice_commands_enabled column to the settings table with a migration and updated Drizzle schema snapshots.
  • Build and Desktop Configuration (.github/workflows/build-desktop.yml, apps/desktop/package.json, apps/desktop/electron-builder.ts, apps/desktop/electron-builder.canary.ts, apps/desktop/src/resources/build/entitlements.mac.plist): Adds Python 3.11 setup in CI, includes the voice-sidecar binary as an extraResource, spreads the base macOS config, and adds the microphone entitlement and NSMicrophoneUsageDescription for the macOS Info.plist.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant VoiceListener as VoiceListener<br/>(React)
    participant Sidecar as Voice Sidecar<br/>(Python)
    participant ProcessMgr as Voice Process<br/>(Node.js)
    participant API as Backend API<br/>(Node.js)
    participant OpenAI as OpenAI
    participant Claude as Anthropic Claude
    participant ResponsePanel as ResponsePanel<br/>(React)

    User->>VoiceListener: Component mounts<br/>(voice enabled + mic permission)
    activate VoiceListener
    VoiceListener->>ProcessMgr: Subscribe to voice events
    ProcessMgr->>Sidecar: Start voice sidecar
    activate Sidecar
    Sidecar->>Sidecar: Load wake word model
    Sidecar->>ProcessMgr: emit(ready)
    ProcessMgr->>VoiceListener: idle event
    User->>User: Say "Hey Jarvis"
    Sidecar->>Sidecar: Detect wake word
    Sidecar->>Sidecar: Capture audio + pre-buffer
    Sidecar->>ProcessMgr: emit(recording)
    ProcessMgr->>VoiceListener: recording event
    VoiceListener->>ResponsePanel: Show RecordingIndicator
    User->>User: Speak command
    Sidecar->>Sidecar: End speech capture<br/>(silence detected)
    Sidecar->>ProcessMgr: emit(audio_captured)<br/>audioB64, durationS
    ProcessMgr->>VoiceListener: audio_captured event
    VoiceListener->>ResponsePanel: Pass audio to hook
    activate ResponsePanel
    ResponsePanel->>API: POST /api/voice<br/>multipart form-data
    activate API
    API->>OpenAI: Transcribe audio<br/>(Whisper)
    OpenAI->>API: Transcription text
    API->>API: emit(transcription)
    API->>Claude: Process with Claude<br/>+ tool definitions
    Claude->>API: Tool calls or response
    loop Tool execution (max 5 rounds)
        API->>API: Execute tool
        API->>Claude: Feed tool results
        Claude->>API: Next response
    end
    API->>API: emit(text_delta, done)
    API->>ResponsePanel: SSE stream
    deactivate API
    ResponsePanel->>ResponsePanel: Parse SSE events<br/>update state
    ResponsePanel->>ResponsePanel: Render streaming response
    deactivate ResponsePanel
    User->>ResponsePanel: Click Stop or<br/>auto-dismiss after done
    ResponsePanel->>API: Abort request
    deactivate VoiceListener
    Sidecar->>ProcessMgr: Stop listening
    deactivate Sidecar
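The "Parse SSE events / update state" step in the diagram can be sketched as a small frame parser. The event names (`transcription`, `text_delta`, `tool_result`, `done`) come from the walkthrough; the framing follows standard text/event-stream `event:` + `data:` lines, and the logging-on-parse-failure mirrors a review suggestion below — this is an illustrative sketch, not the PR's actual hook:

```typescript
// Minimal SSE frame parser for a /api/voice-style stream. Each frame is
// an "event:" line followed by a "data:" line carrying JSON.
function parseSSE(chunk: string): Array<{ event: string; data: unknown }> {
	const frames: Array<{ event: string; data: unknown }> = [];
	let eventType = "message"; // SSE default when no event: line is present
	for (const line of chunk.split("\n")) {
		if (line.startsWith("event: ")) {
			eventType = line.slice(7).trim();
		} else if (line.startsWith("data: ")) {
			try {
				frames.push({ event: eventType, data: JSON.parse(line.slice(6)) });
			} catch (error) {
				// Log malformed payloads with context rather than swallowing them
				console.error("[voice/sse] malformed data line:", line, error);
			}
			eventType = "message"; // reset for the next frame
		}
	}
	return frames;
}
```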

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A rabbit hops through audio streams,
Wake words spark and circuits dream,
From whispers caught to Claude's reply,
Voice commands soar through the UI sky,
"Hey Jarvis!" rings—the sidecar wakes,
As desktop talks for progress' sake! 🎙️✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 41.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title accurately summarizes the main feature: adding voice commands with wake word detection and microphone permission UX for the desktop application.
  • Description check — ✅ Passed: the description is comprehensive and covers all required template sections: summary, related changes, type (new feature), testing checklist, and additional context.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 18

🤖 Fix all issues with AI agents
In `@apps/api/package.json`:
- Line 14: The package.json dependency for "@anthropic-ai/sdk" references a
non-existent version "^0.71.2"; update the version string for the
"@anthropic-ai/sdk" entry to a valid published release (e.g., "^0.61.0" or
"0.61.0") in apps/api/package.json, then run your package manager to refresh the
lockfile (npm install or yarn install) so lock files are updated accordingly;
ensure no other references to the invalid version remain.

In `@apps/api/src/app/api/voice/route.ts`:
- Around line 33-40: The catch block that handles request.formData() failures
swallows the thrown error; update it to capture the caught error (e.g., catch
(err) or catch (error)), log the error using the existing logger or console
(reference the formData handling around request.formData() and the
Response.json({ error: ... }, { status: 400 }) return) and then return the same
400 response—ensure the log includes the error object/message for debugging
while keeping the response body unchanged.

In `@apps/api/src/app/api/voice/tool-adapter.ts`:
- Around line 51-60: The catch is silently swallowing zodToJsonSchema failures;
update the catch in the loop over inputSchema to log the error before falling
back — use a prefixed console log like "[voice/tool-adapter] Failed to convert
schema for key <key>:" and include the caught error object, then continue to
assign properties[key] = { type: "string" } as the fallback; keep existing
checks around isOptional and use the same variables (inputSchema, properties,
required, zodToJsonSchema).
- Around line 90-170: The custom zodToJsonSchema() function is using Zod
internals (_zod?.def) which is fragile; remove zodToJsonSchema and replace its
usages with Zod's supported API z.toJSONSchema(schema, options?) (e.g., import z
from 'zod' and call z.toJSONSchema(yourSchema) where zodToJsonSchema was used),
delete any helper unwrapZod usage tied only to this converter, and adjust any
code that expected the old output shape to match the native z.toJSONSchema
output (update property names/options as needed).

In `@apps/api/src/app/api/voice/voice-service.ts`:
- Around line 23-38: Wrap the transcribeAudio function body (including the
OpenAI client creation and the openai.audio.transcriptions.create call) in a
try/catch; on error, log contextual information (e.g., the error object and any
relevant metadata) and either emit an SSE error event to the caller or rethrow a
wrapped error with a clear message so callers can handle it; ensure the catch
references transcribeAudio and openai.audio.transcriptions.create so the change
is easy to find.
- Around line 108-134: Wrap each call to executeTool inside a try/catch so a
thrown error doesn't crash the loop: for each toolBlock in toolUseBlocks, call
executeTool inside try, on success write the existing sse.write("tool_result", {
toolName, result }) and push the same toolResults entry; in catch, capture the
error, call sse.write("tool_result", { toolName: toolBlock.name, result: {
error: String(error), message: (error as Error)?.message } }) and push a
toolResults entry with type "tool_result", tool_use_id: toolBlock.id and content
containing the error details (so downstream code can see failures) then continue
to the next toolBlock.

In `@apps/api/src/env.ts`:
- Around line 39-40: Make OPENAI_API_KEY and ANTHROPIC_API_KEY optional in the
env schema (change z.string().min(1) to .optional()) so missing keys don't
prevent startup; then update the voice-related code paths (the voice
service/initialization where these keys are consumed) to explicitly check for
the presence of OPENAI_API_KEY and ANTHROPIC_API_KEY before attempting to use
them and return a clear, actionable error when the voice feature is enabled but
the required key(s) are missing.

In `@apps/desktop/electron-builder.ts`:
- Around line 59-64: The extraResources entry points to
dist/voice-sidecar/voice-sidecar but the build script build-voice-sidecar.sh is
never run; to fix, invoke the script before packaging by adding "bun run
scripts/build-voice-sidecar.sh" to the prepackage hook in package.json
(prepackage) or insert a step that runs the script in the build-desktop GitHub
Actions job prior to the packaging step, and update RELEASE.md local testing
instructions to document running build-voice-sidecar.sh so the voice-sidecar
binary exists when electron-builder packages the app.

In `@apps/desktop/src/lib/trpc/routers/voice/index.ts`:
- Around line 44-52: The start/stop mutations call
startVoiceProcess()/stopVoiceProcess() directly and thus bypass subscriberCount
tracking causing desyncs; fix by either removing these manual mutations (if not
needed) or changing their implementations to update subscriberCount consistently
(e.g., start should increment subscriberCount and call startVoiceProcess only
when moving from 0→1, stop should decrement and call stopVoiceProcess only when
moving to 0) and add brief comments documenting intended debugging/manual use;
touch the start and stop publicProcedure.mutation handlers and ensure they
reference subscriberCount, startVoiceProcess, and stopVoiceProcess so lifecycle
remains consistent.

In `@apps/desktop/src/main/lib/voice/python/.gitignore`:
- Around line 1-3: Add the ".env" pattern to the .gitignore shown in this change
so local environment files containing OPENAI_API_KEY and ANTHROPIC_API_KEY (and
other secrets) are not committed; update the existing .gitignore entries (which
currently list .venv/, __pycache__/, *.pyc) to also include .env, and commit
that change to prevent accidental inclusion of API keys.

In `@apps/desktop/src/main/lib/voice/python/audio.py`:
- Around line 54-60: The JSON writes can race between _emit_error and emit, so
add a shared threading.Lock named STDOUT_LOCK in this module and wrap the
sys.stdout.write/flush calls in _emit_error with STDOUT_LOCK.acquire()/release()
(or a with STDOUT_LOCK: context) to serialize writes; then update the other
writer (the emit function in main.py) to import and use the same STDOUT_LOCK
from this module so all stdout JSON lines are emitted under the same lock.

In `@apps/desktop/src/main/lib/voice/python/main.py`:
- Around line 54-63: The stdin_reader function currently handles a
{"cmd":"stop"} message but ignores the documented {"cmd":"start"}; update
stdin_reader to explicitly handle "start" as a no-op (do nothing but optionally
acknowledge) and add a fallback that warns/logs when an unknown cmd is received
so the contract matches the docstring; locate the stdin_reader function and
augment the cmd handling branch (alongside the existing stop_event.set() branch)
to treat "start" as a no-op and call the existing logging facility (or print)
for unknown commands.
- Around line 113-134: After the capture loop returns, short‑circuit if shutdown
was requested: check stop_event.is_set() before converting/serializing and
emitting audio so we don't emit partial captures during shutdown. In other
words, after obtaining speech_audio (and/or before calling to_wav_b64 and emit
in main.py), if stop_event.is_set() then reset detector and skip the
to_wav_b64/emit sequence (or return/continue the outer processing loop) so no
audio_captured event is sent when capture was aborted; reference the stop_event,
capturer.get_audio(), speech_audio, to_wav_b64, emit, and detector.reset in your
change.

In `@apps/desktop/src/main/lib/voice/python/speech_capture.py`:
- Around line 72-76: The duration_s method overestimates time for multi-channel
audio because it divides total_samples by self._config.sample_rate only; modify
duration_s to divide total_samples by (self._config.sample_rate *
self._config.channels) so frames across channels are accounted for (use
self._config.channels alongside sample_rate when computing duration from
self._buffers and self._buffers' .size values).
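The duration fix above comes down to dividing by frames rather than raw samples: interleaved audio stores one sample per channel per frame. A one-function sketch of the corrected arithmetic (in TypeScript for brevity; the actual code is Python):

```typescript
// For interleaved audio, total sample count spans all channels, so
// seconds = samples / (sampleRate * channels), not samples / sampleRate.
function durationSeconds(args: {
	totalSamples: number;
	sampleRate: number;
	channels: number;
}): number {
	return args.totalSamples / (args.sampleRate * args.channels);
}
```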

In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts`:
- Around line 60-72: getDevConfig currently assumes a Unix venv path
(.venv/bin/python3) causing Windows to always fallback; update getDevConfig to
detect the platform (process.platform === "win32") or attempt both
platform-specific venv paths and choose the first that exists. Construct
platform paths from scriptDir (e.g., join(scriptDir, ".venv", "Scripts",
"python.exe") for Windows and join(scriptDir, ".venv", "bin", "python3") for
others), check existsSync on the candidate path(s), and return the matching path
as command (with args ["main.py"], cwd: scriptDir); keep the fallback to
"python3" if no venv python is found.
- Around line 74-86: getPreviewConfig currently builds venvPython with a
POSIX-only path (".venv/bin/python3") and uses "python3" as the fallback, which
breaks on Windows; change getPreviewConfig to construct the venv python path
using path.join with a platform-specific segment (use ".venv/Scripts/python.exe"
on win32 and ".venv/bin/python3" otherwise) and set the fallback command to
"python" on Windows and "python3" on other platforms, keeping the same
scriptDir, previewDir, srcDir, args (["main.py"]) and cwd logic.
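Both comments above describe the same pattern: pick the venv interpreter path per platform, check it exists, and fall back to the system interpreter. A sketch of that shared logic, with the existence check injected so it's testable (the function name and shape are assumptions, not the PR's code):

```typescript
import { join } from "node:path";

// Platform-aware venv interpreter lookup: Windows venvs put the
// executable under .venv/Scripts/python.exe, POSIX under .venv/bin/python3.
function resolveVenvPython(args: {
	scriptDir: string;
	platform: "win32" | "darwin" | "linux";
	exists: (p: string) => boolean;
}): string {
	const candidate =
		args.platform === "win32"
			? join(args.scriptDir, ".venv", "Scripts", "python.exe")
			: join(args.scriptDir, ".venv", "bin", "python3");
	if (args.exists(candidate)) return candidate;
	// Fall back to whatever interpreter is on PATH for this platform
	return args.platform === "win32" ? "python" : "python3";
}
```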

In
`@apps/desktop/src/renderer/components/Voice/components/RecordingIndicator/RecordingIndicator.tsx`:
- Around line 17-23: The button text "Cancel" is misleading because the onClick
only calls toast.dismiss(toastId) and does not stop recording; either wire it to
the actual cancel flow or rename it to match behavior—update the onClick to call
the real cancel handler (e.g., cancelRecording() or a passed prop like
onCancelRecording()) in addition to toast.dismiss(toastId), or simply change the
label from "Cancel" to "Dismiss" in the RecordingIndicator component so the UI
matches the existing toast.dismiss(toastId) behavior.

In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`:
- Around line 116-122: The try/catch around JSON.parse is silently swallowing
parse errors; update the catch to log the malformed SSE data and the parse error
(e.g., using console.error) so debugging is possible, include the raw
line.slice(6) payload, the current eventType and the caught error, then proceed
to reset eventType as before; the change should be applied in the block
surrounding JSON.parse(...) that calls handleSSEEvent(eventType, data,
setState).
🧹 Nitpick comments (6)
apps/desktop/src/main/lib/voice/python/.gitignore (1)

1-3: Consider adding common Python build and test artifacts.

Since the PR mentions build/packaging flow for the voice sidecar and a pyproject.toml, consider adding patterns for build artifacts and other common Python files:

  • dist/ and build/ - distribution and build directories
  • *.egg-info/ - Python package metadata
  • *.pyo - optimized bytecode files
  • .pytest_cache/ - pytest cache (if using pytest)
  • Alternative venv names like venv/ and env/
📦 Proposed additions for build artifacts
 .venv/
+venv/
+env/
 __pycache__/
 *.pyc
+*.pyo
+dist/
+build/
+*.egg-info/
+.pytest_cache/
apps/desktop/src/main/lib/voice/python/pyproject.toml (1)

7-11: Consider adding a lockfile for reproducible builds.

The dependency ranges are reasonable, but since this is built into a standalone binary via PyInstaller, consider adding a requirements.txt or using a lockfile mechanism (e.g., pip freeze > requirements.txt after testing) to ensure reproducible builds across different machines and CI runs.

apps/api/src/app/api/voice/tool-adapter.ts (1)

5-11: Prefer a params object for ToolHandler signature.
This avoids positional arguments and aligns with the project’s function-parameter guideline.

Proposed refactor
-type ToolHandler = (
-	params: Record<string, unknown>,
-	ctx: McpContext,
-) => Promise<{
+type ToolHandler = (args: {
+	params: Record<string, unknown>;
+	ctx: McpContext;
+}) => Promise<{
 	content: Array<{ type: "text"; text: string }>;
 	isError?: boolean;
 }>;
@@
-				handler: async (params, ctx) => {
-					return handler(params, {
+				handler: async ({ params, ctx }) => {
+					return handler(params, {
 						authInfo: { extra: { mcpContext: ctx } },
 					});
 				},
@@
-		const result = await tool.handler(toolInput, ctx);
+		const result = await tool.handler({ params: toolInput, ctx });

As per coding guidelines “Functions with 2+ parameters should accept a single params object with named properties instead of positional arguments”.

Also applies to: 71-74, 236-238

apps/desktop/src/lib/trpc/routers/voice/index.ts (1)

11-12: Subscriber count state is scoped to the factory function instance.

The subscriberCount variable lives in the closure created by createVoiceRouter(). Since this is called once at app startup, this works correctly. However, if the router were ever recreated (e.g., during hot reload in development), the count would reset while the voice process might still be running.

Consider adding a comment documenting this assumption, or alternatively, moving the state to the voice-process.ts module where the process lifecycle is managed.

apps/desktop/src/renderer/components/Voice/components/ResponsePanel/ResponsePanel.tsx (1)

27-35: Extract the auto-dismiss timeout to a named constant.

The 8000ms value is a magic number. Extracting it to a constant improves readability and makes future adjustments easier.

Proposed fix
+const AUTO_DISMISS_DELAY_MS = 8000;
+
 export function ResponsePanel({ toastId, audioB64 }: ResponsePanelProps) {
 	// ... existing code ...

 	// Auto-dismiss after done
 	useEffect(() => {
 		if (status === "done") {
 			const timer = setTimeout(() => {
 				toast.dismiss(toastId);
-			}, 8000);
+			}, AUTO_DISMISS_DELAY_MS);
 			return () => clearTimeout(timer);
 		}
 	}, [status, toastId]);
apps/desktop/src/main/lib/voice/voice-process.ts (1)

69-71: Log JSON parse errors instead of silently ignoring.

While non-JSON output from the Python process is expected during startup, silently swallowing all parse errors makes debugging difficult. The warning is already being logged, but the catch block should be consistent.

♻️ Proposed minor improvement
 			try {
 				const raw = JSON.parse(line) as PythonVoiceEvent;
 				const event = parsePythonEvent(raw);
 				if (event) {
 					lastEvent = event;
 					voiceProcessEmitter.emit("voice-event", event);
 				}
-			} catch {
+			} catch (error) {
 				console.warn("[voice-process] Non-JSON stdout:", line);
 			}

Comment thread apps/api/src/app/api/voice/route.ts
Comment on lines +33 to +40
try {
formData = await request.formData();
} catch {
return Response.json(
{ error: "Expected multipart form data with audio file" },
{ status: 400 },
);
}

⚠️ Potential issue | 🟡 Minor

Log the error when form data parsing fails.

The catch block discards the error, making debugging difficult if unexpected parsing failures occur. As per coding guidelines, errors should be logged at minimum.

Proposed fix
 	try {
 		formData = await request.formData();
-	} catch {
+	} catch (error) {
+		console.error("[voice/route] Form data parsing failed:", error);
 		return Response.json(
 			{ error: "Expected multipart form data with audio file" },
 			{ status: 400 },
 		);
 	}

Comment on lines +51 to +60
for (const [key, schema] of Object.entries(inputSchema)) {
try {
properties[key] = zodToJsonSchema(schema);
if (!isOptional(schema)) {
required.push(key);
}
} catch {
// Fallback for schemas that can't be converted
properties[key] = { type: "string" };
}

⚠️ Potential issue | 🟡 Minor

Log schema conversion failures instead of silent fallback.
The catch block hides conversion errors; add a prefixed log with context before defaulting.

Proposed fix
-				} catch {
-					// Fallback for schemas that can't be converted
-					properties[key] = { type: "string" };
-				}
+				} catch (error) {
+					console.warn(
+						`[voice/tool-adapter] Failed to convert schema for ${name}.${key}`,
+						error,
+					);
+					// Fallback for schemas that can't be converted
+					properties[key] = { type: "string" };
+				}

As per coding guidelines “Never swallow errors silently; at minimum log them with context” and “Use prefixed console logging with pattern [domain/operation] message for all logging”.


Comment on lines +90 to +170
function zodToJsonSchema(schema: z.ZodType): Record<string, unknown> {
const def = (
schema as unknown as {
_zod?: { def?: { type?: string; typeName?: string } };
}
)._zod?.def;
const description = schema.description;

// Unwrap optional/default wrappers
const innerSchema = unwrapZod(schema);
const innerDef = (innerSchema as { _zod?: { def?: Record<string, unknown> } })
._zod?.def;
const typeName = (innerDef?.typeName ??
def?.type ??
def?.typeName ??
"") as string;

const result: Record<string, unknown> = {};

switch (typeName) {
case "ZodString":
case "string":
result.type = "string";
break;
case "ZodNumber":
case "number":
result.type = "number";
break;
case "ZodBoolean":
case "boolean":
result.type = "boolean";
break;
case "ZodArray":
case "array": {
result.type = "array";
const itemSchema =
(innerDef as Record<string, unknown>)?.innerType ??
(innerDef as Record<string, unknown>)?.type;
if (itemSchema && itemSchema instanceof z.ZodType) {
result.items = zodToJsonSchema(itemSchema);
}
break;
}
case "ZodEnum":
case "enum": {
result.type = "string";
const values =
(innerDef as Record<string, unknown>)?.entries ??
(innerDef as Record<string, unknown>)?.values;
if (Array.isArray(values)) {
result.enum = values;
} else if (values && typeof values === "object") {
result.enum = Object.keys(values);
}
break;
}
case "ZodObject":
case "object": {
result.type = "object";
const shape = (innerDef as Record<string, unknown>)?.shape;
if (shape && typeof shape === "object") {
const props: Record<string, unknown> = {};
for (const [k, v] of Object.entries(
shape as Record<string, z.ZodType>,
)) {
props[k] = zodToJsonSchema(v);
}
result.properties = props;
}
break;
}
default:
result.type = "string";
break;
}

if (description) {
result.description = description;
}

return result;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Zod 4 JSON Schema conversion recommended approach official API

💡 Result:

For Zod v4, the recommended/official approach is to use Zod’s native JSON Schema conversion API:

  • Zod → JSON Schema: z.toJSONSchema(schema, options?) (official) (zod.dev)
  • JSON Schema → Zod: z.fromJSONSchema(jsonSchema) (experimental, not stable API) (zod.dev)

Example (Zod → JSON Schema):

import * as z from "zod";

const schema = z.object({
  name: z.string(),
  age: z.number(),
});

const jsonSchema = z.toJSONSchema(schema, {
  target: "draft-2020-12", // default; also supports "draft-07", "draft-04", "openapi-3.0"
});

Key official options include target, unrepresentable ("throw" or "any"), cycle handling (cycles), and $defs behavior for reused schemas (reused). (zod.dev)


🌐 Web query:

Zod _zod def private API stability across versions

💡 Result:

  • In Zod v4, every core schema ($ZodType) has a single property named _zod, which is explicitly described as the schema “internals” object; it contains things like _zod.def (a JSON-serializable definition) and _zod.run() (internal parser). So schema._zod.def is an intended inspection surface for tooling that traverses schemas—but it’s still framed as internals, not the everyday public API. [2]
  • In Zod v3, schemas do not have _zod; they use _def instead. The Zod docs even recommend using "_zod" in schema to distinguish v4 vs v3 at runtime. [1]
  • For stability across versions:
    • Zod’s guidance for library/tool authors is to import from versioned “permalink” subpaths like "zod/v3" and "zod/v4/core", and it states these subpaths “will remain available forever” (i.e., the import targets are stable). [1]
    • That said, because _zod is an internals object, you should assume its shape/semantics can change across releases (especially across major versions). In practice, regressions involving _zod have occurred in v4 patch releases (example: an issue filed against 4.0.17 reporting a runtime error reading '_zod'). [3]

Practical recommendation: if you need schema introspection, target zod/v4/core and primarily rely on schema._zod.def + documented discriminators like def.type; avoid depending on undocumented internal fields beyond what the Core docs describe, and pin/test Zod versions when you ship tooling. [1][2]

Sources: Zod “For library authors” docs [1], Zod Core docs [2], GitHub issue demonstrating breakage involving _zod [3].


🌐 Web query:

zod-to-json-schema library Zod compatibility maintained

💡 Result:

The original zod-to-json-schema package (StefanTerdell) is not maintaining ongoing Zod compatibility anymore: its README states that as of November 2025 it will no longer be receiving updates, recommending switching to Zod v4’s native JSON Schema support instead. [1] Zod v4 includes built-in z.toJSONSchema() / z.fromJSONSchema(), reducing the need for this library. [2]

Practically:

  • The last published v3.x line is 3.24.6 (published June 25, 2025) per package tracking, so don’t expect fixes for newer Zod changes. [3]
  • The README also notes Zod v4 can be used only as a peer dependency, but you still need to provide v3-style schemas (so it’s not “full” Zod v4 support). [1]

If you specifically need a maintained converter library (instead of Zod v4 native conversion), a commonly referenced maintained option is the fork @alcyone-labs/zod-to-json-schema, which is explicitly “upgraded to Zod V4” and claims full Zod v4 compatibility. [4]

Bottom line: official zod-to-json-schema is effectively frozen/deprecated, so for “compatibility maintained” you should prefer Zod v4 native z.toJSONSchema() or a maintained fork like @alcyone-labs/zod-to-json-schema. [1][2][4]

References:
[1] https://github.com/StefanTerdell/zod-to-json-schema
[2] https://zod.dev/json-schema
[3] https://www.stacktco.com/packages/zod-to-json-schema
[4] https://www.npmjs.com/package/%40alcyone-labs%2Fzod-to-json-schema


Replace custom zodToJsonSchema() with Zod's native z.toJSONSchema() API.

The code accesses _zod?.def, which is Zod's internal inspection surface and not stable across versions (v3 vs v4 use different structures, and breaking changes have occurred in patch releases). Zod v4 provides an official, maintained z.toJSONSchema(schema, options?) function specifically for this purpose. The external zod-to-json-schema package is also no longer maintained as of November 2025, with the Zod team recommending migration to the native API.

🤖 Prompt for AI Agents
In `@apps/api/src/app/api/voice/tool-adapter.ts` around lines 90 - 170, The custom
zodToJsonSchema() function is using Zod internals (_zod?.def) which is fragile;
remove zodToJsonSchema and replace its usages with Zod's supported API
z.toJSONSchema(schema, options?) (e.g., import z from 'zod' and call
z.toJSONSchema(yourSchema) where zodToJsonSchema was used), delete any helper
unwrapZod usage tied only to this converter, and adjust any code that expected
the old output shape to match the native z.toJSONSchema output (update property
names/options as needed).

Comment on lines +23 to +38
async function transcribeAudio(audioBuffer: Uint8Array): Promise<string> {
	const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });

	const blob = new Blob([audioBuffer], { type: "audio/wav" });
	const file = new File([blob], "audio.wav", { type: "audio/wav" });

	const result = await openai.audio.transcriptions.create({
		model: "whisper-1",
		file,
	});

	// Strip wake word from transcription
	let text = result.text.trim();
	text = text.replace(/^hey\s*jarvis[,.\s!?]*/i, "").trim();
	return text;
}

⚠️ Potential issue | 🟡 Minor

Add error handling for transcription failures.

The transcribeAudio function can throw on API errors (network issues, invalid audio, rate limits), but errors propagate unhandled to the caller. Consider wrapping with try/catch and emitting an SSE error event, or at minimum logging the failure with context.

🛡️ Proposed fix to add error handling
 async function transcribeAudio(audioBuffer: Uint8Array): Promise<string> {
 	const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });
 
 	const blob = new Blob([audioBuffer], { type: "audio/wav" });
 	const file = new File([blob], "audio.wav", { type: "audio/wav" });
 
+	try {
 	const result = await openai.audio.transcriptions.create({
 		model: "whisper-1",
 		file,
 	});
 
 	// Strip wake word from transcription
 	let text = result.text.trim();
 	text = text.replace(/^hey\s*jarvis[,.\s!?]*/i, "").trim();
 	return text;
+	} catch (error) {
+		console.error("[voice/transcribe] Whisper API error:", error);
+		throw error;
+	}
 }
🤖 Prompt for AI Agents
In `@apps/api/src/app/api/voice/voice-service.ts` around lines 23 - 38, Wrap the
transcribeAudio function body (including the OpenAI client creation and the
openai.audio.transcriptions.create call) in a try/catch; on error, log
contextual information (e.g., the error object and any relevant metadata) and
either emit an SSE error event to the caller or rethrow a wrapped error with a
clear message so callers can handle it; ensure the catch references
transcribeAudio and openai.audio.transcriptions.create so the change is easy to
find.
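The wake-word-stripping step is easy to sanity-check in isolation. A minimal sketch using the PR's regex (the `stripWakeWord` helper name is illustrative, not from the diff):

```typescript
// Illustrative extraction of the wake-word-stripping step from transcribeAudio.
// The regex is the one in the PR; the helper name is hypothetical.
function stripWakeWord(raw: string): string {
  return raw.trim().replace(/^hey\s*jarvis[,.\s!?]*/i, "").trim();
}

console.log(stripWakeWord("Hey Jarvis, open my calendar")); // "open my calendar"
console.log(stripWakeWord("hey jarvis! stop the music")); // "stop the music"
console.log(stripWakeWord("open my calendar")); // unchanged: no wake word prefix
```

Because the pattern is anchored with `^`, a wake word appearing mid-utterance is left alone; only the leading trigger phrase and its trailing punctuation are trimmed.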

Comment on lines +72 to +76
def duration_s(self) -> float:
    if not self._buffers:
        return 0.0
    total_samples = sum(b.size for b in self._buffers)
    return total_samples / self._config.sample_rate

⚠️ Potential issue | 🟡 Minor

Fix duration calculation for multi-channel audio.

Config.channels (apps/desktop/src/main/lib/voice/python/config.py Lines 7‑9) allows multi‑channel capture, but duration_s divides only by sample_rate. For >1 channel this inflates duration.

🛠️ Suggested fix
-        total_samples = sum(b.size for b in self._buffers)
-        return total_samples / self._config.sample_rate
+        total_samples = sum(b.size for b in self._buffers)
+        return total_samples / (self._config.sample_rate * self._config.channels)
🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/python/speech_capture.py` around lines 72 -
76, The duration_s method overestimates time for multi-channel audio because it
divides total_samples by self._config.sample_rate only; modify duration_s to
divide total_samples by (self._config.sample_rate * self._config.channels) so
frames across channels are accounted for (use self._config.channels alongside
sample_rate when computing duration from self._buffers and self._buffers' .size
values).
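For intuition on the fix: interleaved PCM stores one sample per channel per frame, so the divisor must include the channel count. A hedged sketch of the arithmetic (function name assumed, not from the PR):

```typescript
// Duration of interleaved PCM audio. totalSamples counts samples across
// ALL channels, so stereo holds twice the samples per second of mono.
function durationSeconds(
  totalSamples: number,
  sampleRate: number,
  channels: number,
): number {
  return totalSamples / (sampleRate * channels);
}

// One second of 16 kHz stereo: 16000 frames x 2 channels = 32000 samples.
console.log(durationSeconds(32_000, 16_000, 2)); // 1
// Dividing by sampleRate alone would misreport the same buffer as 2 seconds.
```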

Comment on lines +60 to +72
function getDevConfig(): VoiceSpawnConfig {
	const scriptDir = join(app.getAppPath(), "src/main/lib/voice/python");
	const venvPython = join(scriptDir, ".venv/bin/python3");

	if (existsSync(venvPython)) {
		return { command: venvPython, args: ["main.py"], cwd: scriptDir };
	}

	console.warn(
		"[voice-paths] Dev venv not found, falling back to system python3",
	);
	return { command: "python3", args: ["main.py"], cwd: scriptDir };
}

⚠️ Potential issue | 🟡 Minor

Windows compatibility issue in dev mode.

The venv Python path .venv/bin/python3 is Unix-specific. On Windows, the path would be .venv/Scripts/python.exe. This will cause dev mode to always fall back to system Python on Windows.

🛡️ Proposed fix for cross-platform support
 function getDevConfig(): VoiceSpawnConfig {
 	const scriptDir = join(app.getAppPath(), "src/main/lib/voice/python");
-	const venvPython = join(scriptDir, ".venv/bin/python3");
+	const venvPython = join(
+		scriptDir,
+		process.platform === "win32"
+			? ".venv/Scripts/python.exe"
+			: ".venv/bin/python3",
+	);
 
 	if (existsSync(venvPython)) {
 		return { command: venvPython, args: ["main.py"], cwd: scriptDir };
 	}
 
 	console.warn(
 		"[voice-paths] Dev venv not found, falling back to system python3",
 	);
-	return { command: "python3", args: ["main.py"], cwd: scriptDir };
+	return {
+		command: process.platform === "win32" ? "python" : "python3",
+		args: ["main.py"],
+		cwd: scriptDir,
+	};
 }
🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts` around lines 60 - 72,
getDevConfig currently assumes a Unix venv path (.venv/bin/python3) causing
Windows to always fallback; update getDevConfig to detect the platform
(process.platform === "win32") or attempt both platform-specific venv paths and
choose the first that exists. Construct platform paths from scriptDir (e.g.,
join(scriptDir, ".venv", "Scripts", "python.exe") for Windows and
join(scriptDir, ".venv", "bin", "python3") for others), check existsSync on the
candidate path(s), and return the matching path as command (with args
["main.py"], cwd: scriptDir); keep the fallback to "python3" if no venv python
is found.

Comment on lines +74 to +86
function getPreviewConfig(): VoiceSpawnConfig {
	const previewDir = join(__dirname, "../lib/voice/python");
	const srcDir = join(app.getAppPath(), "src/main/lib/voice/python");

	const scriptDir = existsSync(previewDir) ? previewDir : srcDir;
	const venvPython = join(srcDir, ".venv/bin/python3");

	if (existsSync(venvPython)) {
		return { command: venvPython, args: ["main.py"], cwd: scriptDir };
	}

	return { command: "python3", args: ["main.py"], cwd: scriptDir };
}

⚠️ Potential issue | 🟡 Minor

Same Windows compatibility issue in preview mode.

The venv path and system Python command have the same cross-platform issue as in dev mode.

🛡️ Proposed fix for cross-platform support
 function getPreviewConfig(): VoiceSpawnConfig {
 	const previewDir = join(__dirname, "../lib/voice/python");
 	const srcDir = join(app.getAppPath(), "src/main/lib/voice/python");
 
 	const scriptDir = existsSync(previewDir) ? previewDir : srcDir;
-	const venvPython = join(srcDir, ".venv/bin/python3");
+	const venvPython = join(
+		srcDir,
+		process.platform === "win32"
+			? ".venv/Scripts/python.exe"
+			: ".venv/bin/python3",
+	);
 
 	if (existsSync(venvPython)) {
 		return { command: venvPython, args: ["main.py"], cwd: scriptDir };
 	}
 
-	return { command: "python3", args: ["main.py"], cwd: scriptDir };
+	return {
+		command: process.platform === "win32" ? "python" : "python3",
+		args: ["main.py"],
+		cwd: scriptDir,
+	};
 }
🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts` around lines 74 - 86,
getPreviewConfig currently builds venvPython with a POSIX-only path
(".venv/bin/python3") and uses "python3" as the fallback, which breaks on
Windows; change getPreviewConfig to construct the venv python path using
path.join with a platform-specific segment (use ".venv/Scripts/python.exe" on
win32 and ".venv/bin/python3" otherwise) and set the fallback command to
"python" on Windows and "python3" on other platforms, keeping the same
scriptDir, previewDir, srcDir, args (["main.py"]) and cwd logic.
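Since the dev and preview configs need the same platform branch, one option is a small shared helper. A hedged sketch (`venvPythonPath` and `systemPythonCommand` are suggested names, not code from the PR):

```typescript
import { join } from "node:path";

// Hypothetical helper consolidating the platform-specific venv layout
// proposed for both getDevConfig and getPreviewConfig.
function venvPythonPath(scriptDir: string, platform = process.platform): string {
  return platform === "win32"
    ? join(scriptDir, ".venv", "Scripts", "python.exe")
    : join(scriptDir, ".venv", "bin", "python3");
}

// Fallback interpreter when no venv exists: "python" on Windows, else "python3".
function systemPythonCommand(platform = process.platform): string {
  return platform === "win32" ? "python" : "python3";
}

console.log(venvPythonPath("/app/voice", "darwin")); // .venv/bin/python3 under /app/voice
```

Both call sites would then share one source of truth for the venv layout instead of duplicating the `process.platform` ternary.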

Comment on lines +17 to +23
<button
	type="button"
	onClick={() => toast.dismiss(toastId)}
	className="ml-1 text-xs text-muted-foreground hover:text-foreground"
>
	Cancel
</button>

⚠️ Potential issue | 🟡 Minor

“Cancel” label doesn’t match behavior.
The button only dismisses the toast; recording likely continues. Either wire it to an actual cancel action or rename to “Dismiss” to match behavior.

Proposed text fix
-				Cancel
+				Dismiss
🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/components/Voice/components/RecordingIndicator/RecordingIndicator.tsx`
around lines 17 - 23, The button text "Cancel" is misleading because the onClick
only calls toast.dismiss(toastId) and does not stop recording; either wire it to
the actual cancel flow or rename it to match behavior—update the onClick to call
the real cancel handler (e.g., cancelRecording() or a passed prop like
onCancelRecording()) in addition to toast.dismiss(toastId), or simply change the
label from "Cancel" to "Dismiss" in the RecordingIndicator component so the UI
matches the existing toast.dismiss(toastId) behavior.

Comment on lines +116 to +122
try {
	const data = JSON.parse(line.slice(6));
	handleSSEEvent(eventType, data, setState);
} catch {
	// Skip malformed data
}
eventType = "";

⚠️ Potential issue | 🟡 Minor

Don't silently swallow JSON parse errors.

The empty catch block violates the coding guideline against silently swallowing errors. At minimum, log malformed SSE data for debugging purposes.

🛡️ Proposed fix to log parse errors
 					try {
 						const data = JSON.parse(line.slice(6));
 						handleSSEEvent(eventType, data, setState);
-					} catch {
-						// Skip malformed data
+					} catch (error) {
+						console.warn("[voice-pipeline] Malformed SSE data:", line, error);
 					}

As per coding guidelines: "Never silently swallow errors with catch(() => {}) or catch(e) { return null }"

🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`
around lines 116 - 122, The try/catch around JSON.parse is silently swallowing
parse errors; update the catch to log the malformed SSE data and the parse error
(e.g., using console.error) so debugging is possible, include the raw
line.slice(6) payload, the current eventType and the caught error, then proceed
to reset eventType as before; the change should be applied in the block
surrounding JSON.parse(...) that calls handleSSEEvent(eventType, data,
setState).
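The log-instead-of-swallow pattern can also be factored out. Sketch below (the `parseSSEData` helper name is illustrative, and `console.warn` stands in for whatever logger the hook uses):

```typescript
// Parse an SSE "data: ..." line, logging rather than swallowing failures.
// Returns undefined for malformed payloads so the caller can skip them.
function parseSSEData(line: string): unknown {
  try {
    return JSON.parse(line.slice(6)); // strip the 6-char "data: " prefix
  } catch (error) {
    console.warn("[voice-pipeline] Malformed SSE data:", line, error);
    return undefined;
  }
}

console.log(parseSSEData('data: {"transcript":"open calendar"}')); // { transcript: "open calendar" }
console.log(parseSSEData("data: not-json")); // undefined (with a warning logged)
```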

@github-actions

github-actions Bot commented Jan 29, 2026

🚀 Preview Deployment

🔗 Preview Links

Service Status Link
Neon Database (Neon) View Branch
Fly.io Electric (Fly.io) View App
Vercel API (Vercel) Failed to deploy
Vercel Web (Vercel) Open Preview
Vercel Marketing (Vercel) Open Preview
Vercel Admin (Vercel) Open Preview
Vercel Docs (Vercel) Open Preview

Preview updates automatically with new commits

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/desktop/src/lib/trpc/routers/voice/index.ts`:
- Around line 12-19: The current getMicStatus blindly casts
systemPreferences.getMediaAccessStatus("microphone") to MicPermissionStatus and
ignores Electron's possible 'unknown' return; update getMicStatus to capture the
raw status from systemPreferences.getMediaAccessStatus("microphone"), check if
rawStatus === "unknown" and map it to a safe MicPermissionStatus (e.g.,
"denied") and optionally console.warn about the unexpected 'unknown' value,
otherwise return the cast/validated status; reference the getMicStatus function
and systemPreferences.getMediaAccessStatus to implement this check and mapping.

In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`:
- Around line 110-125: handleVoiceToggle currently does nothing when
micPermission is undefined (loading); update it to handle that case by calling
requestMicPermission.mutate() when enabled === true and micPermission is
undefined so the toggle triggers a permission request, referencing
handleVoiceToggle, micPermission, and requestMicPermission.mutate;
alternatively, disable the switch in the UI while micPermission is loading to
prevent the noop behavior.

Comment on lines +12 to +19
function getMicStatus(): MicPermissionStatus {
	if (process.platform !== "darwin") {
		return "granted";
	}
	return systemPreferences.getMediaAccessStatus(
		"microphone",
	) as MicPermissionStatus;
}

⚠️ Potential issue | 🟡 Minor

Handle the 'unknown' status from Electron's getMediaAccessStatus.

systemPreferences.getMediaAccessStatus("microphone") can return 'unknown' in addition to the statuses in MicPermissionStatus. The type assertion silently ignores this possibility, which could lead to unexpected behavior if Electron returns 'unknown'.

Proposed fix to handle unknown status
 function getMicStatus(): MicPermissionStatus {
 	if (process.platform !== "darwin") {
 		return "granted";
 	}
-	return systemPreferences.getMediaAccessStatus(
+	const status = systemPreferences.getMediaAccessStatus(
 		"microphone",
-	) as MicPermissionStatus;
+	);
+	// Electron can return 'unknown' in edge cases; treat as not-determined
+	if (status === "unknown") {
+		return "not-determined";
+	}
+	return status as MicPermissionStatus;
 }
🤖 Prompt for AI Agents
In `@apps/desktop/src/lib/trpc/routers/voice/index.ts` around lines 12 - 19, The
current getMicStatus blindly casts
systemPreferences.getMediaAccessStatus("microphone") to MicPermissionStatus and
ignores Electron's possible 'unknown' return; update getMicStatus to capture the
raw status from systemPreferences.getMediaAccessStatus("microphone"), check if
rawStatus === "unknown" and map it to a safe MicPermissionStatus (e.g.,
"denied") and optionally console.warn about the unexpected 'unknown' value,
otherwise return the cast/validated status; reference the getMicStatus function
and systemPreferences.getMediaAccessStatus to implement this check and mapping.

Comment on lines +110 to +125
const handleVoiceToggle = (enabled: boolean) => {
	if (!enabled) {
		setVoiceCommandsEnabled.mutate({ enabled: false });
		return;
	}

	if (micPermission === "granted") {
		setVoiceCommandsEnabled.mutate({ enabled: true });
		return;
	}

	if (micPermission === "not-determined") {
		requestMicPermission.mutate();
		return;
	}
};

⚠️ Potential issue | 🟡 Minor

Toggle does nothing when permission status is still loading.

When enabled=true and micPermission is undefined (query still loading), the function falls through all conditions without executing any action: the user clicks the switch and nothing visibly happens.

Proposed fix to handle the loading/undefined case
 const handleVoiceToggle = (enabled: boolean) => {
 	if (!enabled) {
 		setVoiceCommandsEnabled.mutate({ enabled: false });
 		return;
 	}

 	if (micPermission === "granted") {
 		setVoiceCommandsEnabled.mutate({ enabled: true });
 		return;
 	}

-	if (micPermission === "not-determined") {
+	// Treat undefined (loading) or not-determined as needing permission request
+	if (micPermission === "not-determined" || micPermission === undefined) {
 		requestMicPermission.mutate();
 		return;
 	}
+
+	// micDenied case: do nothing (switch is disabled anyway)
 };

Alternatively, disable the switch while micPermission is loading to prevent this edge case entirely.

🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`
around lines 110 - 125, handleVoiceToggle currently does nothing when
micPermission is undefined (loading); update it to handle that case by calling
requestMicPermission.mutate() when enabled === true and micPermission is
undefined so the toggle triggers a permission request, referencing
handleVoiceToggle, micPermission, and requestMicPermission.mutate;
alternatively, disable the switch in the UI while micPermission is loading to
prevent the noop behavior.

@saddlepaddle saddlepaddle changed the title feat(desktop): add voice commands with wake word detection feat(desktop): add voice commands with wake word detection and mic permission UX Jan 30, 2026
…gs toggle

Adds a Python-based voice sidecar that listens for a wake word ("Hey Jarvis"),
captures speech, transcribes it, and routes commands back to the desktop app.
The feature is gated behind a "Voice Commands" toggle in Settings > Features
(default: off). The sidecar process auto-starts/stops based on subscriber count,
so it only runs when the setting is enabled.

Sets up Python 3.11, creates a venv with openwakeword/sounddevice/numpy,
and runs the PyInstaller build script so the voice-sidecar binary gets
bundled into the Electron app's extraResources.

The build script now auto-creates the venv and installs dependencies, so
CI doesn't need to know about Python internals. It runs as part of
prepackage — CI only needs setup-python to ensure python3 is on PATH.

Request mic access before starting the voice sidecar, show guidance
when permission is denied, and re-check on window focus so users can
grant access in System Settings and return to a working toggle.

The canary config was replacing the entire extendInfo object, dropping
NSMicrophoneUsageDescription and NSLocalNetworkUsageDescription from
the base config. Spread the base extendInfo so these plist keys are
preserved.

Hardened runtime blocks microphone access without the
com.apple.security.device.audio-input entitlement, causing
getMediaAccessStatus to return "denied" regardless of TCC state.
Add an explicit entitlements plist with the audio-input entitlement.

The workflow had Setup Python but never ran build-voice-sidecar.sh,
so the sidecar binary was packaged without the openwakeword model
data. Add the missing build step before electron-vite compilation.

--collect-data openwakeword alone was not including the model .onnx
files in the PyInstaller bundle. Add --add-data with the resolved
package path as a fallback, and verify the hey_jarvis model exists
in the output before finishing the build.

Also revert the redundant CI workflow step since the sidecar build
is already wired through prepackage in package.json.

PyInstaller's --collect-data silently misses openwakeword model files.
Fall back to manually copying the package data into _internal/ when
the model isn't found after the initial build.

rm the target first so cp -R creates a clean copy of the package
directory instead of nesting it inside an existing partial dir.

openwakeword >=0.6.0 no longer ships pre-trained models in the pip
package. Download hey_jarvis, melspectrogram, and embedding_model
from the v0.5.1 GitHub release during the sidecar build.

After rebase onto main, update voice API imports from @/lib/mcp/* to
@superset/mcp and @superset/mcp/auth. Remove duplicate ANTHROPIC_API_KEY
in env schema (already added by Slack integration).
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@apps/desktop/src/main/lib/voice/python/main.py`:
- Around line 36-40: The emit() function writes JSON to stdout without
synchronization, which can interleave with writes from audio.py's sounddevice
callback; define a module-level threading.Lock named STDOUT_LOCK in main.py,
import threading, and wrap the body of emit(event: str, **kwargs: Any) with with
STDOUT_LOCK: before writing and flushing; export STDOUT_LOCK (so other modules
can import it) and update audio.py's _emit_error to import and use
main.STDOUT_LOCK to guard its sys.stdout.write + sys.stdout.flush calls.

In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts`:
- Around line 45-57: The fallback currently returns command "python3" which
fails on Windows; update the fallback in voice-process-paths (where scriptDir is
computed and the object with command, args, cwd is returned) to select the
command based on platform (use "python" when process.platform === 'win32',
otherwise "python3"), keep args as [join(scriptDir, "main.py")] and cwd as
scriptDir so the rest of the code (scriptDir, join, process.resourcesPath)
remains unchanged.

In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`:
- Around line 65-104: The mutations setVoiceCommandsEnabled.useMutation and
requestMicPermission.useMutation currently swallow errors; update their onError
handlers to log the error and relevant context (error object, mutation vars like
{ enabled } or request inputs, and any previous state from context) so failures
are visible; specifically, add a logging call in setVoiceCommandsEnabled.onError
that logs _err, _vars and context.previous when rollback occurs, and add an
onError handler to requestMicPermission.useMutation that logs the error and
request parameters and then invalidates utils.voice.getMicPermission as
appropriate.
🧹 Nitpick comments (8)
packages/local-db/drizzle/meta/0016_snapshot.json (1)

401-408: Consider making voice_commands_enabled non-nullable to avoid tri-state behavior.

If this is a strict boolean feature flag, a NOT NULL column with DEFAULT false avoids null handling and keeps the semantics tighter. Make the change in the migration/schema and regenerate the snapshot rather than editing the snapshot directly.
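
A minimal sketch of what the non-nullable column could look like in the drizzle schema (table and column names assumed from migration 0016; SQLite stores booleans as integers, which drizzle maps via integer mode):

```typescript
// Sketch only: table/column names are assumed, not taken from the actual schema.
import { integer, sqliteTable } from "drizzle-orm/sqlite-core";

export const settings = sqliteTable("settings", {
	// NOT NULL with a false default: reads are always a real boolean,
	// never the null third state.
	voiceCommandsEnabled: integer("voice_commands_enabled", { mode: "boolean" })
		.notNull()
		.default(false),
});
```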

apps/desktop/scripts/build-voice-sidecar.sh (1)

42-48: Consider verifying downloaded model integrity.

The models are downloaded over HTTPS which provides transport security, but there's no checksum verification. A compromised CDN or MITM (if TLS is bypassed) could serve malicious ONNX files. Consider adding SHA256 verification for the downloaded models.

🛡️ Suggested checksum verification
 OWW_BASE_URL="https://github.com/dscripka/openWakeWord/releases/download/v0.5.1"
+
+# Expected SHA256 checksums for model integrity verification
+declare -A MODEL_CHECKSUMS=(
+  ["hey_jarvis_v0.1.onnx"]="<sha256-hash>"
+  ["melspectrogram.onnx"]="<sha256-hash>"
+  ["embedding_model.onnx"]="<sha256-hash>"
+)
+
 for model in hey_jarvis_v0.1.onnx melspectrogram.onnx embedding_model.onnx; do
   if [ ! -f "$OWW_MODELS_DIR/$model" ]; then
     echo "[voice-sidecar] Downloading model: $model"
     curl -sL "$OWW_BASE_URL/$model" -o "$OWW_MODELS_DIR/$model"
+    # Verify checksum
+    if [ -n "${MODEL_CHECKSUMS[$model]:-}" ]; then
+      echo "[voice-sidecar] Verifying checksum for $model..."
+      echo "${MODEL_CHECKSUMS[$model]}  $OWW_MODELS_DIR/$model" | shasum -a 256 -c - || {
+        echo "[voice-sidecar] ERROR: Checksum verification failed for $model"
+        rm -f "$OWW_MODELS_DIR/$model"
+        exit 1
+      }
+    fi
   fi
 done

You'll need to compute and populate the actual SHA256 hashes for the models from the v0.5.1 release.
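
To populate those checksums, a small Node helper (a sketch; the file path is a placeholder) can hash each downloaded model file:

```typescript
// Sketch: compute the hex SHA256 of a file with Node's built-in crypto,
// producing the digest to paste into MODEL_CHECKSUMS.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

export function sha256(path: string): string {
	return createHash("sha256").update(readFileSync(path)).digest("hex");
}
```

Equivalently, `shasum -a 256 <file>` on macOS (or `sha256sum` on Linux) prints the same digest.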

apps/desktop/src/main/lib/voice/voice-process.ts (1)

103-110: Consider adding a debug log for the swallowed stdin write error.

The empty catch block is acceptable given the comment, but a debug-level log could aid troubleshooting without cluttering normal output.

♻️ Optional: Add debug logging
 	if (childProcess.stdin && !childProcess.stdin.destroyed) {
 		try {
 			childProcess.stdin.write(`${JSON.stringify({ cmd: "stop" })}\n`);
 		} catch {
-			// stdin may be closed already
+			// stdin may be closed already - this is expected during shutdown
+			console.debug("[voice-process] stdin write failed (likely already closed)");
 		}
 	}
apps/desktop/src/renderer/components/Voice/components/ResponsePanel/ResponsePanel.tsx (1)

22-35: Consider extracting the auto-dismiss delay to a named constant.

The 8000 millisecond delay is a magic number that could be extracted for clarity and easier adjustment.

♻️ Optional: Extract to named constant
+const AUTO_DISMISS_DELAY_MS = 8000;
+
 export function ResponsePanel({ toastId, audioB64 }: ResponsePanelProps) {
   // ...
 
   // Auto-dismiss after done
   useEffect(() => {
     if (status === "done") {
       const timer = setTimeout(() => {
         toast.dismiss(toastId);
-      }, 8000);
+      }, AUTO_DISMISS_DELAY_MS);
       return () => clearTimeout(timer);
     }
   }, [status, toastId]);
apps/api/src/app/api/voice/route.ts (1)

5-22: Consider adding type guard for session extension.

The type assertion at lines 9-11 assumes the session has an activeOrganizationId property. A type guard or schema validation would be safer and more explicit.

♻️ Proposed improvement
 async function authenticate(request: Request): Promise<McpContext | null> {
 	// Try session auth
 	const session = await auth.api.getSession({ headers: request.headers });
-	if (session?.session) {
-		const extendedSession = session.session as {
-			activeOrganizationId?: string;
-		};
-		if (!extendedSession.activeOrganizationId) {
+	if (session?.session && session.user) {
+		const activeOrganizationId = (session.session as Record<string, unknown>)
+			.activeOrganizationId;
+		if (typeof activeOrganizationId !== "string") {
 			return null;
 		}
 		return {
 			userId: session.user.id,
-			organizationId: extendedSession.activeOrganizationId,
+			organizationId: activeOrganizationId,
 		};
 	}
 
 	return null;
 }
apps/desktop/src/main/lib/voice/python/audio.py (1)

18-27: Unused callback parameters are required by sounddevice API.

The frames and time_info parameters flagged by static analysis are required by the sounddevice callback signature. Consider prefixing with underscore to indicate intentional non-use.

♻️ Naming convention fix
     def _callback(
         self,
         indata: np.ndarray,
-        frames: int,
-        time_info: object,
+        _frames: int,
+        _time_info: object,
         status: sd.CallbackFlags,
     ) -> None:
apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts (1)

151-214: Add defensive checks for type assertions in SSE event handler.

The handler uses multiple as string assertions without validation. If the server sends malformed data, this could cause subtle bugs or runtime issues.

♻️ Proposed defensive handling
 function handleSSEEvent(
 	event: string,
 	data: Record<string, unknown>,
 	setState: React.Dispatch<React.SetStateAction<VoicePipelineState>>,
 ) {
 	switch (event) {
 		case "transcription":
+			if (typeof data.text !== "string") return;
 			setState((prev) => ({
 				...prev,
 				status: "processing",
-				transcription: data.text as string,
+				transcription: data.text,
 			}));
 			break;

 		case "tool_use":
+			if (typeof data.toolName !== "string") return;
 			setState((prev) => ({
 				...prev,
 				status: "processing",
 				toolCalls: [
 					...prev.toolCalls,
 					{
-						toolName: data.toolName as string,
+						toolName: data.toolName,
 						toolInput: data.toolInput,
 					},
 				],
 			}));
 			break;
apps/desktop/src/main/lib/voice/python/speech_capture.py (1)

65-69: Consider dtype consistency in get_audio.

The method returns an empty array with dtype=np.int16, but when concatenating buffers the dtype is inherited from the input chunks, so callers can see inconsistent dtypes if chunks were captured with a different dtype.

♻️ Ensure consistent dtype
     def get_audio(self) -> np.ndarray:
         """Return all captured audio as a single array."""
         if not self._buffers:
             return np.array([], dtype=np.int16)
-        return np.concatenate(self._buffers).flatten()
+        return np.concatenate(self._buffers).flatten().astype(np.int16)

Comment on lines +36 to +40
def emit(event: str, **kwargs: Any) -> None:
"""Write a JSON event to stdout."""
msg = {"event": event, **kwargs}
sys.stdout.write(json.dumps(msg) + "\n")
sys.stdout.flush()

⚠️ Potential issue | 🟠 Major

Add stdout lock to prevent interleaved JSON.

This emit() function writes to stdout without synchronization. Combined with _emit_error in audio.py running on the sounddevice callback thread, JSON lines can interleave and corrupt the IPC protocol.

🔒 Proposed fix with shared lock
+import threading
+
+STDOUT_LOCK = threading.Lock()
+
 def emit(event: str, **kwargs: Any) -> None:
     """Write a JSON event to stdout."""
     msg = {"event": event, **kwargs}
-    sys.stdout.write(json.dumps(msg) + "\n")
-    sys.stdout.flush()
+    with STDOUT_LOCK:
+        sys.stdout.write(json.dumps(msg) + "\n")
+        sys.stdout.flush()

Then update audio.py to import and use the same lock:

from main import STDOUT_LOCK

def _emit_error(message: str) -> None:
    import json
    import sys
    with STDOUT_LOCK:
        sys.stdout.write(json.dumps({"event": "error", "message": message}) + "\n")
        sys.stdout.flush()

Comment on lines +45 to +57
// Fallback: try system python3 with unpacked script
console.warn(
"[voice-paths] PyInstaller binary not found, falling back to system python3",
);
const scriptDir = join(
process.resourcesPath,
"app.asar.unpacked/src/main/lib/voice/python",
);
return {
command: "python3",
args: [join(scriptDir, "main.py")],
cwd: scriptDir,
};

⚠️ Potential issue | 🟡 Minor

Windows compatibility issue in packaged fallback.

The fallback path uses "python3" as the command, which doesn't exist on Windows where the command is typically "python".

🛡️ Proposed fix for cross-platform fallback
 	console.warn(
 		"[voice-paths] PyInstaller binary not found, falling back to system python3",
 	);
 	const scriptDir = join(
 		process.resourcesPath,
 		"app.asar.unpacked/src/main/lib/voice/python",
 	);
 	return {
-		command: "python3",
+		command: process.platform === "win32" ? "python" : "python3",
 		args: [join(scriptDir, "main.py")],
 		cwd: scriptDir,
 	};

Comment on lines +65 to +104
const { data: voiceCommandsEnabled, isLoading: isVoiceLoading } =
electronTrpc.settings.getVoiceCommandsEnabled.useQuery();
const setVoiceCommandsEnabled =
electronTrpc.settings.setVoiceCommandsEnabled.useMutation({
onMutate: async ({ enabled }) => {
await utils.settings.getVoiceCommandsEnabled.cancel();
const previous = utils.settings.getVoiceCommandsEnabled.getData();
utils.settings.getVoiceCommandsEnabled.setData(undefined, enabled);
return { previous };
},
onError: (_err, _vars, context) => {
if (context?.previous !== undefined) {
utils.settings.getVoiceCommandsEnabled.setData(
undefined,
context.previous,
);
}
},
onSettled: () => {
utils.settings.getVoiceCommandsEnabled.invalidate();
},
});

const { data: micPermission } = electronTrpc.voice.getMicPermission.useQuery(
undefined,
{
refetchOnWindowFocus: true,
},
);

const requestMicPermission =
electronTrpc.voice.requestMicPermission.useMutation({
onSuccess: ({ granted }) => {
utils.voice.getMicPermission.invalidate();
if (granted) {
setVoiceCommandsEnabled.mutate({ enabled: true });
}
},
});


⚠️ Potential issue | 🟡 Minor

Add error logging for the new mutations.
Right now failures are silent; please log with context.

Proposed fix
 const setVoiceCommandsEnabled =
 	electronTrpc.settings.setVoiceCommandsEnabled.useMutation({
 		onMutate: async ({ enabled }) => {
 			await utils.settings.getVoiceCommandsEnabled.cancel();
 			const previous = utils.settings.getVoiceCommandsEnabled.getData();
 			utils.settings.getVoiceCommandsEnabled.setData(undefined, enabled);
 			return { previous };
 		},
-		onError: (_err, _vars, context) => {
+		onError: (err, _vars, context) => {
+			console.error(
+				"[settings/voice-commands] Failed to update setting:",
+				err,
+			);
 			if (context?.previous !== undefined) {
 				utils.settings.getVoiceCommandsEnabled.setData(
 					undefined,
 					context.previous,
 				);
 			}
 		},
 		onSettled: () => {
 			utils.settings.getVoiceCommandsEnabled.invalidate();
 		},
 	});

 const requestMicPermission =
 	electronTrpc.voice.requestMicPermission.useMutation({
 		onSuccess: ({ granted }) => {
 			utils.voice.getMicPermission.invalidate();
 			if (granted) {
 				setVoiceCommandsEnabled.mutate({ enabled: true });
 			}
 		},
+		onError: (err) => {
+			console.error(
+				"[settings/mic-permission] Failed to request permission:",
+				err,
+			);
+		},
 	});

As per coding guidelines, Never swallow errors silently; at minimum log them with context.


Cast ZodType through unknown for internal _zod access, and cast
Uint8Array to BlobPart for Blob constructor compatibility.
- Capture process reference locally in stopVoiceProcess and clear
  childProcess immediately, so startVoiceProcess can proceed if called
  while the old process is still shutting down
- Guard exit/error handlers with identity check (childProcess === proc)
  to avoid cleaning up a newer process when a stale one exits
- Emit idle event on cleanup so subscribers know process stopped
- Fix import ordering (biome auto-fix)

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/api/src/app/api/voice/voice-service.ts`:
- Around line 52-60: Wrap the call to transcribeAudio in a try/catch inside the
function so any thrown errors are caught instead of crashing the pipeline; on
catch, call sse.write("error", { message: err.message || String(err), step:
"transcription" }) (or similar structured payload) and then write a terminal
event (e.g., sse.write("done", { fullResponse: "" }) or return) to stop further
processing; update the block around transcribeAudio and subsequent
sse.write("transcription", ...) to only run on success.
- Around line 77-93: The Claude streaming call and async iteration
(anthropic.messages.stream(...) and the subsequent for await (const event of
stream) loop) lack error handling; wrap the stream creation and its for-await
loop in a try/catch (inside the MAX_TOOL_ROUNDS loop) to catch API/network
errors, log the error, send an SSE error event via sse.write (e.g.,
sse.write("error", { message: error.message, code: ... })), ensure any necessary
cleanup/stream.close if available, and break/return from the outer loop so the
pipeline stops gracefully while preserving the accumulated fullResponse.
🧹 Nitpick comments (1)
apps/api/src/app/api/voice/tool-adapter.ts (1)

80-81: Avoid as never type assertion — it defeats type safety.

Using as never bypasses all type checking and could hide mismatches between interceptServer and the expected MCP server interface. Consider typing interceptServer to match the expected interface or using a more specific assertion.

Suggested approach
-	registerTools(interceptServer as never);
+	// If registerTools expects a specific type, define interceptServer accordingly
+	// or use a targeted type assertion that preserves some checking:
+	registerTools(interceptServer as Parameters<typeof registerTools>[0]);

Comment on lines +52 to +60
}): Promise<void> {
// 1. Transcribe
const transcription = await transcribeAudio(audioBuffer);
sse.write("transcription", { text: transcription });

if (!transcription) {
sse.write("done", { fullResponse: "" });
return;
}

⚠️ Potential issue | 🟡 Minor

Wrap transcription call in try/catch to prevent pipeline crash.

If transcribeAudio throws (network error, invalid audio, rate limits), the pipeline crashes without emitting an error event. Consider wrapping the transcription step to emit an SSE error event on failure.

Proposed fix
 ): Promise<void> {
 	// 1. Transcribe
+	let transcription: string;
+	try {
-	const transcription = await transcribeAudio(audioBuffer);
+		transcription = await transcribeAudio(audioBuffer);
+	} catch (error) {
+		console.error("[voice/pipeline] Transcription failed:", error);
+		sse.write("error", {
+			message: error instanceof Error ? error.message : "Transcription failed",
+		});
+		sse.write("done", { fullResponse: "" });
+		return;
+	}
 	sse.write("transcription", { text: transcription });

Delete the 257-line tool-adapter.ts that hand-rolled Zod-to-JSON-Schema
conversion and MCP tool interception. Replace with createInMemoryMcpClient
from @superset/mcp/in-memory, using client.listTools() and
client.callTool() directly — the same pattern as the Slack agent.
- Move VoiceSidecarEvent and PythonVoiceEvent into voice-process.ts
- Inline MicPermissionStatus in voice router
- Delete shared/voice.ts (renderer gets types via tRPC inference)
- Remove unused getVoiceProcessStatus(), isRunning, start/stop mutations
- Remove unused PythonVoiceCommand type

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`:
- Line 39: The linter warns that handleSSEEvent is captured by the useCallback
for processAudio but not listed in its dependency array; to fix, move the
handleSSEEvent function body inside the processAudio useCallback (so it closes
over stable setState directly) and remove the outer handleSSEEvent declaration,
or alternatively add handleSSEEvent to processAudio's dependency array if you
prefer keeping it separate; update references so processAudio uses the inlined
handleSSEEvent logic and ensure only stable setters (setState) are used inside.
🧹 Nitpick comments (3)
apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts (1)

146-151: Consider validating SSE event data before type assertions.

Multiple as string casts assume the server always sends correctly typed data. If the server sends null, undefined, or a non-string value, these assertions could lead to subtle bugs (e.g., "null" being appended to responseText).

A lightweight guard or optional chaining would make this more defensive:

Example defensive approach
 			case "transcription":
 				setState((prev) => ({
 					...prev,
 					status: "processing",
-					transcription: data.text as string,
+					transcription: typeof data.text === "string" ? data.text : null,
 				}));
 				break;

Also applies to: 153-161, 163-171, 173-178, 183-188

apps/desktop/src/renderer/components/Voice/components/ResponsePanel/ResponsePanel.tsx (2)

28-36: Extract the auto-dismiss timeout to a named constant.

The 8000 ms delay is a magic number; extract it to a module-level constant for clarity and easier maintenance.

Proposed fix
+const AUTO_DISMISS_DELAY_MS = 8000;
+
 export function ResponsePanel({ toastId, audioB64 }: ResponsePanelProps) {
 	useEffect(() => {
 		if (status === "done" || status === "error") {
 			const timer = setTimeout(() => {
 				toast.dismiss(toastId);
-			}, 8000);
+			}, AUTO_DISMISS_DELAY_MS);
 			return () => clearTimeout(timer);
 		}
 	}, [status, toastId]);

38-40: Consider inlining the abort call.

The handleStop function is a trivial wrapper around abort(). You could inline it in the onClick handler for brevity, though this is a minor stylistic preference.

Optional simplification
-	const handleStop = () => {
-		abort();
-	};
-
 	// ...
 
-					<Button variant="ghost" size="sm" onClick={handleStop}>
+					<Button variant="ghost" size="sm" onClick={abort}>

const [state, setState] = useState<VoicePipelineState>(INITIAL_STATE);
const abortRef = useRef<AbortController | null>(null);

const processAudio = useCallback(async (audioB64: string) => {

⚠️ Potential issue | 🟡 Minor

Add handleSSEEvent to the dependency array or move it inside useCallback.

The static analysis tool flags that handleSSEEvent is used inside processAudio but not listed in its dependency array. Since handleSSEEvent is defined outside the useCallback, this creates a stale closure risk where the function reference could become outdated.

The simplest fix is to move handleSSEEvent inside the useCallback, as it only depends on setState which is stable.

Proposed fix: move handleSSEEvent inside useCallback
 	const processAudio = useCallback(async (audioB64: string) => {
 		abortRef.current?.abort();
 		setState({ ...INITIAL_STATE, status: "transcribing" });
+
+		function handleSSEEvent(event: string, data: Record<string, unknown>) {
+			switch (event) {
+				case "transcription":
+					setState((prev) => ({
+						...prev,
+						status: "processing",
+						transcription: data.text as string,
+					}));
+					break;
+				// ... rest of cases
+			}
+		}
 
 		const binaryStr = atob(audioB64);

Then remove the outer handleSSEEvent function at lines 144-191.

Also applies to: 144-191

🧰 Tools
🪛 Biome (2.3.13)

[error] 39-39: This hook does not specify its dependency on handleSSEEvent.

This dependency is being used here, but is not specified in the hook dependency list.

React relies on hook dependencies to determine when to re-compute Effects.
Failing to specify dependencies can result in Effects not updating correctly when state changes.
These "stale closures" are a common source of surprising bugs.
Either include it or remove the dependency array.
Unsafe fix: Add the missing dependency to the list.

(lint/correctness/useExhaustiveDependencies)

