feat(desktop): add voice commands with wake word detection and mic permission UX #1055

saddlepaddle wants to merge 18 commits into main from
Conversation
📝 Walkthrough

This PR introduces a complete voice command system for the desktop application. It integrates OpenAI Whisper transcription and Claude AI responses on the backend, implements a Python-based voice sidecar for wake word detection and audio capture, adds TypeScript process management and TRPC routers, creates React UI components for voice interaction feedback, and extends the database schema with voice command settings.

Changes
Sequence Diagram(s)

sequenceDiagram
participant User
participant VoiceListener as VoiceListener<br/>(React)
participant Sidecar as Voice Sidecar<br/>(Python)
participant ProcessMgr as Voice Process<br/>(Node.js)
participant API as Backend API<br/>(Node.js)
participant OpenAI as OpenAI
participant Claude as Anthropic Claude
participant ResponsePanel as ResponsePanel<br/>(React)
User->>VoiceListener: Component mounts<br/>(voice enabled + mic permission)
activate VoiceListener
VoiceListener->>ProcessMgr: Subscribe to voice events
ProcessMgr->>Sidecar: Start voice sidecar
activate Sidecar
Sidecar->>Sidecar: Load wake word model
Sidecar->>ProcessMgr: emit(ready)
ProcessMgr->>VoiceListener: idle event
User->>User: Say "Hey Jarvis"
Sidecar->>Sidecar: Detect wake word
Sidecar->>Sidecar: Capture audio + pre-buffer
Sidecar->>ProcessMgr: emit(recording)
ProcessMgr->>VoiceListener: recording event
VoiceListener->>ResponsePanel: Show RecordingIndicator
User->>User: Speak command
Sidecar->>Sidecar: End speech capture<br/>(silence detected)
Sidecar->>ProcessMgr: emit(audio_captured)<br/>audioB64, durationS
ProcessMgr->>VoiceListener: audio_captured event
VoiceListener->>ResponsePanel: Pass audio to hook
activate ResponsePanel
ResponsePanel->>API: POST /api/voice<br/>multipart form-data
activate API
API->>OpenAI: Transcribe audio<br/>(Whisper)
OpenAI->>API: Transcription text
API->>API: emit(transcription)
API->>Claude: Process with Claude<br/>+ tool definitions
Claude->>API: Tool calls or response
loop Tool execution (max 5 rounds)
API->>API: Execute tool
API->>Claude: Feed tool results
Claude->>API: Next response
end
API->>API: emit(text_delta, done)
API->>ResponsePanel: SSE stream
deactivate API
ResponsePanel->>ResponsePanel: Parse SSE events<br/>update state
ResponsePanel->>ResponsePanel: Render streaming response
deactivate ResponsePanel
User->>ResponsePanel: Click Stop or<br/>auto-dismiss after done
ResponsePanel->>API: Abort request
deactivate VoiceListener
Sidecar->>ProcessMgr: Stop listening
deactivate Sidecar
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 18
🤖 Fix all issues with AI agents
In `@apps/api/package.json`:
- Line 14: The package.json dependency for "@anthropic-ai/sdk" references a
non-existent version "^0.71.2"; update the version string for the
"@anthropic-ai/sdk" entry to a valid published release (e.g., "^0.61.0" or
"0.61.0") in apps/api/package.json, then run your package manager to refresh the
lockfile (npm install or yarn install) so lock files are updated accordingly;
ensure no other references to the invalid version remain.
In `@apps/api/src/app/api/voice/route.ts`:
- Around line 33-40: The catch block that handles request.formData() failures
swallows the thrown error; update it to capture the caught error (e.g., catch
(err) or catch (error)), log the error using the existing logger or console
(reference the formData handling around request.formData() and the
Response.json({ error: ... }, { status: 400 }) return) and then return the same
400 response—ensure the log includes the error object/message for debugging
while keeping the response body unchanged.
In `@apps/api/src/app/api/voice/tool-adapter.ts`:
- Around line 51-60: The catch is silently swallowing zodToJsonSchema failures;
update the catch in the loop over inputSchema to log the error before falling
back — use a prefixed console log like "[voice/tool-adapter] Failed to convert
schema for key <key>:" and include the caught error object, then continue to
assign properties[key] = { type: "string" } as the fallback; keep existing
checks around isOptional and use the same variables (inputSchema, properties,
required, zodToJsonSchema).
- Around line 90-170: The custom zodToJsonSchema() function is using Zod
internals (_zod?.def) which is fragile; remove zodToJsonSchema and replace its
usages with Zod's supported API z.toJSONSchema(schema, options?) (e.g., import z
from 'zod' and call z.toJSONSchema(yourSchema) where zodToJsonSchema was used),
delete any helper unwrapZod usage tied only to this converter, and adjust any
code that expected the old output shape to match the native z.toJSONSchema
output (update property names/options as needed).
In `@apps/api/src/app/api/voice/voice-service.ts`:
- Around line 23-38: Wrap the transcribeAudio function body (including the
OpenAI client creation and the openai.audio.transcriptions.create call) in a
try/catch; on error, log contextual information (e.g., the error object and any
relevant metadata) and either emit an SSE error event to the caller or rethrow a
wrapped error with a clear message so callers can handle it; ensure the catch
references transcribeAudio and openai.audio.transcriptions.create so the change
is easy to find.
- Around line 108-134: Wrap each call to executeTool inside a try/catch so a
thrown error doesn't crash the loop: for each toolBlock in toolUseBlocks, call
executeTool inside try, on success write the existing sse.write("tool_result", {
toolName, result }) and push the same toolResults entry; in catch, capture the
error, call sse.write("tool_result", { toolName: toolBlock.name, result: {
error: String(error), message: (error as Error)?.message } }) and push a
toolResults entry with type "tool_result", tool_use_id: toolBlock.id and content
containing the error details (so downstream code can see failures) then continue
to the next toolBlock.
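The per-tool containment described above can be sketched as follows. The ToolUseBlock/ToolResult shapes, the runTools function, and the write callback are simplified stand-ins for the PR's actual types, not its real code:

```typescript
// Simplified stand-ins for the PR's types, for illustration only.
type ToolUseBlock = { id: string; name: string; input: unknown };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };

async function runTools(
  toolUseBlocks: ToolUseBlock[],
  executeTool: (block: ToolUseBlock) => Promise<unknown>,
  write: (event: string, data: unknown) => void,
): Promise<ToolResult[]> {
  const toolResults: ToolResult[] = [];
  for (const toolBlock of toolUseBlocks) {
    try {
      const result = await executeTool(toolBlock);
      write("tool_result", { toolName: toolBlock.name, result });
      toolResults.push({
        type: "tool_result",
        tool_use_id: toolBlock.id,
        content: JSON.stringify(result),
      });
    } catch (error) {
      // A single failing tool no longer aborts the whole round; the error
      // is surfaced to the model as that tool's result instead.
      write("tool_result", {
        toolName: toolBlock.name,
        result: { error: String(error), message: (error as Error)?.message },
      });
      toolResults.push({
        type: "tool_result",
        tool_use_id: toolBlock.id,
        content: `Error: ${String(error)}`,
      });
    }
  }
  return toolResults;
}
```

The point of the sketch is that every toolBlock still produces a tool_result entry, so the downstream Claude round always sees one result per tool_use_id.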
In `@apps/api/src/env.ts`:
- Around line 39-40: Make OPENAI_API_KEY and ANTHROPIC_API_KEY optional in the
env schema (change z.string().min(1) to .optional()) so missing keys don't
prevent startup; then update the voice-related code paths (the voice
service/initialization where these keys are consumed) to explicitly check for
the presence of OPENAI_API_KEY and ANTHROPIC_API_KEY before attempting to use
them and return a clear, actionable error when the voice feature is enabled but
the required key(s) are missing.
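A minimal sketch of the suggested use-site guard, using plain runtime checks (the helper name requireVoiceKeys and the error wording are illustrative, not from the PR):

```typescript
// Hypothetical helper: fail fast with an actionable message when the voice
// feature is invoked but its API keys were left unset in the environment.
type VoiceEnv = {
  OPENAI_API_KEY?: string;
  ANTHROPIC_API_KEY?: string;
};

function requireVoiceKeys(env: VoiceEnv): {
  openaiKey: string;
  anthropicKey: string;
} {
  const missing: string[] = [];
  if (!env.OPENAI_API_KEY) missing.push("OPENAI_API_KEY");
  if (!env.ANTHROPIC_API_KEY) missing.push("ANTHROPIC_API_KEY");
  if (missing.length > 0) {
    throw new Error(
      `[voice/env] Voice commands are enabled but ${missing.join(", ")} ` +
        "is not set. Add the key(s) to your environment or disable voice.",
    );
  }
  return {
    openaiKey: env.OPENAI_API_KEY!,
    anthropicKey: env.ANTHROPIC_API_KEY!,
  };
}
```

With the schema fields made optional, the app boots without keys, and only the voice endpoint surfaces this targeted error.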
In `@apps/desktop/electron-builder.ts`:
- Around line 59-64: The extraResources entry points to
dist/voice-sidecar/voice-sidecar but the build script build-voice-sidecar.sh is
never run; to fix, invoke the script before packaging by adding "bun run
scripts/build-voice-sidecar.sh" to the prepackage hook in package.json
(prepackage) or insert a step that runs the script in the build-desktop GitHub
Actions job prior to the packaging step, and update RELEASE.md local testing
instructions to document running build-voice-sidecar.sh so the voice-sidecar
binary exists when electron-builder packages the app.
In `@apps/desktop/src/lib/trpc/routers/voice/index.ts`:
- Around line 44-52: The start/stop mutations call
startVoiceProcess()/stopVoiceProcess() directly and thus bypass subscriberCount
tracking causing desyncs; fix by either removing these manual mutations (if not
needed) or changing their implementations to update subscriberCount consistently
(e.g., start should increment subscriberCount and call startVoiceProcess only
when moving from 0→1, stop should decrement and call stopVoiceProcess only when
moving to 0) and add brief comments documenting intended debugging/manual use;
touch the start and stop publicProcedure.mutation handlers and ensure they
reference subscriberCount, startVoiceProcess, and stopVoiceProcess so lifecycle
remains consistent.
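The 0→1 / 1→0 transition logic this comment asks for can be sketched like so; createVoiceLifecycle and its shape are hypothetical, not the PR's API:

```typescript
// Illustrative refcounting around a shared process lifecycle: the underlying
// process only starts on the 0→1 transition and stops on the 1→0 transition.
function createVoiceLifecycle(proc: { start: () => void; stop: () => void }) {
  let subscriberCount = 0;
  return {
    acquire() {
      subscriberCount += 1;
      if (subscriberCount === 1) proc.start(); // first subscriber boots the sidecar
    },
    release() {
      subscriberCount = Math.max(0, subscriberCount - 1);
      if (subscriberCount === 0) proc.stop(); // last subscriber tears it down
    },
    count: () => subscriberCount,
  };
}
```

Routing both the subscription teardown and any manual start/stop mutations through acquire/release keeps the count and the process state from diverging.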
In `@apps/desktop/src/main/lib/voice/python/.gitignore`:
- Around line 1-3: Add the ".env" pattern to the .gitignore shown in this change
so local environment files containing OPENAI_API_KEY and ANTHROPIC_API_KEY (and
other secrets) are not committed; update the existing .gitignore entries (which
currently list .venv/, __pycache__/, *.pyc) to also include .env, and commit
that change to prevent accidental inclusion of API keys.
In `@apps/desktop/src/main/lib/voice/python/audio.py`:
- Around line 54-60: The JSON writes can race between _emit_error and emit, so
add a shared threading.Lock named STDOUT_LOCK in this module and wrap the
sys.stdout.write/flush calls in _emit_error with STDOUT_LOCK.acquire()/release()
(or a with STDOUT_LOCK: context) to serialize writes; then update the other
writer (the emit function in main.py) to import and use the same STDOUT_LOCK
from this module so all stdout JSON lines are emitted under the same lock.
In `@apps/desktop/src/main/lib/voice/python/main.py`:
- Around line 54-63: The stdin_reader function currently handles a
{"cmd":"stop"} message but ignores the documented {"cmd":"start"}; update
stdin_reader to explicitly handle "start" as a no-op (do nothing but optionally
acknowledge) and add a fallback that warns/logs when an unknown cmd is received
so the contract matches the docstring; locate the stdin_reader function and
augment the cmd handling branch (alongside the existing stop_event.set() branch)
to treat "start" as a no-op and call the existing logging facility (or print)
for unknown commands.
- Around line 113-134: After the capture loop returns, short‑circuit if shutdown
was requested: check stop_event.is_set() before converting/serializing and
emitting audio so we don't emit partial captures during shutdown. In other
words, after obtaining speech_audio (and/or before calling to_wav_b64 and emit
in main.py), if stop_event.is_set() then reset detector and skip the
to_wav_b64/emit sequence (or return/continue the outer processing loop) so no
audio_captured event is sent when capture was aborted; reference the stop_event,
capturer.get_audio(), speech_audio, to_wav_b64, emit, and detector.reset in your
change.
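The sidecar itself is Python, but the suggested shutdown short-circuit boils down to a small check, sketched here in TypeScript with hypothetical names (finishCapture, resetDetector):

```typescript
// If shutdown was requested while capturing, drop the partial audio rather
// than emitting an audio_captured event for it.
function finishCapture(
  stopRequested: boolean,
  speechAudio: Float32Array | null,
  emit: (event: string, payload: unknown) => void,
  resetDetector: () => void,
): boolean {
  if (stopRequested || speechAudio === null) {
    resetDetector(); // leave the wake-word detector in a clean state
    return false; // nothing emitted
  }
  emit("audio_captured", { samples: speechAudio.length });
  resetDetector();
  return true;
}
```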
In `@apps/desktop/src/main/lib/voice/python/speech_capture.py`:
- Around line 72-76: The duration_s method overestimates time for multi-channel
audio because it divides total_samples by self._config.sample_rate only; modify
duration_s to divide total_samples by (self._config.sample_rate *
self._config.channels) so frames across channels are accounted for (use
self._config.channels alongside sample_rate when computing duration from
self._buffers and self._buffers' .size values).
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts`:
- Around line 60-72: getDevConfig currently assumes a Unix venv path
(.venv/bin/python3) causing Windows to always fallback; update getDevConfig to
detect the platform (process.platform === "win32") or attempt both
platform-specific venv paths and choose the first that exists. Construct
platform paths from scriptDir (e.g., join(scriptDir, ".venv", "Scripts",
"python.exe") for Windows and join(scriptDir, ".venv", "bin", "python3") for
others), check existsSync on the candidate path(s), and return the matching path
as command (with args ["main.py"], cwd: scriptDir); keep the fallback to
"python3" if no venv python is found.
- Around line 74-86: getPreviewConfig currently builds venvPython with a
POSIX-only path (".venv/bin/python3") and uses "python3" as the fallback, which
breaks on Windows; change getPreviewConfig to construct the venv python path
using path.join with a platform-specific segment (use ".venv/Scripts/python.exe"
on win32 and ".venv/bin/python3" otherwise) and set the fallback command to
"python" on Windows and "python3" on other platforms, keeping the same
scriptDir, previewDir, srcDir, args (["main.py"]) and cwd logic.
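The platform split described in the two items above can be sketched with Node's path helpers; venvPythonCandidate and its return shape are illustrative, and the existsSync check is left to the caller:

```typescript
import * as path from "node:path";

// Pick the venv interpreter path and fallback command per platform, as the
// review suggests. path.win32/path.posix are used so the output separators
// match the target platform regardless of the host.
function venvPythonCandidate(
  scriptDir: string,
  platform: string,
): { venvPython: string; fallback: string } {
  if (platform === "win32") {
    return {
      venvPython: path.win32.join(scriptDir, ".venv", "Scripts", "python.exe"),
      fallback: "python",
    };
  }
  return {
    venvPython: path.posix.join(scriptDir, ".venv", "bin", "python3"),
    fallback: "python3",
  };
}
```

A caller would check existsSync(venvPython) and fall back to the bare command when the venv interpreter is absent.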
In
`@apps/desktop/src/renderer/components/Voice/components/RecordingIndicator/RecordingIndicator.tsx`:
- Around line 17-23: The button text "Cancel" is misleading because the onClick
only calls toast.dismiss(toastId) and does not stop recording; either wire it to
the actual cancel flow or rename it to match behavior—update the onClick to call
the real cancel handler (e.g., cancelRecording() or a passed prop like
onCancelRecording()) in addition to toast.dismiss(toastId), or simply change the
label from "Cancel" to "Dismiss" in the RecordingIndicator component so the UI
matches the existing toast.dismiss(toastId) behavior.
In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`:
- Around line 116-122: The try/catch around JSON.parse is silently swallowing
parse errors; update the catch to log the malformed SSE data and the parse error
(e.g., using console.error) so debugging is possible, include the raw
line.slice(6) payload, the current eventType and the caught error, then proceed
to reset eventType as before; the change should be applied in the block
surrounding JSON.parse(...) that calls handleSSEEvent(eventType, data,
setState).
🧹 Nitpick comments (6)
apps/desktop/src/main/lib/voice/python/.gitignore (1)
1-3: Consider adding common Python build and test artifacts. Since the PR mentions build/packaging flow for the voice sidecar and a pyproject.toml, consider adding patterns for build artifacts and other common Python files:

- dist/ and build/ - distribution and build directories
- *.egg-info/ - Python package metadata
- *.pyo - optimized bytecode files
- .pytest_cache/ - pytest cache (if using pytest)
- Alternative venv names like venv/ and env/

📦 Proposed additions for build artifacts

```diff
 .venv/
+venv/
+env/
 __pycache__/
 *.pyc
+*.pyo
+dist/
+build/
+*.egg-info/
+.pytest_cache/
```

apps/desktop/src/main/lib/voice/python/pyproject.toml (1)
7-11: Consider adding a lockfile for reproducible builds. The dependency ranges are reasonable, but since this is built into a standalone binary via PyInstaller, consider adding a requirements.txt or using a lockfile mechanism (e.g., pip freeze > requirements.txt after testing) to ensure reproducible builds across different machines and CI runs.

apps/api/src/app/api/voice/tool-adapter.ts (1)
5-11: Prefer a params object for ToolHandler signature.
This avoids positional arguments and aligns with the project’s function-parameter guideline.

Proposed refactor

```diff
-type ToolHandler = (
-  params: Record<string, unknown>,
-  ctx: McpContext,
-) => Promise<{
+type ToolHandler = (args: {
+  params: Record<string, unknown>;
+  ctx: McpContext;
+}) => Promise<{
   content: Array<{ type: "text"; text: string }>;
   isError?: boolean;
 }>;
@@
-  handler: async (params, ctx) => {
-    return handler(params, {
+  handler: async ({ params, ctx }) => {
+    return handler(params, {
       authInfo: { extra: { mcpContext: ctx } },
     });
   },
@@
-  const result = await tool.handler(toolInput, ctx);
+  const result = await tool.handler({ params: toolInput, ctx });
```

As per coding guidelines “Functions with 2+ parameters should accept a single params object with named properties instead of positional arguments”.
Also applies to: 71-74, 236-238
apps/desktop/src/lib/trpc/routers/voice/index.ts (1)
11-12: Subscriber count state is scoped to the factory function instance. The subscriberCount variable lives in the closure created by createVoiceRouter(). Since this is called once at app startup, this works correctly. However, if the router were ever recreated (e.g., during hot reload in development), the count would reset while the voice process might still be running. Consider adding a comment documenting this assumption, or alternatively, moving the state to the voice-process.ts module where the process lifecycle is managed.

apps/desktop/src/renderer/components/Voice/components/ResponsePanel/ResponsePanel.tsx (1)
27-35: Extract the auto-dismiss timeout to a named constant. The 8000ms value is a magic number. Extracting it to a constant improves readability and makes future adjustments easier.

Proposed fix

```diff
+const AUTO_DISMISS_DELAY_MS = 8000;
+
 export function ResponsePanel({ toastId, audioB64 }: ResponsePanelProps) {
   // ... existing code ...

   // Auto-dismiss after done
   useEffect(() => {
     if (status === "done") {
       const timer = setTimeout(() => {
         toast.dismiss(toastId);
-      }, 8000);
+      }, AUTO_DISMISS_DELAY_MS);
       return () => clearTimeout(timer);
     }
   }, [status, toastId]);
```

apps/desktop/src/main/lib/voice/voice-process.ts (1)
69-71: Log JSON parse errors instead of silently ignoring. While non-JSON output from the Python process is expected during startup, silently swallowing all parse errors makes debugging difficult. The warning is already being logged, but the catch block should also surface the caught error.

♻️ Proposed minor improvement

```diff
 try {
   const raw = JSON.parse(line) as PythonVoiceEvent;
   const event = parsePythonEvent(raw);
   if (event) {
     lastEvent = event;
     voiceProcessEmitter.emit("voice-event", event);
   }
-} catch {
-  console.warn("[voice-process] Non-JSON stdout:", line);
+} catch (error) {
+  console.warn("[voice-process] Non-JSON stdout:", line, error);
 }
```
```ts
try {
  formData = await request.formData();
} catch {
  return Response.json(
    { error: "Expected multipart form data with audio file" },
    { status: 400 },
  );
}
```
Log the error when form data parsing fails.
The catch block discards the error, making debugging difficult if unexpected parsing failures occur. As per coding guidelines, errors should be logged at minimum.
Proposed fix
```diff
 try {
   formData = await request.formData();
-} catch {
+} catch (error) {
+  console.error("[voice/route] Form data parsing failed:", error);
   return Response.json(
     { error: "Expected multipart form data with audio file" },
     { status: 400 },
   );
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
try {
  formData = await request.formData();
} catch (error) {
  console.error("[voice/route] Form data parsing failed:", error);
  return Response.json(
    { error: "Expected multipart form data with audio file" },
    { status: 400 },
  );
}
```
```ts
for (const [key, schema] of Object.entries(inputSchema)) {
  try {
    properties[key] = zodToJsonSchema(schema);
    if (!isOptional(schema)) {
      required.push(key);
    }
  } catch {
    // Fallback for schemas that can't be converted
    properties[key] = { type: "string" };
  }
}
```
Log schema conversion failures instead of silent fallback.
The catch block hides conversion errors; add a prefixed log with context before defaulting.
Proposed fix

```diff
-  } catch {
-    // Fallback for schemas that can't be converted
-    properties[key] = { type: "string" };
-  }
+  } catch (error) {
+    console.warn(
+      `[voice/tool-adapter] Failed to convert schema for ${name}.${key}`,
+      error,
+    );
+    // Fallback for schemas that can't be converted
+    properties[key] = { type: "string" };
+  }
```

As per coding guidelines “Never swallow errors silently; at minimum log them with context” and “Use prefixed console logging with pattern [domain/operation] message for all logging”.
```ts
function zodToJsonSchema(schema: z.ZodType): Record<string, unknown> {
  const def = (
    schema as unknown as {
      _zod?: { def?: { type?: string; typeName?: string } };
    }
  )._zod?.def;
  const description = schema.description;

  // Unwrap optional/default wrappers
  const innerSchema = unwrapZod(schema);
  const innerDef = (innerSchema as { _zod?: { def?: Record<string, unknown> } })
    ._zod?.def;
  const typeName = (innerDef?.typeName ??
    def?.type ??
    def?.typeName ??
    "") as string;

  const result: Record<string, unknown> = {};

  switch (typeName) {
    case "ZodString":
    case "string":
      result.type = "string";
      break;
    case "ZodNumber":
    case "number":
      result.type = "number";
      break;
    case "ZodBoolean":
    case "boolean":
      result.type = "boolean";
      break;
    case "ZodArray":
    case "array": {
      result.type = "array";
      const itemSchema =
        (innerDef as Record<string, unknown>)?.innerType ??
        (innerDef as Record<string, unknown>)?.type;
      if (itemSchema && itemSchema instanceof z.ZodType) {
        result.items = zodToJsonSchema(itemSchema);
      }
      break;
    }
    case "ZodEnum":
    case "enum": {
      result.type = "string";
      const values =
        (innerDef as Record<string, unknown>)?.entries ??
        (innerDef as Record<string, unknown>)?.values;
      if (Array.isArray(values)) {
        result.enum = values;
      } else if (values && typeof values === "object") {
        result.enum = Object.keys(values);
      }
      break;
    }
    case "ZodObject":
    case "object": {
      result.type = "object";
      const shape = (innerDef as Record<string, unknown>)?.shape;
      if (shape && typeof shape === "object") {
        const props: Record<string, unknown> = {};
        for (const [k, v] of Object.entries(
          shape as Record<string, z.ZodType>,
        )) {
          props[k] = zodToJsonSchema(v);
        }
        result.properties = props;
      }
      break;
    }
    default:
      result.type = "string";
      break;
  }

  if (description) {
    result.description = description;
  }

  return result;
}
```
🧩 Analysis chain
🌐 Web query:
Zod 4 JSON Schema conversion recommended approach official API
💡 Result:
For Zod v4, the recommended/official approach is to use Zod’s native JSON Schema conversion API:
- Zod → JSON Schema: z.toJSONSchema(schema, options?) (official) (zod.dev)
- JSON Schema → Zod: z.fromJSONSchema(jsonSchema) (experimental, not a stable API) (zod.dev)
Example (Zod → JSON Schema):
```ts
import * as z from "zod";

const schema = z.object({
  name: z.string(),
  age: z.number(),
});

const jsonSchema = z.toJSONSchema(schema, {
  target: "draft-2020-12", // default; also supports "draft-07", "draft-04", "openapi-3.0"
});
```

Key official options include target, unrepresentable ("throw" or "any"), cycle handling (cycles), and $defs behavior for reused schemas (reused). (zod.dev)
Citations:
- 1: https://zod.dev/json-schema?utm_source=openai
- 2: https://zod.dev/json-schema?utm_source=openai
- 3: https://zod.dev/json-schema?utm_source=openai
🌐 Web query:
Zod _zod def private API stability across versions
💡 Result:
- In Zod v4, every core schema ($ZodType) has a single property named _zod, which is explicitly described as the schema “internals” object; it contains things like _zod.def (a JSON-serializable definition) and _zod.run() (internal parser). So schema._zod.def is an intended inspection surface for tooling that traverses schemas, but it’s still framed as internals, not the everyday public API. [2]
- In Zod v3, schemas do not have _zod; they use _def instead. The Zod docs even recommend using "_zod" in schema to distinguish v4 vs v3 at runtime. [1]
- For stability across versions:
  - Zod’s guidance for library/tool authors is to import from versioned “permalink” subpaths like "zod/v3" and "zod/v4/core", and it states these subpaths “will remain available forever” (i.e., the import targets are stable). [1]
  - That said, because _zod is an internals object, you should assume its shape/semantics can change across releases (especially across major versions). In practice, regressions involving _zod have occurred in v4 patch releases (example: an issue filed against 4.0.17 reporting a runtime error reading '_zod'). [3]

Practical recommendation: if you need schema introspection, target zod/v4/core and primarily rely on schema._zod.def + documented discriminators like def.type; avoid depending on undocumented internal fields beyond what the Core docs describe, and pin/test Zod versions when you ship tooling. [1][2]
Sources: Zod “For library authors” docs [1], Zod Core docs [2], GitHub issue demonstrating breakage involving _zod [3].
🌐 Web query:
zod-to-json-schema library Zod compatibility maintained
💡 Result:
The original zod-to-json-schema package (StefanTerdell) is not maintaining ongoing Zod compatibility anymore: its README states that as of November 2025 it will no longer be receiving updates, recommending switching to Zod v4’s native JSON Schema support instead. [1] Zod v4 includes built-in z.toJSONSchema() / z.fromJSONSchema(), reducing the need for this library. [2]
Practically:
- The last published v3.x line is 3.24.6 (published June 25, 2025) per package tracking, so don’t expect fixes for newer Zod changes. [3]
- The README also notes Zod v4 can be used only as a peer dependency, but you still need to provide v3-style schemas (so it’s not “full” Zod v4 support). [1]
If you specifically need a maintained converter library (instead of Zod v4 native conversion), a commonly referenced maintained option is the fork @alcyone-labs/zod-to-json-schema, which is explicitly “upgraded to Zod V4” and claims full Zod v4 compatibility. [4]
Bottom line: official zod-to-json-schema is effectively frozen/deprecated, so for “compatibility maintained” you should prefer Zod v4 native z.toJSONSchema() or a maintained fork like @alcyone-labs/zod-to-json-schema. [1][2][4]
References:
[1] https://github.com/StefanTerdell/zod-to-json-schema
[2] https://zod.dev/json-schema
[3] https://www.stacktco.com/packages/zod-to-json-schema
[4] https://www.npmjs.com/package/%40alcyone-labs%2Fzod-to-json-schema
Replace custom zodToJsonSchema() with Zod's native z.toJSONSchema() API.
The code accesses _zod?.def, which is Zod's internal inspection surface and not stable across versions (v3 vs v4 use different structures, and breaking changes have occurred in patch releases). Zod v4 provides an official, maintained z.toJSONSchema(schema, options?) function specifically for this purpose. The external zod-to-json-schema package is also no longer maintained as of November 2025, with the Zod team recommending migration to the native API.
```ts
async function transcribeAudio(audioBuffer: Uint8Array): Promise<string> {
  const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });

  const blob = new Blob([audioBuffer], { type: "audio/wav" });
  const file = new File([blob], "audio.wav", { type: "audio/wav" });

  const result = await openai.audio.transcriptions.create({
    model: "whisper-1",
    file,
  });

  // Strip wake word from transcription
  let text = result.text.trim();
  text = text.replace(/^hey\s*jarvis[,.\s!?]*/i, "").trim();
  return text;
}
```
Add error handling for transcription failures.
The transcribeAudio function can throw on API errors (network issues, invalid audio, rate limits), but errors propagate unhandled to the caller. Consider wrapping with try/catch and emitting an SSE error event, or at minimum logging the failure with context.
🛡️ Proposed fix to add error handling
```diff
 async function transcribeAudio(audioBuffer: Uint8Array): Promise<string> {
   const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });

   const blob = new Blob([audioBuffer], { type: "audio/wav" });
   const file = new File([blob], "audio.wav", { type: "audio/wav" });

+  try {
   const result = await openai.audio.transcriptions.create({
     model: "whisper-1",
     file,
   });

   // Strip wake word from transcription
   let text = result.text.trim();
   text = text.replace(/^hey\s*jarvis[,.\s!?]*/i, "").trim();
   return text;
+  } catch (error) {
+    console.error("[voice/transcribe] Whisper API error:", error);
+    throw error;
+  }
 }
```

📝 Committable suggestion
```ts
async function transcribeAudio(audioBuffer: Uint8Array): Promise<string> {
  const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });

  const blob = new Blob([audioBuffer], { type: "audio/wav" });
  const file = new File([blob], "audio.wav", { type: "audio/wav" });

  try {
    const result = await openai.audio.transcriptions.create({
      model: "whisper-1",
      file,
    });

    // Strip wake word from transcription
    let text = result.text.trim();
    text = text.replace(/^hey\s*jarvis[,.\s!?]*/i, "").trim();
    return text;
  } catch (error) {
    console.error("[voice/transcribe] Whisper API error:", error);
    throw error;
  }
}
```
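For reference, the wake-word stripping used in this function behaves like this small, self-contained version of the same regex:

```typescript
// Strip a leading "hey jarvis" (plus trailing punctuation and whitespace)
// from the Whisper transcription, matching the route's regex above.
function stripWakeWord(transcription: string): string {
  return transcription
    .trim()
    .replace(/^hey\s*jarvis[,.\s!?]*/i, "")
    .trim();
}
```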
```python
def duration_s(self) -> float:
    if not self._buffers:
        return 0.0
    total_samples = sum(b.size for b in self._buffers)
    return total_samples / self._config.sample_rate
```
Fix duration calculation for multi-channel audio.
Config.channels (apps/desktop/src/main/lib/voice/python/config.py Lines 7‑9) allows multi‑channel capture, but duration_s divides only by sample_rate. For >1 channel this inflates duration.
🛠️ Suggested fix
- total_samples = sum(b.size for b in self._buffers)
- return total_samples / self._config.sample_rate
+ total_samples = sum(b.size for b in self._buffers)
+ return total_samples / (self._config.sample_rate * self._config.channels)

📝 Committable suggestion
```python
def duration_s(self) -> float:
    if not self._buffers:
        return 0.0
    total_samples = sum(b.size for b in self._buffers)
    return total_samples / (self._config.sample_rate * self._config.channels)
```
🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/python/speech_capture.py` around lines 72 -
76, The duration_s method overestimates time for multi-channel audio because it
divides total_samples by self._config.sample_rate only; modify duration_s to
divide total_samples by (self._config.sample_rate * self._config.channels) so
frames across channels are accounted for (use self._config.channels alongside
sample_rate when computing duration from self._buffers and self._buffers' .size
values).
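The arithmetic behind this fix can be sanity-checked in isolation. A minimal sketch (in TypeScript rather than the sidecar's Python; the constants are illustrative, not values from the PR):

```typescript
// Interleaved multi-channel capture stores one value per channel per frame,
// so a flat buffer holds sampleRate * channels values per second of audio.
const SAMPLE_RATE = 16_000;
const CHANNELS = 2;
const SECONDS = 3;

const totalValues = SAMPLE_RATE * CHANNELS * SECONDS;

const naive = totalValues / SAMPLE_RATE; // inflated by the channel count
const correct = totalValues / (SAMPLE_RATE * CHANNELS); // wall-clock seconds

console.log(naive); // 6
console.log(correct); // 3
```

Dividing only by the sample rate reports a 3-second stereo clip as 6 seconds, which is exactly the inflation the review describes.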
```typescript
function getDevConfig(): VoiceSpawnConfig {
  const scriptDir = join(app.getAppPath(), "src/main/lib/voice/python");
  const venvPython = join(scriptDir, ".venv/bin/python3");

  if (existsSync(venvPython)) {
    return { command: venvPython, args: ["main.py"], cwd: scriptDir };
  }

  console.warn(
    "[voice-paths] Dev venv not found, falling back to system python3",
  );
  return { command: "python3", args: ["main.py"], cwd: scriptDir };
}
```
Windows compatibility issue in dev mode.
The venv Python path .venv/bin/python3 is Unix-specific. On Windows, the path would be .venv/Scripts/python.exe. This will cause dev mode to always fall back to system Python on Windows.
🛡️ Proposed fix for cross-platform support
function getDevConfig(): VoiceSpawnConfig {
const scriptDir = join(app.getAppPath(), "src/main/lib/voice/python");
- const venvPython = join(scriptDir, ".venv/bin/python3");
+ const venvPython = join(
+ scriptDir,
+ process.platform === "win32"
+ ? ".venv/Scripts/python.exe"
+ : ".venv/bin/python3",
+ );
if (existsSync(venvPython)) {
return { command: venvPython, args: ["main.py"], cwd: scriptDir };
}
console.warn(
"[voice-paths] Dev venv not found, falling back to system python3",
);
- return { command: "python3", args: ["main.py"], cwd: scriptDir };
+ return {
+ command: process.platform === "win32" ? "python" : "python3",
+ args: ["main.py"],
+ cwd: scriptDir,
+ };
}

🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts` around lines 60 - 72,
getDevConfig currently assumes a Unix venv path (.venv/bin/python3) causing
Windows to always fallback; update getDevConfig to detect the platform
(process.platform === "win32") or attempt both platform-specific venv paths and
choose the first that exists. Construct platform paths from scriptDir (e.g.,
join(scriptDir, ".venv", "Scripts", "python.exe") for Windows and
join(scriptDir, ".venv", "bin", "python3") for others), check existsSync on the
candidate path(s), and return the matching path as command (with args
["main.py"], cwd: scriptDir); keep the fallback to "python3" if no venv python
is found.
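The platform branching described above reduces to two small pure helpers, which makes it easy to test. A hypothetical sketch — `venvPython` and `systemPython` are illustrative names, not functions from the PR:

```typescript
// Resolve the venv interpreter path per platform: Windows venvs place the
// executable under Scripts/, POSIX venvs under bin/.
function venvPython(scriptDir: string, platform: string): string {
  return platform === "win32"
    ? `${scriptDir}/.venv/Scripts/python.exe`
    : `${scriptDir}/.venv/bin/python3`;
}

// System-interpreter fallback: the launcher is "python" on Windows,
// "python3" elsewhere.
function systemPython(platform: string): string {
  return platform === "win32" ? "python" : "python3";
}

console.log(venvPython("/app/python", "darwin")); // /app/python/.venv/bin/python3
console.log(venvPython("C:/app/python", "win32")); // C:/app/python/.venv/Scripts/python.exe
console.log(systemPython("linux")); // python3
```

Centralizing the branch this way would let dev, preview, and packaged fallback paths share one implementation instead of repeating the ternary three times.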
```typescript
function getPreviewConfig(): VoiceSpawnConfig {
  const previewDir = join(__dirname, "../lib/voice/python");
  const srcDir = join(app.getAppPath(), "src/main/lib/voice/python");

  const scriptDir = existsSync(previewDir) ? previewDir : srcDir;
  const venvPython = join(srcDir, ".venv/bin/python3");

  if (existsSync(venvPython)) {
    return { command: venvPython, args: ["main.py"], cwd: scriptDir };
  }

  return { command: "python3", args: ["main.py"], cwd: scriptDir };
}
```
Same Windows compatibility issue in preview mode.
The venv path and system Python command have the same cross-platform issue as in dev mode.
🛡️ Proposed fix for cross-platform support
function getPreviewConfig(): VoiceSpawnConfig {
const previewDir = join(__dirname, "../lib/voice/python");
const srcDir = join(app.getAppPath(), "src/main/lib/voice/python");
const scriptDir = existsSync(previewDir) ? previewDir : srcDir;
- const venvPython = join(srcDir, ".venv/bin/python3");
+ const venvPython = join(
+ srcDir,
+ process.platform === "win32"
+ ? ".venv/Scripts/python.exe"
+ : ".venv/bin/python3",
+ );
if (existsSync(venvPython)) {
return { command: venvPython, args: ["main.py"], cwd: scriptDir };
}
- return { command: "python3", args: ["main.py"], cwd: scriptDir };
+ return {
+ command: process.platform === "win32" ? "python" : "python3",
+ args: ["main.py"],
+ cwd: scriptDir,
+ };
}

🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts` around lines 74 - 86,
getPreviewConfig currently builds venvPython with a POSIX-only path
(".venv/bin/python3") and uses "python3" as the fallback, which breaks on
Windows; change getPreviewConfig to construct the venv python path using
path.join with a platform-specific segment (use ".venv/Scripts/python.exe" on
win32 and ".venv/bin/python3" otherwise) and set the fallback command to
"python" on Windows and "python3" on other platforms, keeping the same
scriptDir, previewDir, srcDir, args (["main.py"]) and cwd logic.
```tsx
<button
  type="button"
  onClick={() => toast.dismiss(toastId)}
  className="ml-1 text-xs text-muted-foreground hover:text-foreground"
>
  Cancel
</button>
```
“Cancel” label doesn’t match behavior.
The button only dismisses the toast; recording likely continues. Either wire it to an actual cancel action or rename to “Dismiss” to match behavior.
Proposed text fix
- Cancel
+ Dismiss

📝 Committable suggestion
```tsx
<button
  type="button"
  onClick={() => toast.dismiss(toastId)}
  className="ml-1 text-xs text-muted-foreground hover:text-foreground"
>
  Dismiss
</button>
```
🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/components/Voice/components/RecordingIndicator/RecordingIndicator.tsx`
around lines 17 - 23, The button text "Cancel" is misleading because the onClick
only calls toast.dismiss(toastId) and does not stop recording; either wire it to
the actual cancel flow or rename it to match behavior—update the onClick to call
the real cancel handler (e.g., cancelRecording() or a passed prop like
onCancelRecording()) in addition to toast.dismiss(toastId), or simply change the
label from "Cancel" to "Dismiss" in the RecordingIndicator component so the UI
matches the existing toast.dismiss(toastId) behavior.
```typescript
try {
  const data = JSON.parse(line.slice(6));
  handleSSEEvent(eventType, data, setState);
} catch {
  // Skip malformed data
}
eventType = "";
```
Don't silently swallow JSON parse errors.
The empty catch block violates the coding guideline against silently swallowing errors. At minimum, log malformed SSE data for debugging purposes.
🛡️ Proposed fix to log parse errors
try {
const data = JSON.parse(line.slice(6));
handleSSEEvent(eventType, data, setState);
- } catch {
- // Skip malformed data
+ } catch (error) {
+ console.warn("[voice-pipeline] Malformed SSE data:", line, error);
}

As per coding guidelines: "Never silently swallow errors with catch(() => {}) or catch(e) { return null }"
📝 Committable suggestion
```typescript
try {
  const data = JSON.parse(line.slice(6));
  handleSSEEvent(eventType, data, setState);
} catch (error) {
  console.warn("[voice-pipeline] Malformed SSE data:", line, error);
}
eventType = "";
```
🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`
around lines 116 - 122, The try/catch around JSON.parse is silently swallowing
parse errors; update the catch to log the malformed SSE data and the parse error
(e.g., using console.error) so debugging is possible, include the raw
line.slice(6) payload, the current eventType and the caught error, then proceed
to reset eventType as before; the change should be applied in the block
surrounding JSON.parse(...) that calls handleSSEEvent(eventType, data,
setState).
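The `data:` line handling discussed here can be exercised outside React. A minimal sketch with the logging fix applied — event names are illustrative and the parser has no dependency on the hook's `setState`:

```typescript
// Parse one SSE "data: ..." line; log and skip malformed payloads rather
// than swallowing the error silently.
function parseSSELine(line: string, eventType: string): unknown {
  if (!line.startsWith("data: ")) return null;
  try {
    return JSON.parse(line.slice(6));
  } catch (error) {
    // Surface the malformed payload so protocol bugs stay debuggable.
    console.warn("[voice-pipeline] Malformed SSE data:", eventType, line, error);
    return null;
  }
}

console.log(parseSSELine('data: {"text":"open inbox"}', "transcription")); // { text: 'open inbox' }
console.log(parseSSELine("data: {broken", "transcription")); // logs a warning, then null
```

Keeping the parse isolated like this also makes the warning path unit-testable without streaming a real response.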
🚀 Preview Deployment

🔗 Preview Links

Preview updates automatically with new commits
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@apps/desktop/src/lib/trpc/routers/voice/index.ts`:
- Around line 12-19: The current getMicStatus blindly casts
systemPreferences.getMediaAccessStatus("microphone") to MicPermissionStatus and
ignores Electron's possible 'unknown' return; update getMicStatus to capture the
raw status from systemPreferences.getMediaAccessStatus("microphone"), check if
rawStatus === "unknown" and map it to a safe MicPermissionStatus (e.g.,
"denied") and optionally console.warn about the unexpected 'unknown' value,
otherwise return the cast/validated status; reference the getMicStatus function
and systemPreferences.getMediaAccessStatus to implement this check and mapping.
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`:
- Around line 110-125: handleVoiceToggle currently does nothing when
micPermission is undefined (loading); update it to handle that case by calling
requestMicPermission.mutate() when enabled === true and micPermission is
undefined so the toggle triggers a permission request, referencing
handleVoiceToggle, micPermission, and requestMicPermission.mutate;
alternatively, disable the switch in the UI while micPermission is loading to
prevent the noop behavior.
```typescript
function getMicStatus(): MicPermissionStatus {
  if (process.platform !== "darwin") {
    return "granted";
  }
  return systemPreferences.getMediaAccessStatus(
    "microphone",
  ) as MicPermissionStatus;
}
```
Handle the 'unknown' status from Electron's getMediaAccessStatus.
systemPreferences.getMediaAccessStatus("microphone") can return 'unknown' in addition to the statuses in MicPermissionStatus. The type assertion silently ignores this possibility, which could lead to unexpected behavior if Electron returns 'unknown'.
Proposed fix to handle unknown status
function getMicStatus(): MicPermissionStatus {
if (process.platform !== "darwin") {
return "granted";
}
- return systemPreferences.getMediaAccessStatus(
+ const status = systemPreferences.getMediaAccessStatus(
"microphone",
- ) as MicPermissionStatus;
+ );
+ // Electron can return 'unknown' in edge cases; treat as not-determined
+ if (status === "unknown") {
+ return "not-determined";
+ }
+ return status as MicPermissionStatus;
}

📝 Committable suggestion
```typescript
function getMicStatus(): MicPermissionStatus {
  if (process.platform !== "darwin") {
    return "granted";
  }
  const status = systemPreferences.getMediaAccessStatus("microphone");
  // Electron can return 'unknown' in edge cases; treat as not-determined
  if (status === "unknown") {
    return "not-determined";
  }
  return status as MicPermissionStatus;
}
```
🤖 Prompt for AI Agents
In `@apps/desktop/src/lib/trpc/routers/voice/index.ts` around lines 12 - 19, The
current getMicStatus blindly casts
systemPreferences.getMediaAccessStatus("microphone") to MicPermissionStatus and
ignores Electron's possible 'unknown' return; update getMicStatus to capture the
raw status from systemPreferences.getMediaAccessStatus("microphone"), check if
rawStatus === "unknown" and map it to a safe MicPermissionStatus (e.g.,
"denied") and optionally console.warn about the unexpected 'unknown' value,
otherwise return the cast/validated status; reference the getMicStatus function
and systemPreferences.getMediaAccessStatus to implement this check and mapping.
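The normalization this comment asks for is small enough to sketch as a pure function. This assumes `MicPermissionStatus` is the four-value union implied by the fix; the helper name is hypothetical:

```typescript
// Fold Electron's extra "unknown" media-access status into the app's
// narrower permission union, per the review's suggested mapping.
type MicPermissionStatus = "not-determined" | "granted" | "denied" | "restricted";

function normalizeMicStatus(raw: string): MicPermissionStatus {
  if (raw === "unknown") return "not-determined"; // safe default for edge cases
  return raw as MicPermissionStatus;
}

console.log(normalizeMicStatus("granted")); // granted
console.log(normalizeMicStatus("unknown")); // not-determined
```

Mapping "unknown" to "not-determined" (rather than "denied") keeps the settings toggle able to trigger a permission request instead of dead-ending.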
```typescript
const handleVoiceToggle = (enabled: boolean) => {
  if (!enabled) {
    setVoiceCommandsEnabled.mutate({ enabled: false });
    return;
  }

  if (micPermission === "granted") {
    setVoiceCommandsEnabled.mutate({ enabled: true });
    return;
  }

  if (micPermission === "not-determined") {
    requestMicPermission.mutate();
    return;
  }
};
```
Toggle does nothing when permission status is still loading.
When enabled=true and micPermission is undefined (query still loading), the function falls through all conditions without executing any action. The user clicks the switch, nothing visible happens.
Proposed fix to handle the loading/undefined case
const handleVoiceToggle = (enabled: boolean) => {
if (!enabled) {
setVoiceCommandsEnabled.mutate({ enabled: false });
return;
}
if (micPermission === "granted") {
setVoiceCommandsEnabled.mutate({ enabled: true });
return;
}
- if (micPermission === "not-determined") {
+ // Treat undefined (loading) or not-determined as needing permission request
+ if (micPermission === "not-determined" || micPermission === undefined) {
requestMicPermission.mutate();
return;
}
+
+ // micDenied case: do nothing (switch is disabled anyway)
};

Alternatively, disable the switch while micPermission is loading to prevent this edge case entirely.
📝 Committable suggestion
```typescript
const handleVoiceToggle = (enabled: boolean) => {
  if (!enabled) {
    setVoiceCommandsEnabled.mutate({ enabled: false });
    return;
  }
  if (micPermission === "granted") {
    setVoiceCommandsEnabled.mutate({ enabled: true });
    return;
  }
  // Treat undefined (loading) or not-determined as needing permission request
  if (micPermission === "not-determined" || micPermission === undefined) {
    requestMicPermission.mutate();
    return;
  }
  // denied case: do nothing (switch is disabled anyway)
};
```
🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`
around lines 110 - 125, handleVoiceToggle currently does nothing when
micPermission is undefined (loading); update it to handle that case by calling
requestMicPermission.mutate() when enabled === true and micPermission is
undefined so the toggle triggers a permission request, referencing
handleVoiceToggle, micPermission, and requestMicPermission.mutate;
alternatively, disable the switch in the UI while micPermission is loading to
prevent the noop behavior.
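The decision table after this fix can be expressed as a pure function, which makes the loading edge case easy to pin down in a test. `voiceToggleAction` and its return strings are illustrative, not code from the PR:

```typescript
// Map the toggle input and current permission state to an action name.
// An undefined permission (query still loading) is treated like
// "not-determined", so the toggle is never a silent no-op.
type Permission = "granted" | "denied" | "not-determined" | undefined;

function voiceToggleAction(enabled: boolean, mic: Permission): string {
  if (!enabled) return "disable";
  if (mic === "granted") return "enable";
  if (mic === "not-determined" || mic === undefined) return "request-permission";
  return "noop"; // denied: the switch should be disabled in the UI anyway
}

console.log(voiceToggleAction(true, undefined)); // request-permission
console.log(voiceToggleAction(true, "denied")); // noop
```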
…gs toggle
Adds a Python-based voice sidecar that listens for a wake word ("Hey Jarvis"),
captures speech, transcribes it, and routes commands back to the desktop app.
The feature is gated behind a "Voice Commands" toggle in Settings > Features
(default: off). The sidecar process auto-starts/stops based on subscriber count,
so it only runs when the setting is enabled.
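The sidecar's stdout protocol implied above can be sketched from the consumer side. Event names come from the PR walkthrough; the field names and parser are assumptions, not the actual IPC code:

```typescript
// Newline-delimited JSON events as emitted by the sidecar over stdout.
// "audio_captured" carries the captured clip; fields are illustrative.
interface SidecarEvent {
  event: "ready" | "recording" | "audio_captured" | "error";
  [key: string]: unknown;
}

function parseSidecarLine(line: string): SidecarEvent | null {
  try {
    return JSON.parse(line) as SidecarEvent;
  } catch {
    return null; // non-JSON noise on the stream
  }
}

const evt = parseSidecarLine('{"event":"audio_captured","durationS":2.4}');
console.log(evt?.event); // audio_captured
```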
Sets up Python 3.11, creates a venv with openwakeword/sounddevice/numpy, and runs the PyInstaller build script so the voice-sidecar binary gets bundled into the Electron app's extraResources.
The build script now auto-creates the venv and installs dependencies, so CI doesn't need to know about Python internals. It runs as part of prepackage — CI only needs setup-python to ensure python3 is on PATH.
Request mic access before starting the voice sidecar, show guidance when permission is denied, and re-check on window focus so users can grant access in System Settings and return to a working toggle.
The canary config was replacing the entire extendInfo object, dropping NSMicrophoneUsageDescription and NSLocalNetworkUsageDescription from the base config. Spread the base extendInfo so these plist keys are preserved.
Hardened runtime blocks microphone access without the com.apple.security.device.audio-input entitlement, causing getMediaAccessStatus to return "denied" regardless of TCC state. Add an explicit entitlements plist with the audio-input entitlement.
The workflow had Setup Python but never ran build-voice-sidecar.sh, so the sidecar binary was packaged without the openwakeword model data. Add the missing build step before electron-vite compilation.
--collect-data openwakeword alone was not including the model .onnx files in the PyInstaller bundle. Add --add-data with the resolved package path as a fallback, and verify the hey_jarvis model exists in the output before finishing the build. Also revert the redundant CI workflow step since the sidecar build is already wired through prepackage in package.json.
PyInstaller's --collect-data silently misses openwakeword model files. Fall back to manually copying the package data into _internal/ when the model isn't found after the initial build.
rm the target first so cp -R creates a clean copy of the package directory instead of nesting it inside an existing partial dir.
openwakeword >=0.6.0 no longer ships pre-trained models in the pip package. Download hey_jarvis, melspectrogram, and embedding_model from the v0.5.1 GitHub release during the sidecar build.
After rebase onto main, update voice API imports from @/lib/mcp/* to @superset/mcp and @superset/mcp/auth. Remove duplicate ANTHROPIC_API_KEY in env schema (already added by Slack integration).
0434463 to
9b7e000
Compare
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@apps/desktop/src/main/lib/voice/python/main.py`:
- Around line 36-40: The emit() function writes JSON to stdout without
synchronization, which can interleave with writes from audio.py's sounddevice
callback; define a module-level threading.Lock named STDOUT_LOCK in main.py,
import threading, and wrap the body of emit(event: str, **kwargs: Any) with with
STDOUT_LOCK: before writing and flushing; export STDOUT_LOCK (so other modules
can import it) and update audio.py's _emit_error to import and use
main.STDOUT_LOCK to guard its sys.stdout.write + sys.stdout.flush calls.
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts`:
- Around line 45-57: The fallback currently returns command "python3" which
fails on Windows; update the fallback in voice-process-paths (where scriptDir is
computed and the object with command, args, cwd is returned) to select the
command based on platform (use "python" when process.platform === 'win32',
otherwise "python3"), keep args as [join(scriptDir, "main.py")] and cwd as
scriptDir so the rest of the code (scriptDir, join, process.resourcesPath)
remains unchanged.
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`:
- Around line 65-104: The mutations setVoiceCommandsEnabled.useMutation and
requestMicPermission.useMutation currently swallow errors; update their onError
handlers to log the error and relevant context (error object, mutation vars like
{ enabled } or request inputs, and any previous state from context) so failures
are visible; specifically, add a logging call in setVoiceCommandsEnabled.onError
that logs _err, _vars and context.previous when rollback occurs, and add an
onError handler to requestMicPermission.useMutation that logs the error and
request parameters and then invalidates utils.voice.getMicPermission as
appropriate.
🧹 Nitpick comments (8)

packages/local-db/drizzle/meta/0016_snapshot.json (1)

401-408: Consider making `voice_commands_enabled` non-nullable to avoid tri-state behavior.

If this is a strict boolean feature flag, a `NOT NULL` column with `DEFAULT false` avoids null handling and keeps the semantics tighter. That change should be made in the migration/schema and then regenerate the snapshot rather than editing it directly.

apps/desktop/scripts/build-voice-sidecar.sh (1)

42-48: Consider verifying downloaded model integrity.

The models are downloaded over HTTPS which provides transport security, but there's no checksum verification. A compromised CDN or MITM (if TLS is bypassed) could serve malicious ONNX files. Consider adding SHA256 verification for the downloaded models.

🛡️ Suggested checksum verification

```diff
 OWW_BASE_URL="https://github.com/dscripka/openWakeWord/releases/download/v0.5.1"
+
+# Expected SHA256 checksums for model integrity verification
+declare -A MODEL_CHECKSUMS=(
+  ["hey_jarvis_v0.1.onnx"]="<sha256-hash>"
+  ["melspectrogram.onnx"]="<sha256-hash>"
+  ["embedding_model.onnx"]="<sha256-hash>"
+)
+
 for model in hey_jarvis_v0.1.onnx melspectrogram.onnx embedding_model.onnx; do
   if [ ! -f "$OWW_MODELS_DIR/$model" ]; then
     echo "[voice-sidecar] Downloading model: $model"
     curl -sL "$OWW_BASE_URL/$model" -o "$OWW_MODELS_DIR/$model"
+    # Verify checksum
+    if [ -n "${MODEL_CHECKSUMS[$model]:-}" ]; then
+      echo "[voice-sidecar] Verifying checksum for $model..."
+      echo "${MODEL_CHECKSUMS[$model]}  $OWW_MODELS_DIR/$model" | shasum -a 256 -c - || {
+        echo "[voice-sidecar] ERROR: Checksum verification failed for $model"
+        rm -f "$OWW_MODELS_DIR/$model"
+        exit 1
+      }
+    fi
   fi
 done
```

You'll need to compute and populate the actual SHA256 hashes for the models from the v0.5.1 release.

apps/desktop/src/main/lib/voice/voice-process.ts (1)

103-110: Consider adding a debug log for the swallowed stdin write error.

The empty catch block is acceptable given the comment, but a debug-level log could aid troubleshooting without cluttering normal output.

♻️ Optional: Add debug logging

```diff
 if (childProcess.stdin && !childProcess.stdin.destroyed) {
   try {
     childProcess.stdin.write(`${JSON.stringify({ cmd: "stop" })}\n`);
   } catch {
-    // stdin may be closed already
+    // stdin may be closed already - this is expected during shutdown
+    console.debug("[voice-process] stdin write failed (likely already closed)");
   }
 }
```

apps/desktop/src/renderer/components/Voice/components/ResponsePanel/ResponsePanel.tsx (1)

22-35: Consider extracting the auto-dismiss delay to a named constant.

The 8000 millisecond delay is a magic number that could be extracted for clarity and easier adjustment.

♻️ Optional: Extract to named constant

```diff
+const AUTO_DISMISS_DELAY_MS = 8000;
+
 export function ResponsePanel({ toastId, audioB64 }: ResponsePanelProps) {
   // ...
   // Auto-dismiss after done
   useEffect(() => {
     if (status === "done") {
       const timer = setTimeout(() => {
         toast.dismiss(toastId);
-      }, 8000);
+      }, AUTO_DISMISS_DELAY_MS);
       return () => clearTimeout(timer);
     }
   }, [status, toastId]);
```

apps/api/src/app/api/voice/route.ts (1)

5-22: Consider adding a type guard for the session extension.

The type assertion at lines 9-11 assumes the session has an `activeOrganizationId` property. A type guard or schema validation would be safer and more explicit.

♻️ Proposed improvement

```diff
 async function authenticate(request: Request): Promise<McpContext | null> {
   // Try session auth
   const session = await auth.api.getSession({ headers: request.headers });
-  if (session?.session) {
-    const extendedSession = session.session as {
-      activeOrganizationId?: string;
-    };
-    if (!extendedSession.activeOrganizationId) {
+  if (session?.session && session.user) {
+    const activeOrganizationId = (session.session as Record<string, unknown>)
+      .activeOrganizationId;
+    if (typeof activeOrganizationId !== "string") {
       return null;
     }
     return {
       userId: session.user.id,
-      organizationId: extendedSession.activeOrganizationId,
+      organizationId: activeOrganizationId,
     };
   }
   return null;
 }
```

apps/desktop/src/main/lib/voice/python/audio.py (1)

18-27: Unused callback parameters are required by the sounddevice API.

The `frames` and `time_info` parameters flagged by static analysis are required by the sounddevice callback signature. Consider prefixing with underscores to indicate intentional non-use.

♻️ Naming convention fix

```diff
 def _callback(
     self,
     indata: np.ndarray,
-    frames: int,
-    time_info: object,
+    _frames: int,
+    _time_info: object,
     status: sd.CallbackFlags,
 ) -> None:
```

apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts (1)

151-214: Add defensive checks for type assertions in the SSE event handler.

The handler uses multiple `as string` assertions without validation. If the server sends malformed data, this could cause subtle bugs or runtime issues.

♻️ Proposed defensive handling

```diff
 function handleSSEEvent(
   event: string,
   data: Record<string, unknown>,
   setState: React.Dispatch<React.SetStateAction<VoicePipelineState>>,
 ) {
   switch (event) {
     case "transcription":
+      if (typeof data.text !== "string") return;
       setState((prev) => ({
         ...prev,
         status: "processing",
-        transcription: data.text as string,
+        transcription: data.text,
       }));
       break;
     case "tool_use":
+      if (typeof data.toolName !== "string") return;
       setState((prev) => ({
         ...prev,
         status: "processing",
         toolCalls: [
           ...prev.toolCalls,
           {
-            toolName: data.toolName as string,
+            toolName: data.toolName,
             toolInput: data.toolInput,
           },
         ],
       }));
       break;
```

apps/desktop/src/main/lib/voice/python/speech_capture.py (1)

65-69: Consider dtype consistency in `get_audio`.

The method returns an empty array with `dtype=np.int16`, but when concatenating buffers, the dtype comes from the input chunks. This could lead to inconsistent dtypes if chunks have different dtypes.

♻️ Ensure consistent dtype

```diff
 def get_audio(self) -> np.ndarray:
     """Return all captured audio as a single array."""
     if not self._buffers:
         return np.array([], dtype=np.int16)
-    return np.concatenate(self._buffers).flatten()
+    return np.concatenate(self._buffers).flatten().astype(np.int16)
```
```python
def emit(event: str, **kwargs: Any) -> None:
    """Write a JSON event to stdout."""
    msg = {"event": event, **kwargs}
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()
```
Add stdout lock to prevent interleaved JSON.
This emit() function writes to stdout without synchronization. Combined with _emit_error in audio.py running on the sounddevice callback thread, JSON lines can interleave and corrupt the IPC protocol.
🔒 Proposed fix with shared lock
+import threading
+
+STDOUT_LOCK = threading.Lock()
+
def emit(event: str, **kwargs: Any) -> None:
"""Write a JSON event to stdout."""
msg = {"event": event, **kwargs}
- sys.stdout.write(json.dumps(msg) + "\n")
- sys.stdout.flush()
+ with STDOUT_LOCK:
+ sys.stdout.write(json.dumps(msg) + "\n")
+ sys.stdout.flush()

Then update audio.py to import and use the same lock:
```python
from main import STDOUT_LOCK

def _emit_error(message: str) -> None:
    import json
    import sys

    with STDOUT_LOCK:
        sys.stdout.write(json.dumps({"event": "error", "message": message}) + "\n")
        sys.stdout.flush()
```

📝 Committable suggestion
```python
import threading

STDOUT_LOCK = threading.Lock()

def emit(event: str, **kwargs: Any) -> None:
    """Write a JSON event to stdout."""
    msg = {"event": event, **kwargs}
    with STDOUT_LOCK:
        sys.stdout.write(json.dumps(msg) + "\n")
        sys.stdout.flush()
```
🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/python/main.py` around lines 36 - 40, The
emit() function writes JSON to stdout without synchronization, which can
interleave with writes from audio.py's sounddevice callback; define a
module-level threading.Lock named STDOUT_LOCK in main.py, import threading, and
wrap the body of emit(event: str, **kwargs: Any) with with STDOUT_LOCK: before
writing and flushing; export STDOUT_LOCK (so other modules can import it) and
update audio.py's _emit_error to import and use main.STDOUT_LOCK to guard its
sys.stdout.write + sys.stdout.flush calls.
```typescript
// Fallback: try system python3 with unpacked script
console.warn(
  "[voice-paths] PyInstaller binary not found, falling back to system python3",
);
const scriptDir = join(
  process.resourcesPath,
  "app.asar.unpacked/src/main/lib/voice/python",
);
return {
  command: "python3",
  args: [join(scriptDir, "main.py")],
  cwd: scriptDir,
};
```
Windows compatibility issue in packaged fallback.
The fallback path uses "python3" as the command, which doesn't exist on Windows where the command is typically "python".
🛡️ Proposed fix for cross-platform fallback
console.warn(
"[voice-paths] PyInstaller binary not found, falling back to system python3",
);
const scriptDir = join(
process.resourcesPath,
"app.asar.unpacked/src/main/lib/voice/python",
);
return {
- command: "python3",
+ command: process.platform === "win32" ? "python" : "python3",
args: [join(scriptDir, "main.py")],
cwd: scriptDir,
🤖 Prompt for AI Agents
In `@apps/desktop/src/main/lib/voice/voice-process-paths.ts` around lines 45 - 57,
The fallback currently returns command "python3" which fails on Windows; update
the fallback in voice-process-paths (where scriptDir is computed and the object
with command, args, cwd is returned) to select the command based on platform
(use "python" when process.platform === 'win32', otherwise "python3"), keep args
as [join(scriptDir, "main.py")] and cwd as scriptDir so the rest of the code
(scriptDir, join, process.resourcesPath) remains unchanged.
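The platform branch stays testable if it is isolated in a pure helper. The sketch below is illustrative — the helper names are hypothetical, and the real PR inlines this logic in voice-process-paths.ts:

```typescript
import { join } from "node:path";

/** Pick the system Python launcher for a given `process.platform` value. */
function pythonCommandFor(platform: string): string {
  // Windows installers register `python` (or the `py` launcher); Unix-likes ship `python3`.
  return platform === "win32" ? "python" : "python3";
}

/** Build the fallback spawn spec for the unpacked sidecar script. */
function fallbackSpawnSpec(platform: string, scriptDir: string) {
  return {
    command: pythonCommandFor(platform),
    args: [join(scriptDir, "main.py")],
    cwd: scriptDir,
  };
}

console.log(pythonCommandFor("win32")); // python
console.log(pythonCommandFor("darwin")); // python3
console.log(fallbackSpawnSpec("linux", "/opt/app/python").command); // python3
```

Passing the platform as a parameter (rather than reading `process.platform` inside the helper) also makes the Windows path reachable from unit tests run on macOS or Linux CI.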
const { data: voiceCommandsEnabled, isLoading: isVoiceLoading } =
  electronTrpc.settings.getVoiceCommandsEnabled.useQuery();
const setVoiceCommandsEnabled =
  electronTrpc.settings.setVoiceCommandsEnabled.useMutation({
    onMutate: async ({ enabled }) => {
      await utils.settings.getVoiceCommandsEnabled.cancel();
      const previous = utils.settings.getVoiceCommandsEnabled.getData();
      utils.settings.getVoiceCommandsEnabled.setData(undefined, enabled);
      return { previous };
    },
    onError: (_err, _vars, context) => {
      if (context?.previous !== undefined) {
        utils.settings.getVoiceCommandsEnabled.setData(
          undefined,
          context.previous,
        );
      }
    },
    onSettled: () => {
      utils.settings.getVoiceCommandsEnabled.invalidate();
    },
  });

const { data: micPermission } = electronTrpc.voice.getMicPermission.useQuery(
  undefined,
  {
    refetchOnWindowFocus: true,
  },
);

const requestMicPermission =
  electronTrpc.voice.requestMicPermission.useMutation({
    onSuccess: ({ granted }) => {
      utils.voice.getMicPermission.invalidate();
      if (granted) {
        setVoiceCommandsEnabled.mutate({ enabled: true });
      }
    },
  });
Add error logging for the new mutations.
Right now failures are silent; please log with context.
Proposed fix
const setVoiceCommandsEnabled =
electronTrpc.settings.setVoiceCommandsEnabled.useMutation({
onMutate: async ({ enabled }) => {
await utils.settings.getVoiceCommandsEnabled.cancel();
const previous = utils.settings.getVoiceCommandsEnabled.getData();
utils.settings.getVoiceCommandsEnabled.setData(undefined, enabled);
return { previous };
},
- onError: (_err, _vars, context) => {
+ onError: (err, _vars, context) => {
+ console.error(
+ "[settings/voice-commands] Failed to update setting:",
+ err,
+ );
if (context?.previous !== undefined) {
utils.settings.getVoiceCommandsEnabled.setData(
undefined,
context.previous,
);
}
},
onSettled: () => {
utils.settings.getVoiceCommandsEnabled.invalidate();
},
});
const requestMicPermission =
electronTrpc.voice.requestMicPermission.useMutation({
onSuccess: ({ granted }) => {
utils.voice.getMicPermission.invalidate();
if (granted) {
setVoiceCommandsEnabled.mutate({ enabled: true });
}
},
+ onError: (err) => {
+ console.error(
+ "[settings/mic-permission] Failed to request permission:",
+ err,
+ );
+ },
});

As per coding guidelines, never swallow errors silently; at minimum log them with context.
🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`
around lines 65 - 104, The mutations setVoiceCommandsEnabled.useMutation and
requestMicPermission.useMutation currently swallow errors; update their onError
handlers to log the error and relevant context (error object, mutation vars like
{ enabled } or request inputs, and any previous state from context) so failures
are visible; specifically, add a logging call in setVoiceCommandsEnabled.onError
that logs _err, _vars and context.previous when rollback occurs, and add an
onError handler to requestMicPermission.useMutation that logs the error and
request parameters and then invalidates utils.voice.getMicPermission as
appropriate.
Cast ZodType through unknown for internal _zod access, and cast Uint8Array to BlobPart for Blob constructor compatibility.
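The Uint8Array-to-BlobPart cast mentioned in the commit can be illustrated with a minimal decoder. This is a hypothetical helper, not the PR's actual code, and it assumes a runtime with global `atob` and `Blob` (the renderer, or Node 18+):

```typescript
// Decode a base64 audio payload into a Blob suitable for multipart upload.
function base64ToBlob(b64: string, mime: string): Blob {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Cast through unknown to BlobPart: some TS lib targets don't accept
  // Uint8Array directly in the Blob constructor's parameter type.
  return new Blob([bytes as unknown as BlobPart], { type: mime });
}

const blob = base64ToBlob(btoa("RIFF"), "audio/wav");
console.log(blob.size, blob.type); // 4 audio/wav
```

The cast is purely a compile-time accommodation; at runtime `Uint8Array` is a valid Blob source in both browsers and Node.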
- Capture process reference locally in stopVoiceProcess and clear childProcess immediately, so startVoiceProcess can proceed if called while the old process is still shutting down
- Guard exit/error handlers with identity check (childProcess === proc) to avoid cleaning up a newer process when a stale one exits
- Emit idle event on cleanup so subscribers know process stopped
- Fix import ordering (biome auto-fix)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@apps/api/src/app/api/voice/voice-service.ts`:
- Around line 52-60: Wrap the call to transcribeAudio in a try/catch inside the
function so any thrown errors are caught instead of crashing the pipeline; on
catch, call sse.write("error", { message: err.message || String(err), step:
"transcription" }) (or similar structured payload) and then write a terminal
event (e.g., sse.write("done", { fullResponse: "" }) or return) to stop further
processing; update the block around transcribeAudio and subsequent
sse.write("transcription", ...) to only run on success.
- Around line 77-93: The Claude streaming call and async iteration
(anthropic.messages.stream(...) and the subsequent for await (const event of
stream) loop) lack error handling; wrap the stream creation and its for-await
loop in a try/catch (inside the MAX_TOOL_ROUNDS loop) to catch API/network
errors, log the error, send an SSE error event via sse.write (e.g.,
sse.write("error", { message: error.message, code: ... })), ensure any necessary
cleanup/stream.close if available, and break/return from the outer loop so the
pipeline stops gracefully while preserving the accumulated fullResponse.
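The streaming guard described above can be sketched generically. Both the async iterable and the `sse` writer below are stand-ins (the real code uses `anthropic.messages.stream`); the shape to note is that an error thrown mid-iteration is caught, reported as an SSE `error` event, and the accumulated `fullResponse` survives into the terminal `done` event:

```typescript
type SSEWriter = { write(event: string, data: Record<string, unknown>): void };

// Stand-in for the Claude stream: yields two deltas, then fails mid-stream.
async function* fakeStream(): AsyncGenerator<{ delta: string }> {
  yield { delta: "Hello, " };
  yield { delta: "world" };
  throw new Error("network reset");
}

async function runPipeline(sse: SSEWriter): Promise<string> {
  let fullResponse = "";
  try {
    for await (const event of fakeStream()) {
      fullResponse += event.delta;
      sse.write("delta", { text: event.delta });
    }
  } catch (error) {
    // Report the failure to the client instead of crashing the pipeline.
    sse.write("error", {
      message: error instanceof Error ? error.message : String(error),
    });
  }
  sse.write("done", { fullResponse });
  return fullResponse;
}

// Collect event names to show the terminal sequence.
const events: string[] = [];
const result = await runPipeline({ write: (e) => events.push(e) });
console.log(events.join(","), "|", result); // delta,delta,error,done | Hello, world
```

Keeping the try/catch inside the tool-round loop (as the prompt suggests) means one failed round can end the response gracefully without discarding text already streamed to the client.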
🧹 Nitpick comments (1)
apps/api/src/app/api/voice/tool-adapter.ts (1)
80-81: Avoid `as never` type assertion — it defeats type safety. Using `as never` bypasses all type checking and could hide mismatches between `interceptServer` and the expected MCP server interface. Consider typing `interceptServer` to match the expected interface or using a more specific assertion.

Suggested approach

- registerTools(interceptServer as never);
+ // If registerTools expects a specific type, define interceptServer accordingly
+ // or use a targeted type assertion that preserves some checking:
+ registerTools(interceptServer as Parameters<typeof registerTools>[0]);
}): Promise<void> {
  // 1. Transcribe
  const transcription = await transcribeAudio(audioBuffer);
  sse.write("transcription", { text: transcription });

  if (!transcription) {
    sse.write("done", { fullResponse: "" });
    return;
  }
Wrap transcription call in try/catch to prevent pipeline crash.
If transcribeAudio throws (network error, invalid audio, rate limits), the pipeline crashes without emitting an error event. Consider wrapping the transcription step to emit an SSE error event on failure.
Proposed fix
): Promise<void> {
// 1. Transcribe
+ let transcription: string;
+ try {
- const transcription = await transcribeAudio(audioBuffer);
+ transcription = await transcribeAudio(audioBuffer);
+ } catch (error) {
+ console.error("[voice/pipeline] Transcription failed:", error);
+ sse.write("error", {
+ message: error instanceof Error ? error.message : "Transcription failed",
+ });
+ sse.write("done", { fullResponse: "" });
+ return;
+ }
sse.write("transcription", { text: transcription });🤖 Prompt for AI Agents
In `@apps/api/src/app/api/voice/voice-service.ts` around lines 52 - 60, Wrap the
call to transcribeAudio in a try/catch inside the function so any thrown errors
are caught instead of crashing the pipeline; on catch, call sse.write("error", {
message: err.message || String(err), step: "transcription" }) (or similar
structured payload) and then write a terminal event (e.g., sse.write("done", {
fullResponse: "" }) or return) to stop further processing; update the block
around transcribeAudio and subsequent sse.write("transcription", ...) to only
run on success.
Delete the 257-line tool-adapter.ts that hand-rolled Zod-to-JSON-Schema conversion and MCP tool interception. Replace with createInMemoryMcpClient from @superset/mcp/in-memory, using client.listTools() and client.callTool() directly — the same pattern as the Slack agent.
- Move VoiceSidecarEvent and PythonVoiceEvent into voice-process.ts
- Inline MicPermissionStatus in voice router
- Delete shared/voice.ts (renderer gets types via tRPC inference)
- Remove unused getVoiceProcessStatus(), isRunning, start/stop mutations
- Remove unused PythonVoiceCommand type
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`:
- Line 39: The linter warns that handleSSEEvent is captured by the useCallback
for processAudio but not listed in its dependency array; to fix, move the
handleSSEEvent function body inside the processAudio useCallback (so it closes
over stable setState directly) and remove the outer handleSSEEvent declaration,
or alternatively add handleSSEEvent to processAudio's dependency array if you
prefer keeping it separate; update references so processAudio uses the inlined
handleSSEEvent logic and ensure only stable setters (setState) are used inside.
🧹 Nitpick comments (3)
apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts (1)
146-151: Consider validating SSE event data before type assertions. Multiple `as string` casts assume the server always sends correctly typed data. If the server sends `null`, `undefined`, or a non-string value, these assertions could lead to subtle bugs (e.g., `"null"` being appended to `responseText`). A lightweight guard or optional chaining would make this more defensive:

Example defensive approach

  case "transcription":
    setState((prev) => ({
      ...prev,
      status: "processing",
-     transcription: data.text as string,
+     transcription: typeof data.text === "string" ? data.text : null,
    }));
    break;

Also applies to: 153-161, 163-171, 173-178, 183-188
apps/desktop/src/renderer/components/Voice/components/ResponsePanel/ResponsePanel.tsx (2)
28-36: Extract the auto-dismiss timeout to a named constant. The `8000` milliseconds magic number should be extracted to a module-level constant for clarity and easier maintenance.

Proposed fix

+const AUTO_DISMISS_DELAY_MS = 8000;
+
 export function ResponsePanel({ toastId, audioB64 }: ResponsePanelProps) {

 useEffect(() => {
   if (status === "done" || status === "error") {
     const timer = setTimeout(() => {
       toast.dismiss(toastId);
-    }, 8000);
+    }, AUTO_DISMISS_DELAY_MS);
     return () => clearTimeout(timer);
   }
 }, [status, toastId]);
38-40: Consider inlining the abort call. The `handleStop` function is a trivial wrapper around `abort()`. You could inline it in the onClick handler for brevity, though this is a minor stylistic preference.

Optional simplification

- const handleStop = () => {
-   abort();
- };
  // ...
- <Button variant="ghost" size="sm" onClick={handleStop}>
+ <Button variant="ghost" size="sm" onClick={abort}>
const [state, setState] = useState<VoicePipelineState>(INITIAL_STATE);
const abortRef = useRef<AbortController | null>(null);

const processAudio = useCallback(async (audioB64: string) => {
Add handleSSEEvent to the dependency array or move it inside useCallback.
The static analysis tool flags that handleSSEEvent is used inside processAudio but not listed in its dependency array. Since handleSSEEvent is defined outside the useCallback, this creates a stale closure risk where the function reference could become outdated.
The simplest fix is to move handleSSEEvent inside the useCallback, as it only depends on setState which is stable.
Proposed fix: move handleSSEEvent inside useCallback
const processAudio = useCallback(async (audioB64: string) => {
abortRef.current?.abort();
setState({ ...INITIAL_STATE, status: "transcribing" });
+
+ function handleSSEEvent(event: string, data: Record<string, unknown>) {
+ switch (event) {
+ case "transcription":
+ setState((prev) => ({
+ ...prev,
+ status: "processing",
+ transcription: data.text as string,
+ }));
+ break;
+ // ... rest of cases
+ }
+ }
const binaryStr = atob(audioB64);Then remove the outer handleSSEEvent function at lines 144-191.
Also applies to: 144-191
🧰 Tools
🪛 Biome (2.3.13)
[error] 39-39: This hook does not specify its dependency on handleSSEEvent.
This dependency is being used here, but is not specified in the hook dependency list.
React relies on hook dependencies to determine when to re-compute Effects.
Failing to specify dependencies can result in Effects not updating correctly when state changes.
These "stale closures" are a common source of surprising bugs.
Either include it or remove the dependency array.
Unsafe fix: Add the missing dependency to the list.
(lint/correctness/useExhaustiveDependencies)
🤖 Prompt for AI Agents
In
`@apps/desktop/src/renderer/components/Voice/components/ResponsePanel/hooks/useVoicePipeline/useVoicePipeline.ts`
at line 39, The linter warns that handleSSEEvent is captured by the useCallback
for processAudio but not listed in its dependency array; to fix, move the
handleSSEEvent function body inside the processAudio useCallback (so it closes
over stable setState directly) and remove the outer handleSSEEvent declaration,
or alternatively add handleSSEEvent to processAudio's dependency array if you
prefer keeping it separate; update references so processAudio uses the inlined
handleSSEEvent logic and ensure only stable setters (setState) are used inside.
Summary
Key Changes
- `voice_commands_enabled` boolean column in local SQLite settings table (migration 0016)
- `getVoiceCommandsEnabled`/`setVoiceCommandsEnabled` procedures with optimistic UI updates
- `getMicPermission` query and `requestMicPermission` mutation for macOS permission flow
- `openwakeword` for wake word detection, `sounddevice` for audio capture, piping events via stdout JSON
- `/api/voice` endpoint for Whisper transcription and LLM tool routing
- `VoiceListener` component gated on both `voiceEnabled` and `micPermission === "granted"`; `RecordingIndicator` and `ResponsePanel` toast components; permission-aware settings toggle with denied-state warning and "Open System Settings" link
- `com.apple.security.device.audio-input` added to macOS entitlements plist; `NSMicrophoneUsageDescription` in Info.plist
- `extendInfo` spread to preserve base config keys (`NSMicrophoneUsageDescription`)

Microphone Permission UX
Test plan
[voice-process])

Summary by CodeRabbit

Release Notes
Release Notes