feat(desktop): add Voice Control dictation #4981
Add droid to HOST_AGENT_PRESETS and BUILTIN_TERMINAL_AGENTS with --auto medium autonomy, simplify the agent-identity type union, and add branded SVG icons.
Two related performance fixes for the V2 FilesTab:

1. Empty folder on expand (root cause): the lazy-expand detector in useFilesTabBridge subscribed to model changes and iterated EVERY path in knownPaths on each notification, polling isExpanded() to decide whether to fetch children. With large workspaces this was O(n) per Pierre subscriber tick, AND racy: if Pierre's internal expansion state hadn't flipped by the time the subscriber fired, the fetch never started and the folder appeared empty until the user pressed Refresh.

   Replaced the polling sweep with a purpose-built `unloadedDirCandidatesRef` set that tracks only directories Pierre knows about but we haven't loaded yet. The subscriber now iterates the candidate set (typically <20 entries) instead of all known paths. Candidates are populated as fetchDir discovers child directories, added by fs:events for new folders, and cleared on workspace switch / doRefresh.

2. Workspace switch lag: FilesTab was issuing its own workspace.get.useQuery without staleTime, so every remount fired a fresh IPC even though the parent route (V2WorkspacePage) had just loaded the same data. Bumped staleTime to 30s so the cache serves the duplicate request instantly on remount.

Files:
- useFilesTabBridge.ts: O(n) → O(k) expansion detection, candidate set
- FilesTab.tsx: staleTime on workspace.get query

Out of scope (deferred): legacy FilesView (separate fix branch exists), backend listDirectory perf, Pierre Trees upgrade for native onExpand.
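The candidate-set idea from the first fix can be illustrated with a small standalone sketch. This is not the code in useFilesTabBridge.ts; the class name, method names, and the `isExpanded` callback shape are all hypothetical and chosen only to show why the subscriber cost drops from O(n) to O(k).

```typescript
// Hypothetical helper: tracks directories the tree knows about but whose
// children have not been fetched yet. All names here are illustrative.
class UnloadedDirCandidates {
  private readonly candidates = new Set<string>();

  // Called when a fetch discovers a child directory, or a new folder appears.
  add(dirPath: string): void {
    this.candidates.add(dirPath);
  }

  // Called once a directory's children have been loaded.
  markLoaded(dirPath: string): void {
    this.candidates.delete(dirPath);
  }

  // Called on workspace switch or refresh.
  clear(): void {
    this.candidates.clear();
  }

  get size(): number {
    return this.candidates.size;
  }

  // Returns the candidates that are currently expanded and so need a fetch.
  // Cost is O(k) in the number of candidates, not O(n) in all known paths.
  collectExpanded(isExpanded: (dirPath: string) => boolean): string[] {
    const toFetch: string[] = [];
    this.candidates.forEach((dirPath) => {
      if (isExpanded(dirPath)) toFetch.push(dirPath);
    });
    return toFetch;
  }
}
```

A subscriber built on this would call `collectExpanded` on each model notification and start a fetch for every returned path, calling `markLoaded` when the fetch completes.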
Per research into VSCode/JetBrains/Zed/Warp/Cursor: this project already meets most best-practice items for real-time file tree updates. We use @parcel/watcher (same library as VSCode), with comprehensive native excludes for node_modules/.git/dist/etc. The gap is observability, not architecture.

Adds dev-mode logging at three points in the fs:events pipeline so we can diagnose any future perceived-lag reports with evidence instead of speculation:

1. packages/workspace-fs/src/watch.ts: log at parcel callback emission (entry point into our pipeline). Gated on process.env.SUPERSET_FS_EVENTS_DEBUG=1.
2. packages/host-service/src/events/event-bus.ts: log at sendMessage for fs:events payloads (transport boundary). Same env flag.
3. apps/desktop/.../FilesTab/.../useFilesTabBridge.ts: log every fs:events handler entry, plus the two surviving early-return paths (rootPath-empty belt-and-suspenders; outside-workspace path filter; rename fallback when oldKey not in knownPaths). Gated on import.meta.env.DEV so logs ship in dev builds only.

To enable Node-side logging during dev:

    SUPERSET_FS_EVENTS_DEBUG=1 SUPERSET_PROFILE=local bun dev --filter=@superset/desktop

Renderer logs are always on in dev builds.

No production behaviour change. No latency tuning: defer until logs prove throttling (75ms parcel debounce + 200ms ThrottledWorker) is the dominant source of perceived lag.
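As a rough illustration of the env-flag gating described above, the sketch below builds a logger that is a no-op unless the flag is set. Only the flag name `SUPERSET_FS_EVENTS_DEBUG` comes from the commit message; the function names, the `[fs:events]` prefix, and the rule that only the exact value `"1"` enables logging are assumptions.

```typescript
type Env = Record<string, string | undefined>;
type LogSink = (message: string, detail: unknown) => void;

// Assumption: only the exact value "1" turns logging on.
function isFsEventsDebugEnabled(env: Env): boolean {
  return env["SUPERSET_FS_EVENTS_DEBUG"] === "1";
}

// Hypothetical factory: returns a logger that does nothing when disabled,
// so call sites can log unconditionally without paying for it in production.
function createFsEventsLogger(
  enabled: boolean,
  sink: LogSink = (message, detail) => console.log(message, detail),
): (stage: string, detail: unknown) => void {
  return (stage, detail) => {
    if (!enabled) return;
    sink(`[fs:events] ${stage}`, detail);
  };
}
```

On the Node side the environment argument would presumably be `process.env`; the renderer gate on `import.meta.env.DEV` could pass a boolean to `createFsEventsLogger` directly.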
# Conflicts:
#	packages/shared/src/agent-identity.ts
#	packages/shared/src/builtin-terminal-agents.ts
#	packages/shared/src/host-agent-presets.ts
#	packages/ui/src/assets/icons/preset-icons/droid-white.svg
#	packages/ui/src/assets/icons/preset-icons/droid.svg
The origin/main merge left duplicate imports, PRESET_ICONS entries, and re-exports for droidIcon/droidWhiteIcon in the preset-icons index. Removed the duplicates to fix the esbuild build error.
…-1779916496-file-navigator-lag'
Builds the desktop app as a real production bundle and installs it to /Applications, swapping dev .env values (NODE_ENV, public URLs, RELAY_URL, SUPERSET_WORKSPACE_NAME) for production only during the build, then restoring the dev .env via an EXIT trap. Avoids the dev-server-URL crash, the blank-screen RELAY_URL Zod failure, and the wrong-workspace data dir.
📝 Walkthrough
Introduces desktop voice dictation end-to-end with hotkeys (including Fn/Globe), TRPC/OpenAI transcription, UI integrations (dashboard/chat/terminal), settings with DB migration, and extensive tests. Adds SUPER-869 spec docs and reorders Droid preset icons. Improves FilesTab lazy expansion, adds FS debug logging, and a local macOS build script.
Changes
Desktop Voice Input, Hotkeys, and Settings
SUPER-869 Droid preset specs and preset icon order
FilesTab lazy expansion, FS events debug, and local prod build script
Sequence Diagram(s)
sequenceDiagram
participant User
participant Renderer as Renderer (Dashboard/Chat/Terminal)
participant Hook as useVoiceDictation
participant TRPC as tRPC voiceInput.transcribe
participant OpenAI as OpenAI Transcriptions
participant Target as Chat/Terminal Target
User->>Renderer: Press VOICE_INPUT_TOGGLE
Renderer->>Hook: start(target)
Hook->>Hook: capture audio (MediaRecorder)
User-->>Hook: release key (stop)
Hook->>TRPC: base64 audio + mime
TRPC->>OpenAI: POST /v1/audio/transcriptions
OpenAI-->>TRPC: { text }
TRPC-->>Hook: { text }
Hook->>Target: insert transcript
Renderer-->>User: Voice indicator success
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Ready to review this PR? Stage has broken it down into 9 individual chapters for you. Chapters generated by Stage for commit fcfe4a0 on May 28, 2026 9:57pm UTC.
Greptile Summary
This PR introduces a native Voice Control dictation feature for the desktop app, covering the full end-to-end path from a configurable keyboard shortcut through
Confidence Score: 4/5
The new feature is off by default and isolated behind the The dictation session lifecycle, press-and-hold release tracking, and transcription plumbing are all well-constructed and have solid test coverage. The main concern is apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts needs the
| Filename | Overview |
|---|---|
| apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts | Target resolution and text insertion logic; getEditableElement returns document.activeElement without checking it is a descendant of targetElement, risking insertion into an unrelated input when the TiptapPromptEditor event handler is absent. |
| apps/desktop/src/renderer/voice-input/hooks/useVoiceDictation/useVoiceDictation.ts | New hook managing MediaRecorder lifecycle, press-and-hold session state, and async transcription. Race conditions between start/stop are handled via refs; cleanup on unmount looks correct. |
| apps/desktop/src/lib/trpc/routers/voice-input.ts | New tRPC route that forwards audio to OpenAI's transcription API; correctly validates size bounds, handles error types, and checks OAuth token expiry before use. |
| apps/desktop/src/renderer/routes/_authenticated/_dashboard/layout.tsx | Integrates voice dictation into the dashboard layout; hotkey registration with press-and-hold release tracking looks correct; armVoiceReleaseStop correctly cleans up on component unmount via useEffect. |
| apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx | Adds Voice Control section with microphone readiness UI and optimistic toggle; permission status polled every 2 s regardless of current state, which is more frequent than necessary for terminal states. |
| apps/desktop/src/renderer/hotkeys/utils/chord.ts | New module consolidating chord normalization, AltGr suppression, and terminal-reserved chord logic previously inline in resolveHotkeyFromEvent.ts; logic is unchanged, only extracted. |
| apps/desktop/src/renderer/hotkeys/utils/resolveHotkeyFromEvent.ts | Refactored to re-export from the new chord.ts module; Fn/Globe handling added to resolveHotkeyFromEvent; changes are clean and backward-compatible. |
| apps/desktop/src/renderer/hotkeys/hooks/useHotkey/useHotkey.ts | Adds a manual window.addEventListener("keydown") path for standalone Fn/Globe shortcuts; optionsRef avoids stale closures in the listener; cleanup via useEffect return looks correct. |
| packages/local-db/drizzle/0042_add_voice_input_enabled.sql | Minimal additive migration adding nullable voice_input_enabled column with no default; existing rows stay null and the app applies the DEFAULT_VOICE_INPUT_ENABLED default at read time. |
| apps/desktop/src/lib/trpc/routers/permissions/native-permissions.ts | Refactors checkMicrophone to use a richer getMicrophonePermissionStatus helper that maps Electron's four states to the UI-facing MicrophonePermissionStatus type; backward-compatible. |
Sequence Diagram
sequenceDiagram
participant U as User
participant KH as useHotkey (VOICE_INPUT_TOGGLE)
participant G as useVoiceActivationGuard
participant FT as focusTracking
participant VD as useVoiceDictation
participant MR as MediaRecorder
participant tRPC as tRPC (main process)
participant OAI as OpenAI Transcriptions API
participant T as DictationTarget (chat/terminal)
U->>KH: keydown (shortcut)
KH->>G: runVoiceActivationHotkeyEvent()
G->>FT: getFocusedVoiceInputTargetElement()
FT-->>G: targetElement (via DOM or remembered)
G-->>KH: "{ status: allowed, target }"
KH->>KH: armVoiceReleaseStop()
KH->>T: getFocusedVoiceDictationTarget()
T-->>KH: "VoiceDictationTarget { insertTranscript }"
KH->>VD: toggle(target)
VD->>MR: getUserMedia() then recorder.start(250ms)
MR-->>VD: "onstart, phase = listening"
U->>KH: keyup (shortcut released)
KH->>VD: stop()
VD->>MR: recorder.stop()
MR-->>VD: "onstop, phase = processing"
VD->>VD: blobToBase64(chunks)
VD->>tRPC: "voiceInput.transcribe({ audioBase64, mimeType })"
tRPC->>OAI: POST /v1/audio/transcriptions
OAI-->>tRPC: "{ text }"
tRPC-->>VD: "{ text }"
VD->>T: insertTranscript(text)
T->>T: dispatch VOICE_DICTATION_INSERT_EVENT
T-->>VD: "handled = true"
VD-->>U: "phase = success, idle after 1.4s"
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts:23-34
**`getEditableElement` may insert text into the wrong element**
`document.activeElement` is returned without checking that it is a descendant of `targetElement`. When the `TiptapPromptEditor` event handler is absent (editor unmounted or not yet attached), the fallback path calls `getEditableElement(chatTargetElement)`. If at that moment `document.activeElement` is a `<textarea>` or `<input>` elsewhere (e.g., an xterm helper textarea that triggered the remembered-target fallback), `insertTextIntoEditable` will write the dictated transcript into that unrelated element instead of the intended chat input.
### Issue 2 of 3
apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts:49-54
`document.execCommand("insertText")` is deprecated and removed from the HTML Living Standard; Chromium/Electron may silently no-op or drop it in a future update. For content-editable elements, prefer `document.getSelection()` + `Range.insertNode()` or a direct DOM mutation with a dispatched `input` event as the fallback.
```suggestion
if (!element.isContentEditable) {
return false;
}
element.focus();
const selection = window.getSelection();
if (selection && selection.rangeCount > 0) {
const range = selection.getRangeAt(0);
range.deleteContents();
const textNode = document.createTextNode(text);
range.insertNode(textNode);
range.setStartAfter(textNode);
range.collapse(true);
selection.removeAllRanges();
selection.addRange(range);
element.dispatchEvent(new Event("input", { bubbles: true }));
return true;
}
return document.execCommand("insertText", false, text);
```
### Issue 3 of 3
apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx:261-264
**Aggressive permission polling while settings are open**
`permissions.getStatus` is refetched every 2 seconds for the entire time the Voice Control section is visible in settings. On macOS this translates to a `getMediaAccessStatus` IPC call every 2 s, which is unnecessary once the status is `"granted"` or `"denied"` — those states won't change without the user acting in System Settings. Consider polling only while `status` is `"promptable"` or `"unknown"`, and removing `refetchInterval` once a terminal state is observed.
Reviews (1): Last reviewed commit: "Merge upstream main into fork main" | Re-trigger Greptile
      if (activeElement instanceof HTMLInputElement) return activeElement;
      if (activeElement instanceof HTMLTextAreaElement) return activeElement;
      if (activeElement instanceof HTMLElement && activeElement.isContentEditable) {
        return activeElement;
      }
      return targetElement.querySelector(
        "textarea, input, [contenteditable='true']",
      );
    }

    function insertTextIntoEditable(
      element: HTMLInputElement | HTMLTextAreaElement | HTMLElement,

getEditableElement may insert text into the wrong element
document.activeElement is returned without checking that it is a descendant of targetElement. When the TiptapPromptEditor event handler is absent (editor unmounted or not yet attached), the fallback path calls getEditableElement(chatTargetElement). If at that moment document.activeElement is a <textarea> or <input> elsewhere (e.g., an xterm helper textarea that triggered the remembered-target fallback), insertTextIntoEditable will write the dictated transcript into that unrelated element instead of the intended chat input.
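One way to address this is to accept the focused element only when it sits inside the intended target. The sketch below is an assumption about how such a guard could look, not the project's fix: `ElementLike`, `isEditable`, and `getEditableElementWithin` are hypothetical names, and the structural interface exists only so the logic can run outside a browser. Real DOM elements expose `tagName`, `contains`, and `querySelector`, so they are expected to fit this shape.

```typescript
// Minimal structural view of a DOM element (hypothetical, for illustration).
interface ElementLike {
  readonly tagName: string;
  readonly isContentEditable?: boolean;
  contains(other: ElementLike | null): boolean;
  querySelector(selector: string): ElementLike | null;
}

const EDITABLE_SELECTOR = "textarea, input, [contenteditable='true']";

function isEditable(el: ElementLike): boolean {
  const tag = el.tagName.toUpperCase();
  return tag === "INPUT" || tag === "TEXTAREA" || el.isContentEditable === true;
}

// Prefer the focused element only if it is editable AND inside the target;
// otherwise fall back to the first editable descendant of the target.
function getEditableElementWithin(
  target: ElementLike,
  active: ElementLike | null,
): ElementLike | null {
  if (active !== null && target.contains(active) && isEditable(active)) {
    return active;
  }
  return target.querySelector(EDITABLE_SELECTOR);
}
```

In the browser the second argument would be `document.activeElement`. Because the containment check runs first, a focused textarea that lives elsewhere on the page no longer wins over the chat input.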

      if (!element.isContentEditable) {
        return false;
      }

      element.focus();
      return document.execCommand("insertText", false, text);

document.execCommand("insertText") is deprecated and removed from the HTML Living Standard; Chromium/Electron may silently no-op or drop it in a future update. For content-editable elements, prefer document.getSelection() + Range.insertNode() or a direct DOM mutation with a dispatched input event as the fallback.

      });
      const requestMicrophone =
        electronTrpc.permissions.requestMicrophone.useMutation({
          onSettled: () => {

Aggressive permission polling while settings are open
permissions.getStatus is refetched every 2 seconds for the entire time the Voice Control section is visible in settings. On macOS this translates to a getMediaAccessStatus IPC call every 2 s, which is unnecessary once the status is "granted" or "denied" — those states won't change without the user acting in System Settings. Consider polling only while status is "promptable" or "unknown", and removing refetchInterval once a terminal state is observed.
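The suggestion to poll only in non-terminal states can be captured in a small pure function. This is a sketch under assumptions: the four status strings appear in the review comment, but the exact `MicrophonePermissionStatus` union, the constant name, and the function name are hypothetical.

```typescript
// Assumed union; the comment names these four values but the real type may differ.
type MicrophonePermissionStatus = "granted" | "denied" | "promptable" | "unknown";

const PERMISSION_POLL_MS = 2000;

// Returns the polling interval in milliseconds, or false to stop polling.
// "granted" and "denied" are treated as terminal: they only change when the
// user acts in System Settings, so repeated IPC checks add no information.
function microphonePollInterval(
  status: MicrophonePermissionStatus | undefined,
): number | false {
  if (status === "granted" || status === "denied") return false;
  return PERMISSION_POLL_MS;
}
```

The result is meant to feed the query's `refetchInterval` option, which in TanStack Query accepts a number or `false`. How the latest status is read inside that option depends on the library version in use. One trade-off is that a change made in System Settings while the panel is open is only picked up on the next manual refetch or remount.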
Actionable comments posted: 3
🧹 Nitpick comments (8)
apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts (1)

19-31: ⚡ Quick win

Consider adding a null check for `document.activeElement`.

The function queries `document.activeElement` but doesn't verify it's non-null before the instance checks. While `activeElement` is rarely null in practice, TypeScript's strict null checking and the DOM spec allow it.

🛡️ Proposed defensive null check

     function getEditableElement(
       targetElement: HTMLElement,
     ): HTMLInputElement | HTMLTextAreaElement | HTMLElement | null {
       const activeElement = document.activeElement;
    +  if (!activeElement) {
    +    return targetElement.querySelector(
    +      "textarea, input, [contenteditable='true']",
    +    );
    +  }
       if (activeElement instanceof HTMLInputElement) return activeElement;
       if (activeElement instanceof HTMLTextAreaElement) return activeElement;
       if (activeElement instanceof HTMLElement && activeElement.isContentEditable) {
         return activeElement;
       }
       return targetElement.querySelector(
         "textarea, input, [contenteditable='true']",
       );
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts` around lines 19 - 31, getEditableElement currently reads document.activeElement without guarding against null; add a defensive null check for activeElement before performing instanceof or isContentEditable checks. Update the function (getEditableElement) to first assign const activeElement = document.activeElement; if (!activeElement) return targetElement.querySelector("textarea, input, [contenteditable='true']"); otherwise proceed with the existing instanceof HTMLInputElement / HTMLTextAreaElement / isContentEditable checks so TypeScript strict-null checks are satisfied and runtime nulls are handled.

apps/desktop/src/lib/trpc/routers/voice-input.ts (1)
75-138: 💤 Low value

Consider using `UNAUTHORIZED` for OpenAI authentication errors.

Lines 121-125 map all OpenAI errors (including 401/403 authentication failures) to `BAD_REQUEST`. For consistency with line 79 (which uses `PRECONDITION_FAILED` for missing API key), authentication failures from OpenAI could use `UNAUTHORIZED` to better signal the error category to clients.

♻️ Optional: distinguish auth errors

     if (!response.ok) {
    +  const errorMessage = await readOpenAIError(response);
    +  if (response.status === 401 || response.status === 403) {
    +    throw new TRPCError({
    +      code: "UNAUTHORIZED",
    +      message: errorMessage,
    +    });
    +  }
       throw new TRPCError({
         code: "BAD_REQUEST",
    -    message: await readOpenAIError(response),
    +    message: errorMessage,
       });
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/lib/trpc/routers/voice-input.ts` around lines 75 - 138, The current voice dictation mutation maps all non-OK OpenAI responses to a BAD_REQUEST TRPCError; update the error mapping in the mutation that calls OPENAI_TRANSCRIPTION_URL (inside the .mutation handler) to detect authentication failures (response.status === 401 || response.status === 403) and throw a TRPCError with code "UNAUTHORIZED" (using the same message from await readOpenAIError(response)); keep other non-OK responses as BAD_REQUEST. Ensure you only change the error branch after the fetch and before parsing response.json().

apps/desktop/src/main/windows/main.ts (1)
130-146: ⚡ Quick win

Keep the permission policy as audio-only (current voice input doesn’t request video)

- The only renderer `getUserMedia` usage for voice dictation requests `audio` only (no `video`), so denying `audio`+`video` together won’t affect the current microphone flow.
- Add a brief rationale comment to document that `audio && !video` is an intentional privacy/security choice.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/main/windows/main.ts` around lines 130 - 146, Add a brief inline comment above the window.webContents.session.setPermissionRequestHandler media-permission branch to state that the app intentionally allows only audio and explicitly denies combined audio+video for privacy/security. Keep the existing check in the permission === "media" branch that computes mediaTypes and calls callback(mediaTypes.includes("audio") && !mediaTypes.includes("video")); do not change logic, only add the explanatory comment near the mediaTypes handling to document that current getUserMedia use is audio-only (voice dictation) and video is intentionally disallowed.

apps/desktop/src/lib/trpc/routers/settings/voice-input.test.ts (1)
49-53: ⚡ Quick win

Make migration path resolution independent of `process.cwd()`.

Using `process.cwd()` here can break when tests run from `apps/desktop` (or any non-repo-root cwd). Resolve from the test file location instead to avoid environment-dependent failures.

Proposed fix

    +import { dirname } from "node:path";
    +import { fileURLToPath } from "node:url";
     import { readFileSync } from "node:fs";
     import { resolve } from "node:path";
    @@
     function applyVoiceInputMigration() {
    +  const currentDir = dirname(fileURLToPath(import.meta.url));
       const migrationSql = readFileSync(
         resolve(
    -      process.cwd(),
    -      "packages/local-db/drizzle/0042_add_voice_input_enabled.sql",
    +      currentDir,
    +      "../../../../../../../packages/local-db/drizzle/0042_add_voice_input_enabled.sql",
         ),
         "utf8",
       );

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/lib/trpc/routers/settings/voice-input.test.ts` around lines 49 - 53, The test currently resolves the migration SQL using process.cwd() when reading migrationSql, which makes the path environment-dependent; change the resolution to be relative to the test file instead (use __dirname or convert import.meta.url to a file path) and update the resolve(...) call that wraps readFileSync so it builds the path from the test module location to packages/local-db/drizzle/0042_add_voice_input_enabled.sql; keep the readFileSync(migrationSql) usage but replace process.cwd() with a path computed from the test file's directory so tests run correctly regardless of the current working directory.

apps/desktop/src/renderer/routes/_authenticated/_dashboard/layout.tsx (1)
131-132: 💤 Low value

Verify event listener capture phase consistency.

The `keyup` listener uses capture phase (`true`) while the `blur` listener does not. This is likely intentional since `keyup` needs to intercept before other handlers, and `blur` follows standard focus event handling. Confirm this mixed approach aligns with the intended press-and-hold behavior.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/renderer/routes/_authenticated/_dashboard/layout.tsx` around lines 131 - 132, The two event listeners use inconsistent capture settings: window.addEventListener("keyup", stopFromRelease, true) uses capture while window.addEventListener("blur", stopFromRelease) does not; confirm intended press-and-hold behavior and make the phase consistent. If you need to intercept blur during capture like keyup, add the third arg true to the blur listener (referencing stopFromRelease), otherwise remove the capture flag from the keyup listener so both use the bubble phase; update whichever call (the addEventListener for "keyup" or "blur") so both use the same capture boolean and add a brief inline comment explaining why capture was chosen for stopFromRelease.

apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx (1)
257-261: 💤 Low value

Verify polling interval for permission status.

The 2-second `refetchInterval` for microphone permission status might be aggressive. While gated by `showVoiceInput`, confirm this polling frequency is acceptable for permission checks, especially on systems where these checks may incur non-trivial overhead.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx` around lines 257 - 261, The permission polling in electronTrpc.permissions.getStatus.useQuery is currently set to refetchInterval: 2000 while only gated by showVoiceInput; reduce the polling frequency or switch to event-driven checks to avoid unnecessary overhead, e.g., change refetchInterval to a less aggressive value (like 10_000+ ms) or remove periodic polling and trigger a refetch when the component mounts, gains focus, or when showVoiceInput toggles; update the options passed to useQuery accordingly so microphone permission checks are less frequent and only run when needed.

apps/desktop/src/renderer/hotkeys/stores/keyboardLayoutStore.ts (1)
52-69: ⚡ Quick win

Import/subscribe failures escape the retry path.

`onError` only handles failures emitted by an established subscription. If `await import("renderer/lib/trpc-client")` (or the synchronous `subscribe` call) rejects/throws, the promise from `startKeyboardLayoutSync()` rejects and is swallowed by `void`, so there's no retry and `map` stays `null` indefinitely. Wrapping the body in try/catch that schedules the same backoff retry would make startup resilient.

♻️ Proposed guard

     async function startKeyboardLayoutSync(): Promise<void> {
    -  const { electronTrpcClient } = await import("renderer/lib/trpc-client");
    -  electronTrpcClient.keyboardLayout.changes.subscribe(undefined, {
    -    onData: (data) => {
    -      retryAttempt = 0;
    -      applySnapshot(data);
    -    },
    -    onError: (err) => {
    -      console.error("[keyboardLayoutStore] subscription error:", err);
    -      const idx = Math.min(retryAttempt, RETRY_BACKOFF_MS.length - 1);
    -      const delay = RETRY_BACKOFF_MS[idx] ?? 10_000;
    -      retryAttempt++;
    -      setTimeout(() => {
    -        void startKeyboardLayoutSync();
    -      }, delay);
    -    },
    -  });
    +  const scheduleRetry = (err: unknown) => {
    +    console.error("[keyboardLayoutStore] subscription error:", err);
    +    const idx = Math.min(retryAttempt, RETRY_BACKOFF_MS.length - 1);
    +    const delay = RETRY_BACKOFF_MS[idx] ?? 10_000;
    +    retryAttempt++;
    +    setTimeout(() => {
    +      void startKeyboardLayoutSync();
    +    }, delay);
    +  };
    +  try {
    +    const { electronTrpcClient } = await import("renderer/lib/trpc-client");
    +    electronTrpcClient.keyboardLayout.changes.subscribe(undefined, {
    +      onData: (data) => {
    +        retryAttempt = 0;
    +        applySnapshot(data);
    +      },
    +      onError: scheduleRetry,
    +    });
    +  } catch (err) {
    +    scheduleRetry(err);
    +  }
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/renderer/hotkeys/stores/keyboardLayoutStore.ts` around lines 52 - 69, startKeyboardLayoutSync currently only retries on subscription onError but will fail-fast if the dynamic import or the synchronous subscribe call throws; wrap the entire body of startKeyboardLayoutSync in a try/catch so any thrown/rejected error is caught, log the error, compute the same backoff delay using retryAttempt and RETRY_BACKOFF_MS, increment retryAttempt, and schedule a retry by calling startKeyboardLayoutSync after that delay; keep the existing onError retry path for subscription errors and ensure applySnapshot and electronTrpcClient.keyboardLayout.changes.subscribe remain unchanged inside the try block.

apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/useRecordHotkeys.ts (1)
108-120: ⚡ Quick win

Drop `FnLock` from the unsupported-Fn gate (it’s not exposed via `KeyboardEvent`)

`getUnsupportedShortcutReason` treats `FnLock` the same as transient `Fn`, but `KeyboardEvent.getModifierState("FnLock")` isn’t supported/exposed in browsers (including macOS Chromium), so the “FnLock blocks recording every shortcut” case is unlikely. If `fnActive` ever does evaluate true on some platform, this check runs before the Backspace/Delete unassign branch, which would then be blocked too.

Consider gating on `getModifierState("Fn")` only (or removing the `FnLock` check).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/useRecordHotkeys.ts` around lines 108 - 120, The gate that sets fnActive should stop checking for "FnLock" because KeyboardEvent.getModifierState("FnLock") isn’t exposed; update the fnActive assignment to only use event.getModifierState?.("Fn") === true (remove the "FnLock" check) so the existing logic in getUnsupportedShortcutReason (the fnActive, hasNonFnKey, isFnShortcutToken(key|code), UNSUPPORTED_FN_SHORTCUT_REASON branch) continues to work correctly and does not incorrectly block the Backspace/Delete unassign branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.spec/improvements/SUPER-869/BRIEF.md:
- Around line 47-53: The fenced code block containing keys id, label,
description, command and promptCommand lacks a language specifier; update the
opening backticks for that block (the triple-tick before the lines with id:
"droid" ...) to include a language (for example ```yaml or ```typescript) so the
snippet is markdown-compliant and gets proper syntax highlighting.
- Around line 32-41: The fenced code block containing the keys presetId, label,
description, command, args, promptTransport, promptArgs, and env should include
a language specifier; update the opening fence from ``` to ```typescript (or
another appropriate language) so the block becomes ```typescript and enables
proper syntax highlighting for the snippet in BRIEF.md.
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`:
- Around line 408-423: Update the conditional className/markup for the <p
id="voice-input-status"> so that when setVoiceInputEnabled.isError is true the
rendered element includes the explicit "select-text cursor-text" classes (e.g.,
by appending these classes to the error branch of the className expression or
wrapping the error string in a span with those classes); keep the existing
non-error classes ("text-xs text-destructive" vs "text-xs
text-muted-foreground") for other states and only apply select-text cursor-text
to the error message rendered by setVoiceInputEnabled.isError.
---
Nitpick comments:
In `@apps/desktop/src/lib/trpc/routers/settings/voice-input.test.ts`:
- Around line 49-53: The test currently resolves the migration SQL using
process.cwd() when reading migrationSql, which makes the path
environment-dependent; change the resolution to be relative to the test file
instead (use __dirname or convert import.meta.url to a file path) and update the
resolve(...) call that wraps readFileSync so it builds the path from the test
module location to packages/local-db/drizzle/0042_add_voice_input_enabled.sql;
keep the readFileSync(migrationSql) usage but replace process.cwd() with a path
computed from the test file's directory so tests run correctly regardless of the
current working directory.
In `@apps/desktop/src/lib/trpc/routers/voice-input.ts`:
- Around line 75-138: The current voice dictation mutation maps all non-OK
OpenAI responses to a BAD_REQUEST TRPCError; update the error mapping in the
mutation that calls OPENAI_TRANSCRIPTION_URL (inside the .mutation handler) to
detect authentication failures (response.status === 401 || response.status ===
403) and throw a TRPCError with code "UNAUTHORIZED" (using the same message from
await readOpenAIError(response)); keep other non-OK responses as BAD_REQUEST.
Ensure you only change the error branch after the fetch and before parsing
response.json().
In `@apps/desktop/src/main/windows/main.ts`:
- Around line 130-146: Add a brief inline comment above the
window.webContents.session.setPermissionRequestHandler media-permission branch
to state that the app intentionally allows only audio and explicitly denies
combined audio+video for privacy/security—keep the existing check in the
permission === "media" branch that computes mediaTypes and calls
callback(mediaTypes.includes("audio") && !mediaTypes.includes("video")); do not
change logic, only add the explanatory comment near the mediaTypes handling to
document that current getUserMedia use is audio-only (voice dictation) and video
is intentionally disallowed.
In
`@apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/useRecordHotkeys.ts`:
- Around line 108-120: The gate that sets fnActive should stop checking for
"FnLock" because KeyboardEvent.getModifierState("FnLock") isn’t exposed; update
the fnActive assignment to only use event.getModifierState?.("Fn") === true
(remove the "FnLock" check) so the existing logic in
getUnsupportedShortcutReason (the fnActive, hasNonFnKey,
isFnShortcutToken(key|code), UNSUPPORTED_FN_SHORTCUT_REASON branch) continues to
work correctly and does not incorrectly block the Backspace/Delete unassign
branch.
In `@apps/desktop/src/renderer/hotkeys/stores/keyboardLayoutStore.ts`:
- Around line 52-69: startKeyboardLayoutSync currently only retries on
subscription onError but will fail-fast if the dynamic import or the synchronous
subscribe call throws; wrap the entire body of startKeyboardLayoutSync in a
try/catch so any thrown/rejected error is caught, log the error, compute the
same backoff delay using retryAttempt and RETRY_BACKOFF_MS, increment
retryAttempt, and schedule a retry by calling startKeyboardLayoutSync after that
delay; keep the existing onError retry path for subscription errors and ensure
applySnapshot and electronTrpcClient.keyboardLayout.changes.subscribe remain
unchanged inside the try block.
In `@apps/desktop/src/renderer/routes/_authenticated/_dashboard/layout.tsx`:
- Around line 131-132: The two event listeners use inconsistent capture
settings: window.addEventListener("keyup", stopFromRelease, true) uses capture
while window.addEventListener("blur", stopFromRelease) does not; confirm
intended press-and-hold behavior and make the phase consistent. If you need to
intercept blur during capture like keyup, add the third arg true to the blur
listener (referencing stopFromRelease), otherwise remove the capture flag from
the keyup listener so both use the bubble phase; update whichever call (the
addEventListener for "keyup" or "blur") so both use the same capture boolean and
add a brief inline comment explaining why capture was chosen for
stopFromRelease.
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`:
- Around line 257-261: The permission polling in
electronTrpc.permissions.getStatus.useQuery is currently set to refetchInterval:
2000 while only gated by showVoiceInput; reduce the polling frequency or switch
to event-driven checks to avoid unnecessary overhead — e.g., change
refetchInterval to a less aggressive value (like 10_000+ ms) or remove periodic
polling and trigger a refetch when the component mounts, gains focus, or when
showVoiceInput toggles; update the options passed to useQuery accordingly so
microphone permission checks are less frequent and only run when needed.
In `@apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts`:
- Around line 19-31: getEditableElement currently reads document.activeElement
without guarding against null; add a defensive null check for activeElement
before performing instanceof or isContentEditable checks. Update the function
(getEditableElement) to first assign const activeElement =
document.activeElement; if (!activeElement) return
targetElement.querySelector("textarea, input, [contenteditable='true']");
otherwise proceed with the existing instanceof HTMLInputElement /
HTMLTextAreaElement / isContentEditable checks so TypeScript strict-null checks
are satisfied and runtime nulls are handled.
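Of the nitpicks above, the error mapping requested for `voice-input.ts` is the easiest to sketch in isolation. The helper below is an assumption for illustration (`toTrpcErrorCode` and `TranscriptionErrorCode` do not appear in the PR); it only shows the status-to-code decision, using the tRPC error codes named in the comment.

```typescript
// The two tRPC error codes named in the review comment.
type TranscriptionErrorCode = "UNAUTHORIZED" | "BAD_REQUEST";

// Hypothetical helper: map a non-OK HTTP status from the transcription
// request to a tRPC error code. 401 and 403 indicate an auth failure;
// every other non-OK status keeps the existing BAD_REQUEST mapping.
function toTrpcErrorCode(status: number): TranscriptionErrorCode {
  return status === 401 || status === 403 ? "UNAUTHORIZED" : "BAD_REQUEST";
}
```

In the mutation this result would presumably feed the `code` field of the thrown `TRPCError`, with the message still coming from `readOpenAIError(response)`.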
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 32d53db8-1f94-4c16-ac23-49b291f7b344
⛔ Files ignored due to path filters (1)
`bun.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (74)
- `.spec/improvements/SUPER-869/BRIEF.md`
- `.spec/improvements/SUPER-869/SCOPE.md`
- `.spec/improvements/SUPER-869/follow-ups.md`
- `apps/desktop/package.json`
- `apps/desktop/scripts/build-local-prod.sh`
- `apps/desktop/src/lib/trpc/routers/index.ts`
- `apps/desktop/src/lib/trpc/routers/permissions.test.ts`
- `apps/desktop/src/lib/trpc/routers/permissions/native-permissions.test.ts`
- `apps/desktop/src/lib/trpc/routers/permissions/native-permissions.ts`
- `apps/desktop/src/lib/trpc/routers/settings/index.ts`
- `apps/desktop/src/lib/trpc/routers/settings/voice-input.test.ts`
- `apps/desktop/src/lib/trpc/routers/voice-input.ts`
- `apps/desktop/src/main/windows/main.ts`
- `apps/desktop/src/renderer/components/Chat/ChatInterface/components/ChatInputFooter/ChatInputFooter.tsx`
- `apps/desktop/src/renderer/components/Chat/ChatInterface/components/TiptapPromptEditor/TiptapPromptEditor.tsx`
- `apps/desktop/src/renderer/hotkeys/display.ts`
- `apps/desktop/src/renderer/hotkeys/hooks/index.ts`
- `apps/desktop/src/renderer/hotkeys/hooks/useHotkey/useHotkey.test.tsx`
- `apps/desktop/src/renderer/hotkeys/hooks/useHotkey/useHotkey.ts`
- `apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/index.ts`
- `apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/useRecordHotkeys.test.ts`
- `apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/useRecordHotkeys.ts`
- `apps/desktop/src/renderer/hotkeys/index.ts`
- `apps/desktop/src/renderer/hotkeys/registry.test.ts`
- `apps/desktop/src/renderer/hotkeys/registry.ts`
- `apps/desktop/src/renderer/hotkeys/stores/browserLocalStorage.ts`
- `apps/desktop/src/renderer/hotkeys/stores/hotkeyOverridesStore.ts`
- `apps/desktop/src/renderer/hotkeys/stores/keyboardLayoutStore.ts`
- `apps/desktop/src/renderer/hotkeys/stores/keyboardPreferencesStore.ts`
- `apps/desktop/src/renderer/hotkeys/types.ts`
- `apps/desktop/src/renderer/hotkeys/utils/binding.ts`
- `apps/desktop/src/renderer/hotkeys/utils/chord.ts`
- `apps/desktop/src/renderer/hotkeys/utils/fnKey.ts`
- `apps/desktop/src/renderer/hotkeys/utils/index.ts`
- `apps/desktop/src/renderer/hotkeys/utils/resolveHotkeyFromEvent.test.ts`
- `apps/desktop/src/renderer/hotkeys/utils/resolveHotkeyFromEvent.ts`
- `apps/desktop/src/renderer/routes/_authenticated/_dashboard/layout.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/FilesTab.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/components/FilesTab/hooks/useFilesTabBridge/useFilesTabBridge.ts`
- `apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/hooks/usePaneRegistry/components/ChatPane/components/WorkspaceChatInterface/components/ChatInputFooter/ChatInputFooter.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/hooks/usePaneRegistry/components/TerminalPane/TerminalPane.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.microphone-readiness.test.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.test.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.voice-shortcut-link.test.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/behavior/page.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/keyboard/page.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/keyboard/voice-shortcut.test.tsx`
- `apps/desktop/src/renderer/routes/_authenticated/settings/utils/settings-search/settings-search.test.ts`
- `apps/desktop/src/renderer/routes/_authenticated/settings/utils/settings-search/settings-search.ts`
- `apps/desktop/src/renderer/routes/_authenticated/settings/utils/voice-shortcut-links/index.ts`
- `apps/desktop/src/renderer/routes/_authenticated/settings/utils/voice-shortcut-links/voice-shortcut-links.ts`
- `apps/desktop/src/renderer/screens/main/components/WorkspaceView/ContentView/TabsContent/Terminal/Terminal.tsx`
- `apps/desktop/src/renderer/voice-input/components/VoiceDictationIndicator/VoiceDictationIndicator.tsx`
- `apps/desktop/src/renderer/voice-input/components/VoiceDictationIndicator/index.ts`
- `apps/desktop/src/renderer/voice-input/events.ts`
- `apps/desktop/src/renderer/voice-input/focusTracking.ts`
- `apps/desktop/src/renderer/voice-input/hooks/useVoiceDictation/index.ts`
- `apps/desktop/src/renderer/voice-input/hooks/useVoiceDictation/useVoiceDictation.ts`
- `apps/desktop/src/renderer/voice-input/terminalVoiceTargets.ts`
- `apps/desktop/src/renderer/voice-input/types.ts`
- `apps/desktop/src/renderer/voice-input/useVoiceActivationGuard.test.ts`
- `apps/desktop/src/renderer/voice-input/useVoiceActivationGuard.ts`
- `apps/desktop/src/renderer/voice-input/voice-preferences.integration.test.tsx`
- `apps/desktop/src/renderer/voice-input/voiceDictationTarget.test.ts`
- `apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts`
- `apps/desktop/src/shared/constants.ts`
- `packages/host-service/src/events/event-bus.ts`
- `packages/local-db/drizzle/0042_add_voice_input_enabled.sql`
- `packages/local-db/drizzle/meta/0042_snapshot.json`
- `packages/local-db/drizzle/meta/_journal.json`
- `packages/local-db/src/schema/schema.ts`
- `packages/ui/src/assets/icons/preset-icons/index.ts`
- `packages/workspace-fs/src/watch.ts`
<p
  id="voice-input-status"
  className={
    setVoiceInputEnabled.isError
      ? "text-xs text-destructive"
      : "text-xs text-muted-foreground"
  }
>
  {setVoiceInputEnabled.isError
    ? "Voice preference could not be saved"
    : isVoiceInputLoading
      ? "Loading voice preference"
      : voiceInputEnabled
        ? "Voice control is enabled"
        : "Voice control is disabled"}
</p>
Add select-text cursor-text classes to error message.
The error text "Voice preference could not be saved" violates the coding guideline requiring error text to be selectable with explicit select-text cursor-text classes in apps/desktop/**/*.{tsx,jsx} files.
🔧 Proposed fix
<p
id="voice-input-status"
- className={
+ className={cn(
+ "select-text cursor-text",
setVoiceInputEnabled.isError
? "text-xs text-destructive"
: "text-xs text-muted-foreground"
- }
+ )}
>
As per coding guidelines: Error text must be selectable by users with explicit select-text cursor-text classes; renderer sets user-select: none on body.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
<p
  id="voice-input-status"
  className={
    setVoiceInputEnabled.isError
      ? "text-xs text-destructive"
      : "text-xs text-muted-foreground"
  }
>
  {setVoiceInputEnabled.isError
    ? "Voice preference could not be saved"
    : isVoiceInputLoading
      ? "Loading voice preference"
      : voiceInputEnabled
        ? "Voice control is enabled"
        : "Voice control is disabled"}
</p>
<p
  id="voice-input-status"
  className={cn(
    "select-text cursor-text",
    setVoiceInputEnabled.isError
      ? "text-xs text-destructive"
      : "text-xs text-muted-foreground"
  )}
>
  {setVoiceInputEnabled.isError
    ? "Voice preference could not be saved"
    : isVoiceInputLoading
      ? "Loading voice preference"
      : voiceInputEnabled
        ? "Voice control is enabled"
        : "Voice control is disabled"}
</p>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@apps/desktop/src/renderer/routes/_authenticated/settings/behavior/components/BehaviorSettings/BehaviorSettings.tsx`
around lines 408 - 423, Update the conditional className/markup for the <p
id="voice-input-status"> so that when setVoiceInputEnabled.isError is true the
rendered element includes the explicit "select-text cursor-text" classes (e.g.,
by appending these classes to the error branch of the className expression or
wrapping the error string in a span with those classes); keep the existing
non-error classes ("text-xs text-destructive" vs "text-xs
text-muted-foreground") for other states and only apply select-text cursor-text
to the error message rendered by setVoiceInputEnabled.isError.
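The prompt above asks for the selectable classes only in the error state, which differs slightly from the proposed `cn(...)` diff that applies them to every state. A minimal sketch of the error-only variant follows; `voiceStatusClassName` is a hypothetical helper introduced for illustration and is not part of `BehaviorSettings.tsx`.

```typescript
// Hypothetical helper: build the className for the voice-input status line.
// Only the error state gets "select-text cursor-text", matching the prompt.
function voiceStatusClassName(isError: boolean): string {
  const base = isError
    ? "text-xs text-destructive"
    : "text-xs text-muted-foreground";
  return isError ? `${base} select-text cursor-text` : base;
}
```

Keeping the decision in one function makes it straightforward to test that non-error states stay unchanged.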
6 issues found across 75 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/desktop/src/renderer/hotkeys/stores/browserLocalStorage.ts">
<violation number="1" location="apps/desktop/src/renderer/hotkeys/stores/browserLocalStorage.ts:6">
P2: Local storage operations are not exception-safe; `window.localStorage` access should be wrapped in try/catch to prevent DOMException from crashing zustand persist flows.</violation>
</file>
<file name="apps/desktop/src/renderer/hotkeys/utils/chord.ts">
<violation number="1" location="apps/desktop/src/renderer/hotkeys/utils/chord.ts:63">
P1: AltGraph text-entry events are normalized into plain-key chords, which can cause false-positive hotkey matches on international keyboards.</violation>
</file>
<file name="apps/desktop/src/lib/trpc/routers/voice-input.ts">
<violation number="1" location="apps/desktop/src/lib/trpc/routers/voice-input.ts:71">
P1: audioBase64 input is decoded into memory before size limits are enforced, enabling potential memory exhaustion via oversized payload</violation>
</file>
<file name="apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts">
<violation number="1" location="apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts:23">
P1: Only use `document.activeElement` when it belongs to the current voice target container. Without a containment check, dictation can be inserted into whichever unrelated input is currently focused.</violation>
</file>
Architecture diagram
sequenceDiagram
participant User
participant UI as Desktop UI
participant Hotkey as Hotkey System
participant Guard as Voice Activation Guard
participant Settings as Settings Service
participant Mic as MediaRecorder
participant TRPC as tRPC Router
participant OpenAI as OpenAI API
participant Target as Voice Dictation Target
Note over User,Target: Voice Control Dictation Flow
User->>UI: Enable Voice Control in Settings
UI->>Settings: setVoiceInputEnabled(true)
Settings-->>UI: persisted
User->>UI: Press activation shortcut (e.g. ⌘⇧V)
UI->>Hotkey: VOICE_INPUT_TOGGLE fires
Hotkey->>Guard: runVoiceActivationHotkeyEvent()
alt Voice Control disabled
Guard-->>UI: { status: "disabled" }
UI->>UI: Show error: "Voice Control is off"
else No supported target focused
Guard->>Guard: evaluateVoiceActivationGuard()
Guard-->>UI: { status: "unsupported-target" }
UI->>UI: Show error: "Focus chat or terminal"
else Activation allowed
Guard->>Guard: check target (chat/terminal)
Guard-->>UI: { status: "allowed", target }
UI->>UI: Arm release-to-stop handler
end
Note over UI,Target: Dictation Recording (press-and-hold)
UI->>Mic: start(target)
Mic->>Mic: getUserMedia({ audio })
Mic-->>UI: MediaStream
UI->>Mic: new MediaRecorder(stream)
Mic-->>UI: Recording chunks
Note over UI,Target: Shortcut Released
User->>UI: Release activation key
UI->>Mic: stop()
Mic->>UI: onstop fires with audio chunks
UI->>UI: Build audio Blob from chunks
UI->>TRPC: transcribe({ audioBase64, mimeType })
Note over TRPC,OpenAI: App-Side Transcription via tRPC
TRPC->>TRPC: resolveOpenAIApiKey()
alt No API key found
TRPC-->>UI: TRPCError PRECONDITION_FAILED
UI->>UI: Show error: "Connect OpenAI in Settings"
else API key exists
TRPC->>OpenAI: POST /v1/audio/transcriptions
Note over TRPC,OpenAI: FormData: file, model=gpt-4o-mini-transcribe
OpenAI-->>TRPC: { text: "transcribed text" }
TRPC-->>UI: { text }
end
Note over UI,Target: Insert Transcription into Target
UI->>Target: insertTranscript(text)
alt Target is chat
Target->>Target: dispatch CustomEvent on [data-voice-input-target]
Note over Target: Chat components listen and insert via editor.chain().insertContent()
Target-->>UI: true
else Target is terminal
Target->>Target: terminalRuntimeRegistry.writeInput()
Target-->>UI: true
end
UI->>UI: Show success indicator (1.4s)
UI->>UI: Hide indicator, return to idle
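The guard branch of the diagram above can be read as a discriminated union over the three statuses it names. The sketch below is an assumption about shape only: `GuardResult`, `evaluateGuard` and `guardErrorMessage` are hypothetical names, and the real `evaluateVoiceActivationGuard` may take different inputs. The status strings and error messages are copied from the diagram.

```typescript
type VoiceTarget = "chat" | "terminal";

// The three outcomes shown in the diagram's guard branch.
type GuardResult =
  | { status: "disabled" }
  | { status: "unsupported-target" }
  | { status: "allowed"; target: VoiceTarget };

// Hypothetical guard: the setting is checked first, then the focused target.
function evaluateGuard(enabled: boolean, focused: VoiceTarget | null): GuardResult {
  if (!enabled) return { status: "disabled" };
  if (focused === null) return { status: "unsupported-target" };
  return { status: "allowed", target: focused };
}

// Hypothetical mapping from a guard result to the error text in the diagram.
// Returns null when activation is allowed and no error should be shown.
function guardErrorMessage(result: GuardResult): string | null {
  switch (result.status) {
    case "disabled":
      return "Voice Control is off";
    case "unsupported-target":
      return "Focus chat or terminal";
    case "allowed":
      return null;
  }
}
```

Modeling the result this way lets the compiler check that every status is handled before the release-to-stop handler is armed.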
const key = normalizeToken(event.code);
if (isIgnorableKey(key)) return null;
// AltGr is reported by Chromium as ctrlKey+altKey on Windows/Linux.
const altGraph = event.getModifierState?.("AltGraph") === true;
P1: AltGraph text-entry events are normalized into plain-key chords, which can cause false-positive hotkey matches on international keyboards.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/desktop/src/renderer/hotkeys/utils/chord.ts, line 63:
<comment>AltGraph text-entry events are normalized into plain-key chords, which can cause false-positive hotkey matches on international keyboards.</comment>
<file context>
@@ -0,0 +1,92 @@
+ const key = normalizeToken(event.code);
+ if (isIgnorableKey(key)) return null;
+ // AltGr is reported by Chromium as ctrlKey+altKey on Windows/Linux.
+ const altGraph = event.getModifierState?.("AltGraph") === true;
+ const mods: string[] = [];
+ if (event.metaKey) mods.push("meta");
</file context>
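One possible way to address this finding is to detect AltGraph keystrokes that produce text and skip chord resolution for them. The sketch below is only an illustration under that assumption: `KeyEventLike` and `isAltGraphTextEntry` are hypothetical, and whether the single-character `key` heuristic suits `chord.ts` would need checking against real layouts.

```typescript
// Hypothetical structural stand-in for KeyboardEvent (assumption for this sketch).
interface KeyEventLike {
  key: string;
  getModifierState?: (key: string) => boolean;
}

// Hypothetical check: AltGraph is held and the keystroke yields one printable
// character (for example "@" or "€"), so it is text entry rather than a hotkey.
function isAltGraphTextEntry(event: KeyEventLike): boolean {
  const altGraph = event.getModifierState?.("AltGraph") === true;
  // Array.from counts code points, so astral characters still count as one.
  return altGraph && Array.from(event.key).length === 1;
}
```

A caller could return `null` from chord resolution when this is true, leaving named keys such as `Enter` unaffected.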
transcribe: publicProcedure
  .input(
    z.object({
      audioBase64: z.string().min(1),
P1: audioBase64 input is decoded into memory before size limits are enforced, enabling potential memory exhaustion via oversized payload
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/desktop/src/lib/trpc/routers/voice-input.ts, line 71:
<comment>audioBase64 input is decoded into memory before size limits are enforced, enabling potential memory exhaustion via oversized payload</comment>
<file context>
@@ -0,0 +1,140 @@
+ transcribe: publicProcedure
+ .input(
+ z.object({
+ audioBase64: z.string().min(1),
+ mimeType: z.string().min(1).max(120),
+ }),
</file context>
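The size check this finding asks for can run on the encoded string, before anything is decoded, because base64 encodes every 3 bytes as 4 characters. The sketch below shows that arithmetic; `MAX_AUDIO_BYTES`, `maxBase64Length` and `isWithinAudioLimit` are hypothetical names, and the 25 MiB figure is an assumed limit, not one stated in the PR.

```typescript
// Assumed upper bound on decoded audio size (not taken from the PR).
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

// Padded base64 uses 4 characters per 3-byte group, rounding up.
function maxBase64Length(maxBytes: number): number {
  return Math.ceil(maxBytes / 3) * 4;
}

// Hypothetical pre-decode check: compares string length only, so no buffer
// is allocated for an oversized payload.
function isWithinAudioLimit(
  audioBase64: string,
  maxBytes: number = MAX_AUDIO_BYTES,
): boolean {
  return audioBase64.length <= maxBase64Length(maxBytes);
}
```

The same bound could likely be expressed directly in the schema with `z.string().max(...)`, so that oversized inputs are rejected at validation time.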
  targetElement: HTMLElement,
): HTMLInputElement | HTMLTextAreaElement | HTMLElement | null {
  const activeElement = document.activeElement;
  if (activeElement instanceof HTMLInputElement) return activeElement;
P1: Only use document.activeElement when it belongs to the current voice target container. Without a containment check, dictation can be inserted into whichever unrelated input is currently focused.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/desktop/src/renderer/voice-input/voiceDictationTarget.ts, line 23:
<comment>Only use `document.activeElement` when it belongs to the current voice target container. Without a containment check, dictation can be inserted into whichever unrelated input is currently focused.</comment>
<file context>
@@ -0,0 +1,145 @@
+ targetElement: HTMLElement,
+): HTMLInputElement | HTMLTextAreaElement | HTMLElement | null {
+ const activeElement = document.activeElement;
+ if (activeElement instanceof HTMLInputElement) return activeElement;
+ if (activeElement instanceof HTMLTextAreaElement) return activeElement;
+ if (activeElement instanceof HTMLElement && activeElement.isContentEditable) {
</file context>
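The containment check suggested here maps onto the DOM's `Node.contains`. The sketch below is written against a small structural interface so it runs without a DOM; `ContainerLike` and `activeElementWithin` are hypothetical names, and in `voiceDictationTarget.ts` the arguments would presumably be `targetElement` and `document.activeElement`.

```typescript
// Hypothetical structural stand-in for a DOM node exposing `contains`.
interface ContainerLike<T> {
  contains(node: T | null): boolean;
}

// Hypothetical helper: accept the active element only when it lives inside
// the voice target container; otherwise report that nothing usable is focused.
function activeElementWithin<T>(
  target: ContainerLike<T>,
  active: T | null,
): T | null {
  return active !== null && target.contains(active) ? active : null;
}
```

When this returns `null`, the existing `querySelector` fallback inside the target container would still apply, so dictation never lands in an unrelated focused input.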
@@ -0,0 +1,16 @@
import type { StateStorage } from "zustand/middleware";
P2: Local storage operations are not exception-safe; window.localStorage access should be wrapped in try/catch to prevent DOMException from crashing zustand persist flows.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/desktop/src/renderer/hotkeys/stores/browserLocalStorage.ts, line 6:
<comment>Local storage operations are not exception-safe; `window.localStorage` access should be wrapped in try/catch to prevent DOMException from crashing zustand persist flows.</comment>
<file context>
@@ -0,0 +1,16 @@
+export const browserLocalStorage: StateStorage = {
+ getItem: (name) => {
+ if (typeof window === "undefined") return null;
+ return window.localStorage.getItem(name);
+ },
+ removeItem: (name) => {
</file context>
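An exception-safe wrapper along the lines this finding describes might look like the sketch below. It is not the PR's `browserLocalStorage`: `StorageLike` is a simplified, synchronous stand-in for zustand's `StateStorage`, and `createSafeStorage` is a hypothetical factory. The backend is passed as a function because merely reading `window.localStorage` can throw in some environments.

```typescript
// Simplified synchronous stand-in for zustand's StateStorage (assumption).
interface StorageLike {
  getItem(name: string): string | null;
  setItem(name: string, value: string): void;
  removeItem(name: string): void;
}

// Hypothetical factory: every operation, including obtaining the backend,
// runs inside try/catch so a storage exception cannot escape.
function createSafeStorage(backend: () => StorageLike): StorageLike {
  return {
    getItem: (name) => {
      try {
        return backend().getItem(name);
      } catch {
        return null; // treat a failing read as "nothing persisted"
      }
    },
    setItem: (name, value) => {
      try {
        backend().setItem(name, value);
      } catch {
        // swallow quota and access errors; the in-memory state stays usable
      }
    },
    removeItem: (name) => {
      try {
        backend().removeItem(name);
      } catch {
        // nothing to clean up if storage is unavailable
      }
    },
  };
}
```

In the renderer, the backend would presumably be `() => window.localStorage`, keeping the existing `typeof window === "undefined"` guard if it is still needed.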
RAI v2 — Summary
PR-level: BLOCK (0 approved, 1 block, 2 abstain, 0 defer)
Archetype classification
General PR (
Per-strategy verdicts
| Model | input | output | cache create | cache read |
|---|---|---|---|---|
| total | 18 | 14,327 | 96,772 | 194,467 |
| sonnet | 18 | 14,327 | 96,772 | 194,467 |
RAI v2 — Author judgment pathway
2 outstanding request(s) across 2 category(ies).
Clarifications
Resolutions
Why
Voice Control gives desktop users a first-class dictation path inside Superset instead of relying on OS-level dictation or external tools. The goal is to make voice input feel native to the app: discoverable in Settings, controllable with a user-selected shortcut, constrained to supported text targets, and explicit about recording/processing state.
This is especially useful for the two highest-friction text-entry surfaces in the desktop app:
The implementation intentionally keeps the preference local, starts disabled by default, and adds microphone readiness UI so users understand what has to be enabled before dictation works.
What This Does
Voice Control settings and discoverability
`voiceInputEnabled` setting with a default disabled state.
Shortcut registration, editing, and persistence
`event.key`, modifier state, or without a stable `event.code`.
Dictation capture and insertion
`MediaRecorder`.
User feedback while dictating
Microphone and permission behavior
Media
Screen Recording
TODO: Attach a short screen recording showing:
`voice` in Keyboard settings and seeing the Voice Control shortcut below the search bar.
Settings Snapshot
Screen.Recording.2026-05-28.at.3.12.09.PM.mov
Validation
- `bun run --cwd apps/desktop typecheck`
- `git ls-files -z | xargs -0 bunx @biomejs/biome@2.4.2 check`
- `./scripts/check-desktop-git-env.sh`
- `./scripts/check-git-ref-strings.sh`
- `bash ./scripts/check-simple-git-usage.sh`
- `bun test apps/desktop/src/renderer/hotkeys/hooks/useRecordHotkeys/useRecordHotkeys.test.ts apps/desktop/src/renderer/hotkeys/hooks/useHotkey/useHotkey.test.tsx apps/desktop/src/renderer/hotkeys/utils/resolveHotkeyFromEvent.test.ts apps/desktop/src/renderer/hotkeys/registry.test.ts apps/desktop/src/renderer/routes/_authenticated/settings/keyboard/voice-shortcut.test.tsx apps/desktop/src/renderer/voice-input/voiceDictationTarget.test.ts apps/desktop/src/renderer/voice-input/useVoiceActivationGuard.test.ts apps/desktop/src/renderer/voice-input/voice-preferences.integration.test.tsx apps/desktop/src/lib/trpc/routers/settings/voice-input.test.ts apps/desktop/src/lib/trpc/routers/permissions.test.ts apps/desktop/src/lib/trpc/routers/permissions/native-permissions.test.ts`

Focused voice/hotkey test run result: 126 pass, 0 fail.

Note: the normal `bun run lint` command cannot complete from my local fork-main checkout because that checkout currently has untracked nested worktrees under `.claude/worktrees`, and Biome scans those as nested root configurations. The tracked-file Biome check and the lint script's auxiliary checks pass.
Risks and Notes
Summary by cubic
Adds Voice Control dictation to the desktop app with a configurable shortcut, native settings, and safe insertion into chat and terminal. Default is off; users can enable it in Settings and dictate via press-and-hold with clear recording/processing feedback.
New Features
`MediaRecorder` in the renderer and tRPC in the app; transcribes with OpenAI’s `gpt-4o-mini-transcribe`; press-and-hold to record, release to stop.
Migration
`voice_input_enabled` setting (defaults to false); no manual migration needed.
Written for commit 499f1b3. Summary will update on new commits.
Summary by CodeRabbit
New Features
Tests
Chores