
Conversation

@jackjackbits (Contributor) commented Jun 25, 2025

Voice Dictation Feature - PR Summary

Overview

This PR adds voice dictation functionality to Goose Desktop, allowing users to input messages using their microphone with support for both OpenAI Whisper and ElevenLabs speech-to-text services.

Key Features

1. Voice Input UI

  • Microphone button in chat input area (next to send button)
  • Recording indicator with duration and file size monitoring
  • Real-time waveform visualization during recording (sketched below)
  • Visual feedback for recording/transcribing states
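
The waveform visualization in the list above isn't spelled out in this summary, so here is a minimal sketch of how a component like WaveformVisualizer could render a live waveform from the microphone stream using the Web Audio API. The function name and structure are illustrative, not taken from the PR.

```typescript
// Minimal sketch: draw a live waveform from a MediaStream onto a canvas.
// drawWaveform is an illustrative name, not the PR's actual component API.
function drawWaveform(stream: MediaStream, canvas: HTMLCanvasElement): () => void {
  const audioCtx = new AudioContext();
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 2048; // size of the time-domain snapshot read each frame
  audioCtx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Uint8Array(analyser.fftSize);
  const ctx = canvas.getContext('2d')!;
  let rafId = 0;

  const render = () => {
    analyser.getByteTimeDomainData(samples); // 0..255 values centered at 128 (silence)
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.beginPath();
    for (let i = 0; i < samples.length; i++) {
      const x = (i / samples.length) * canvas.width;
      const y = (samples[i] / 255) * canvas.height;
      if (i === 0) ctx.moveTo(x, y);
      else ctx.lineTo(x, y);
    }
    ctx.stroke();
    rafId = requestAnimationFrame(render);
  };
  render();

  // The caller invokes the returned function to stop drawing and release audio resources.
  return () => {
    cancelAnimationFrame(rafId);
    void audioCtx.close();
  };
}
```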

2. Dual Provider Support

  • OpenAI Whisper: Uses existing OpenAI API key, no additional configuration needed
  • ElevenLabs Speech-to-Text: Alternative provider with advanced features
  • Smart provider switching: Automatically available based on configured API keys

3. Settings & Configuration

  • New Voice Dictation section in Settings
  • Toggle to enable/disable the feature
  • Provider selection dropdown
  • ElevenLabs API key configuration with secure storage
  • Provider-specific information and features

4. Technical Implementation

Backend (Rust)

  • New /audio/transcribe endpoint for OpenAI Whisper
  • New /audio/transcribe/elevenlabs endpoint for ElevenLabs
  • /audio/config endpoint to check provider availability (see the sketch after this list)
  • 25MB file size limit for both providers
  • Support for multiple audio formats (webm, mp3, mp4, m4a, wav)
  • Automatic API key migration to secure storage for ElevenLabs
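
To illustrate the "smart provider switching" and the /audio/config check described above, here is a client-side sketch of how the UI might decide which provider to offer. The response field names and the use of the X-Secret-Key header on this endpoint are assumptions for the sketch, not the actual contract in audio.rs.

```typescript
// Sketch only: openaiConfigured / elevenlabsConfigured are assumed field names,
// and requiring X-Secret-Key on /audio/config is also an assumption.
type DictationProvider = 'openai' | 'elevenlabs';

interface AudioConfigResponse {
  openaiConfigured: boolean;
  elevenlabsConfigured: boolean;
}

async function pickDictationProvider(
  baseUrl: string,
  secretKey: string,
  preferred: DictationProvider,
): Promise<DictationProvider | null> {
  const res = await fetch(`${baseUrl}/audio/config`, {
    headers: { 'X-Secret-Key': secretKey },
  });
  if (!res.ok) return null;

  const config: AudioConfigResponse = await res.json();
  const available: DictationProvider[] = [];
  if (config.openaiConfigured) available.push('openai');
  if (config.elevenlabsConfigured) available.push('elevenlabs');

  // Honor the user's configured choice when its key is present; otherwise fall
  // back to whichever provider is available, or none.
  return available.includes(preferred) ? preferred : available[0] ?? null;
}
```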

Frontend (TypeScript)

  • useWhisper hook for recording management (see the sketch after this list)
  • useDictationSettings hook for settings persistence
  • WaveformVisualizer component for audio feedback
  • Microphone permission handling
  • Real-time size and duration monitoring
  • Automatic recording stop at 10 minutes or 25MB
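
The recording lifecycle the useWhisper hook manages (lazy permission request, chunked recording, and the 10-minute / 25MB auto-stop) can be sketched with plain MediaRecorder as below. startDictation and its callback are illustrative names; the real hook's internals may differ.

```typescript
// Sketch of the recording lifecycle described above. startDictation and its
// options are illustrative names; the real useWhisper hook may differ.
const MAX_BYTES = 25 * 1024 * 1024; // 25MB cap shared with the backend
const MAX_MS = 10 * 60 * 1000;      // 10-minute cap

async function startDictation(onStop: (audio: Blob) => void): Promise<() => void> {
  // Permission is requested lazily, only when the user taps the mic button.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });

  const chunks: Blob[] = [];
  let recordedBytes = 0;

  recorder.ondataavailable = (event) => {
    chunks.push(event.data);
    recordedBytes += event.data.size;
    if (recordedBytes >= MAX_BYTES) recorder.stop(); // auto-stop at the size limit
  };

  recorder.onstop = () => {
    stream.getTracks().forEach((track) => track.stop()); // release the microphone
    onStop(new Blob(chunks, { type: 'audio/webm' }));
  };

  recorder.start(1000); // emit a chunk every second for size/duration monitoring
  const timer = setTimeout(() => {
    if (recorder.state === 'recording') recorder.stop(); // auto-stop at the duration limit
  }, MAX_MS);

  // The returned function lets the UI stop the recording manually.
  return () => {
    clearTimeout(timer);
    if (recorder.state === 'recording') recorder.stop();
  };
}
```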

5. Security & Privacy

  • All API keys stored securely
  • Audio data transmitted as base64 over HTTPS (illustrated below)
  • No audio stored locally after transcription
  • Microphone permissions requested only when needed
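
As a concrete illustration of "base64 over HTTPS" with no local copy kept, a client-side call to /audio/transcribe might look like the sketch below. The JSON field names (audio, mime_type, text) are assumptions, not the endpoint's documented contract.

```typescript
// Sketch only: the request/response field names (audio, mime_type, text) are
// assumptions about the /audio/transcribe contract, not taken from audio.rs.
async function transcribe(blob: Blob, baseUrl: string, secretKey: string): Promise<string> {
  // Convert the recording to base64 in memory; nothing is written to disk.
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  const audioBase64 = btoa(binary);

  const res = await fetch(`${baseUrl}/audio/transcribe`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Secret-Key': secretKey, // backend auth; the provider API key never reaches the client
    },
    body: JSON.stringify({ audio: audioBase64, mime_type: blob.type }),
  });

  if (res.status === 413) throw new Error('Recording exceeds the 25MB limit');
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);

  const { text } = await res.json();
  return text; // the blob goes out of scope here; no local copy is retained
}
```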

File Changes

New Files

  • crates/goose-server/src/routes/audio.rs - Audio transcription endpoints
  • ui/desktop/src/hooks/useWhisper.ts - Recording and transcription logic
  • ui/desktop/src/hooks/useDictationSettings.ts - Settings management
  • ui/desktop/src/components/settings/dictation/DictationSection.tsx - Settings UI
  • ui/desktop/src/components/WaveformVisualizer.tsx - Audio visualization

Modified Files

  • ui/desktop/src/components/ChatInput.tsx - Added microphone button
  • ui/desktop/src/components/settings/SettingsView.tsx - Added dictation section
  • ui/desktop/src/main.ts - Added microphone permission handling
  • ui/desktop/src/preload.ts - Exposed permission APIs
  • Various server files to register new routes

Testing

  • All Rust tests passing
  • TypeScript compilation successful
  • ESLint and formatting checks passed
  • Manual testing completed with both providers

Future Enhancements

  • Real-time streaming transcription
  • Custom vocabulary support
  • Local Whisper model support
  • Voice activity detection

Breaking Changes

None - Feature is disabled by default and requires user opt-in.

Screenshots

Two screenshots attached: 2025-06-25 at 22:08:56 and 2025-06-26 at 15:06:48.

- Add microphone button that appears when OpenAI is configured
- Implement real-time waveform visualization during recording
- Add backend /audio/transcribe endpoint with security measures:
  - 25MB file size limit with 413 status code
  - 30-second timeout for API calls
  - Proper authentication via X-Secret-Key
- Add visual feedback during transcription
- Show recording duration and estimated file size
- Warn users when approaching 25MB limit
- Auto-stop recording at 10 minutes or 25MB
- Add comprehensive integration tests
- Fix ESLint configuration and MessageCopyLink warning

Security: API keys remain backend-only, no frontend exposure
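
A note on the size monitoring described in the summary above: the estimated size and the "approaching the limit" warning can be derived from the recorder's accumulated chunks, as in this sketch. The 90% warning threshold is an assumed value; the PR does not state the actual one.

```typescript
// Illustrative sketch: estimate recording size/duration and flag when the
// 25MB limit is close. The 90% warning threshold is an assumed value.
const LIMIT_BYTES = 25 * 1024 * 1024;
const WARN_RATIO = 0.9;

interface RecordingStatus {
  seconds: number;
  bytes: number;
  nearLimit: boolean; // true once the estimate passes 90% of the 25MB cap
}

function recordingStatus(chunks: Blob[], startedAt: number, now = Date.now()): RecordingStatus {
  const bytes = chunks.reduce((sum, chunk) => sum + chunk.size, 0);
  const seconds = Math.floor((now - startedAt) / 1000);
  return { seconds, bytes, nearLimit: bytes >= LIMIT_BYTES * WARN_RATIO };
}
```

A recording indicator could call this once per second alongside the chunk handler and surface the warning before the hard 25MB stop kicks in.
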
@Kvadratni (Contributor) commented:

We all wanted this for a long time and attempted to implement it in different ways. But I have one reservation, and that is tying us to one provider.

I wonder if it's worth bringing it into the settings page, where you can enable this feature and then select from a list of providers, which will then ask you for a token or reuse one. That way you're not tied into OpenAI; you have 11Labs and others in there too.

@jackjackbits (Contributor, PR author) replied:

> We all wanted this for a long time and attempted to implement it in different ways. But I have one reservation, and that is tying us to one provider.
>
> I wonder if it's worth bringing it into the settings page, where you can enable this feature and then select from a list of providers, which will then ask you for a token or reuse one. That way you're not tied into OpenAI; you have 11Labs and others in there too.

hear you. i did it this way out of ease, and piggybacking off something already configured was a bonus. i'm not sure it's really that much better than local os dictation in the end. but putting it in settings and maybe allowing local or other providers (i'm not sure who else there is besides 11labs) would be a good approach. if local, it's effectively just a shortcut.

On the diff in the OpenAI transcription code:

})?;

let response = client
    .post("https://api.openai.com/v1/audio/transcriptions")

A Contributor commented:

Should we check/use the OPENAI_HOST from the environment? Similar to:

let host: String = config
    .get_param("OPENAI_HOST")
    .unwrap_or_else(|_| "https://api.openai.com".to_string());

@hugomd (Member) commented Jun 26, 2025

Have we considered pulling in https://github.com/openai/whisper (or https://github.com/m-bain/whisperX) directly?

It runs exceedingly well locally ✨

@jackjackbits (Contributor, PR author) replied:

> Have we considered pulling in https://github.com/openai/whisper (or https://github.com/m-bain/whisperX) directly?
>
> It runs exceedingly well locally ✨

local macOS dictation is pretty good. don't know if it warrants increasing the project/binary size?

- Add microphone button to chat input with recording visualization
- Support both OpenAI Whisper and ElevenLabs speech-to-text
- Add Voice Dictation settings section with provider selection
- Implement secure API key storage for ElevenLabs
- Add real-time waveform visualization during recording
- Handle microphone permissions properly
- Add 25MB file size limit and 10-minute duration limit
- Support multiple audio formats (webm, mp3, mp4, m4a, wav)
- Feature is opt-in and disabled by default
@jackjackbits changed the title from "feat: add voice dictation with OpenAI Whisper" to "feat: add voice dictation using OpenAI Whisper & ElevenLabs" on Jun 26, 2025
@jackjackbits (Contributor, PR author) replied:

> We all wanted this for a long time and attempted to implement it in different ways. But I have one reservation, and that is tying us to one provider.
>
> I wonder if it's worth bringing it into the settings page, where you can enable this feature and then select from a list of providers, which will then ask you for a token or reuse one. That way you're not tied into OpenAI; you have 11Labs and others in there too.

done

@baxen (Collaborator) left a comment:

Looks great!

On the new file in the diff:

@@ -0,0 +1,80 @@
# Voice Dictation Feature - PR Summary

A Collaborator commented:

nit: going to exclude this from the commit just to keep the top level of the code clean here - this is pretty well covered by inline comments

@jackjackbits (Contributor, PR author) replied:
ah yes, forgot to remove that. sorry.

On another part of the diff:

}
});

// Handle macOS dictation

A Collaborator commented:

nit: also planning to remove this; it looks like a leftover from a different attempt and appears unreachable to me at the moment. please let me know if i got that wrong!

@jackjackbits (Contributor, PR author) replied:
oops! again you're right. left over. thank you.

@baxen merged commit 6ad95fe into block:main on Jun 27, 2025 (5 of 6 checks passed).
@QBlockQ commented Jun 27, 2025

@jackjackbits @Kvadratni

Stunning! We just built something similar in our Meeting app:

Multi-provider STT (Whisper + ElevenLabs + Web Speech fallback) with all processing moved to secure backend routes. Currently testing, but early results show: no vendor lock-in, zero client-side API exposure, smart caching should cut redundant calls significantly.

Your settings-based provider selection is exactly right - it gives users choice while keeping tokens secure. The "11Labs and others" flexibility is gold for production apps.

One insight from building: 3-layer fallbacks seem essential since STT services can be unreliable during peak times.

Solid work! 🔥
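
As a rough illustration of the layered fallback QBlockQ describes (not something implemented in this PR), a client could try providers in order and only fail when every layer has failed. The TranscriptionProvider interface below is entirely hypothetical.

```typescript
// Hypothetical sketch of the multi-layer fallback idea described above; this is
// not part of the PR. TranscriptionProvider is an invented interface.
interface TranscriptionProvider {
  name: string;
  transcribe(audio: Blob): Promise<string>;
}

async function transcribeWithFallback(
  audio: Blob,
  providers: TranscriptionProvider[], // e.g. [whisper, elevenLabs, webSpeech]
): Promise<string> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return await provider.transcribe(audio); // first provider that succeeds wins
    } catch (err) {
      errors.push(`${provider.name}: ${String(err)}`);
    }
  }
  throw new Error(`All speech-to-text providers failed:\n${errors.join('\n')}`);
}
```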

katzdave added a commit to katzdave/goose that referenced this pull request Jun 27, 2025
* upstream/main:
  Add a reference for recipes (block#3099)
  feat: add voice dictation using OpenAI Whisper & ElevenLabs (block#3079)
  feat: new cli provider for claude code and gemini (block#3083)
  you forgot the important ones! (block#3105)
  hotfix: fix build (block#3102)
  Richer tool call ui messages (block#3104)
  Update linux instructions (block#3087)
zanesq added a commit that referenced this pull request Jun 27, 2025
* 'main' of github.com:block/goose:
  Fix clippy + test errors (#3120)
  Update goose help to include cli (#3095)
  add scheduler type  setting (#3119)
  Add a reference for recipes (#3099)
  feat: add voice dictation using OpenAI Whisper & ElevenLabs (#3079)
  feat: new cli provider for claude code and gemini (#3083)
  you forgot the important ones! (#3105)
  hotfix: fix build (#3102)
  Richer tool call ui messages (#3104)
  Update linux instructions (#3087)
ahau-square pushed a commit that referenced this pull request Jun 27, 2025
* origin/main:
  Added announcement modal (#3098)
  build: Add `just` to Hermit, correct ui/desktop's README (#3116)
  fix: Make the entire toolcall argument row clickable to expand (#3118)
  Fix clippy + test errors (#3120)
  Update goose help to include cli (#3095)
  add scheduler type  setting (#3119)
  Add a reference for recipes (#3099)
  feat: add voice dictation using OpenAI Whisper & ElevenLabs (#3079)
  feat: new cli provider for claude code and gemini (#3083)
  you forgot the important ones! (#3105)
  hotfix: fix build (#3102)
  Richer tool call ui messages (#3104)
  Update linux instructions (#3087)
  Add flag for showing cost tracking (#3090)
  Improve config file editing and recovery fallback mechanisms (#3082)
s-soroosh pushed a commit to s-soroosh/goose that referenced this pull request Jul 18, 2025
cbruyndoncx pushed a commit to cbruyndoncx/goose that referenced this pull request Jul 20, 2025
@jamadeo mentioned this pull request Aug 14, 2025