Add OpenAI Whisper Audio Transcription Support #584
base: master
Conversation
@Trynax is attempting to deploy a commit to the Merit Systems Team on Vercel. A member of the Team first needs to authorize it.
High level:
- Try to use the external providers as much as possible for all types.
- Should test that this works with the SDKs directly instead of doing raw fetch.
- The examples look like they have updated the frontend, but I do not see the backend portion for this.
- Document your testing strategy (I know running this locally is a pain; let me know if you run into problems here, we are working on making this easier).
```ts
}>;
}

export class OpenAIAudioClient {
```
We should not be rebuilding our own client here. We can use the actual OpenAI client, like here:

echo/packages/app/server/src/clients/openai-image-formdata-client.ts, lines 11 to 15 in ca3ea92:

```ts
const client = new OpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});
```
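For reference, a minimal sketch of what the transcription call could look like when the official SDK is reused this way (the file path is illustrative):

```ts
import fs from 'fs';
import OpenAI from 'openai';

// Point the official client at the Echo router, as in the image client above
const client = new OpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});

// The SDK handles the multipart upload; no hand-rolled fetch/FormData needed
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('sample.mp3'),
  model: 'whisper-1',
});
console.log(transcription.text);
```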
```ts
import { BaseProvider } from './BaseProvider';
import { ProviderType } from './ProviderType';
import logger from '../logger';
import { OpenAIAudioClient, AudioTranscriptionResponse } from '../clients/openai-audio-client';
```
Use OpenAI's types directly.
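For example, a sketch of pulling the response types from the SDK instead of redefining `AudioTranscriptionResponse` (exact export names and paths depend on the installed `openai` version):

```ts
// Types shipped with the official SDK; verbose_json responses are typed
// separately and include fields like duration and segments
import type {
  Transcription,
  TranscriptionVerbose,
} from 'openai/resources/audio/transcriptions';
```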
```ts
providerId: 'openai-audio',
provider: 'openai',
model: this.getModel(),
durationSeconds: durationMinutes * 60,
```
`audioResponse.duration` already exists.
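A sketch of the corrected hunk, assuming `audioResponse` is a `verbose_json` transcription result:

```ts
providerId: 'openai-audio',
provider: 'openai',
model: this.getModel(),
// verbose_json already reports duration in seconds; no minutes conversion
durationSeconds: audioResponse.duration,
```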
```ts
}

override supportsStream(): boolean {
  return false; // Audio transcription doesn't support streaming
```
Streaming does have some support: https://platform.openai.com/docs/guides/speech-to-text#streaming
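Per the linked docs, streamed transcription looks roughly like this with the official SDK (note the docs tie streaming to the `gpt-4o-transcribe` family rather than `whisper-1`, so `supportsStream()` could be made model-dependent instead of a blanket false):

```ts
import fs from 'fs';
import OpenAI from 'openai';

const client = new OpenAI();

// stream: true yields transcript.text.delta events as text is produced
const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('sample.mp3'),
  model: 'gpt-4o-mini-transcribe',
  response_format: 'text',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta);
  }
}
```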
```ts
formData.append('response_format', 'json');

const token = await getToken();
const response = await fetch(`${baseRouterUrl}/audio/transcriptions`, {
```
Use the ai-sdk for all of this stuff in here, like the other smoke tests.
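A sketch of what the smoke test could use instead of raw fetch, via the AI SDK's (still experimental) transcription API; the base URL mirrors the local-router setup quoted earlier:

```ts
import { readFile } from 'node:fs/promises';
import { experimental_transcribe as transcribe } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

// Route through the local Echo backend rather than api.openai.com
const openai = createOpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});

const result = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('sample.mp3'),
});
console.log(result.text, result.durationInSeconds);
```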
```ts
  }
});

it('whisper-1 transcription', async () => {
```
We can just iterate over these like the other tests; no need to write each manually.
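For instance, a sketch of the parameterized form (the model list and helper are illustrative):

```ts
const AUDIO_MODELS = ['whisper-1', 'whisper-large-v3'];

for (const model of AUDIO_MODELS) {
  it(`${model} transcription`, async () => {
    // transcribeWithModel is a hypothetical shared helper for these tests
    const result = await transcribeWithModel(model);
    expect(result.text.length).toBeGreaterThan(0);
  });
}
```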
```ts
method: 'POST',
headers: {
  'Authorization': `Bearer ${token}`,
  'X-App-Id': ECHO_APP_ID!,
```
This header is not a thing; I think this was slopped in.
Yoo @rsproule, I've addressed all the feedback.
You should be able to point the tests at a local version of the backend which has your changes in it. You will need to fill in some portion of the .env (we are making this easier to run locally; I know it's a pain right now).
Related to #376 - Support for Voice Models
Implementation
Adds comprehensive OpenAI Whisper audio transcription support to Echo with `whisper-1` and `whisper-large-v3` models.

Key Features

Architecture

Files Added

- `OpenAIAudioProvider.ts` - Server-side audio processing

Tests fail (model not found) against the production API (expected - needs server deployment).

Provides a clean foundation for future voice features (#376) and resolves previous merge conflict issues (#406).