Conversation

@Trynax Trynax commented Oct 18, 2025

Related to #376 - Support for Voice Models

Implementation

Adds OpenAI Whisper audio transcription support to Echo, covering the whisper-1 and whisper-large-v3 models.

Key Features

  • ✅ Audio transcription and translation
  • ✅ Complete TypeScript SDK support
  • ✅ Example components for Next.js and Vite

Architecture

const formData = new FormData();
formData.append('file', audioBlob);
formData.append('model', 'whisper-1');

const response = await fetch(`${ECHO_BASE_URL}/audio/transcriptions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'X-App-Id': appId,
  },
  body: formData,
});
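On success the endpoint returns OpenAI-compatible JSON, so the transcript can be read straight off the parsed body. A minimal sketch, assuming Echo passes OpenAI's transcription response shape through unchanged (`text` always present, `duration` only with verbose response formats):

```typescript
// OpenAI-compatible transcription payload: `text` is the transcript;
// `duration` (in seconds) appears with verbose response formats.
type TranscriptionResponse = { text: string; duration?: number };

function parseTranscription(body: string): TranscriptionResponse {
  const parsed = JSON.parse(body) as TranscriptionResponse;
  if (typeof parsed.text !== 'string') {
    throw new Error('unexpected transcription payload');
  }
  return parsed;
}
```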

Files Added

  • OpenAIAudioProvider.ts - Server-side audio processing
  • Audio model definitions and SDK integration
  • Example components and comprehensive tests
  • Enhanced provider factory and validation

Tests currently fail with "model not found" against the production API. This is expected: the server-side changes have not been deployed yet.

Provides clean foundation for future voice features (#376) and resolves previous merge conflict issues (#406).


vercel bot commented Oct 18, 2025

@Trynax is attempting to deploy a commit to the Merit Systems Team on Vercel.

A member of the Team first needs to authorize it.


@rsproule rsproule left a comment


High level:

  • try to use the external providers as much as possible for all types
  • should test that this works with the SDKs directly instead of doing raw fetch
  • the examples look like they have updated the frontend, but I do not see the backend portion for this
  • document your testing strategy (I know running this locally is a PITA; let me know if you run into problems here, we are working on making this easier)

}>;
}

export class OpenAIAudioClient {

we should not be rebuilding our own client here. we can use the actual OpenAI client, like this:

const client = new OpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});
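For the transcription call itself, the official client already exposes `client.audio.transcriptions.create`, so no hand-rolled HTTP is needed. A minimal sketch; the structural type below just mirrors the SDK's surface so the example stays self-contained, and the `whisper-1` default is an assumption:

```typescript
// Structural type mirroring the OpenAI SDK surface used here, so the sketch
// works with a real client or a test double. The real SDK expects a named
// file (a File works); Blob keeps this sketch minimal.
type TranscriptionClient = {
  audio: {
    transcriptions: {
      create: (args: { file: Blob; model: string }) => Promise<{ text: string }>;
    };
  };
};

// With `client` built as above (new OpenAI({ apiKey, baseURL })), this
// replaces the custom OpenAIAudioClient entirely.
async function transcribe(
  client: TranscriptionClient,
  file: Blob,
  model = 'whisper-1'
): Promise<string> {
  const result = await client.audio.transcriptions.create({ file, model });
  return result.text;
}
```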

import { BaseProvider } from './BaseProvider';
import { ProviderType } from './ProviderType';
import logger from '../logger';
import { OpenAIAudioClient, AudioTranscriptionResponse } from '../clients/openai-audio-client';

use OpenAI's types directly

providerId: 'openai-audio',
provider: 'openai',
model: this.getModel(),
durationSeconds: durationMinutes * 60,

audioResponse.duration already exists

}

override supportsStream(): boolean {
return false; // Audio transcription doesn't support streaming

formData.append('response_format', 'json');

const token = await getToken();
const response = await fetch(`${baseRouterUrl}/audio/transcriptions`, {

use the ai-sdk for all of this stuff in here like the other smoke tests

}
});

it('whisper-1 transcription', async () => {

we can just iterate over these like the other tests, no need to write each manually
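A sketch of the iteration the comment suggests: derive the test cases from a model list and loop, rather than hand-writing one `it(...)` block per model (the model ids and test-name format are assumptions based on the PR description):

```typescript
// One smoke test per audio model, generated from a list instead of
// written out by hand.
const AUDIO_MODELS = ['whisper-1', 'whisper-large-v3'] as const;

function audioTestNames(models: readonly string[]): string[] {
  return models.map(model => `${model} transcription`);
}

// In the suite, this drives the loop:
// for (const model of AUDIO_MODELS) {
//   it(`${model} transcription`, async () => {
//     // shared request body, with `model` appended to the form data
//   });
// }
```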

method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'X-App-Id': ECHO_APP_ID!,

this is not a thing. i think this is slopped up


Trynax commented Oct 21, 2025

Yoo @rsproule

I've addressed all the feedback. I followed the image and text testing strategy, but the tests fail with 422 "model not supported" errors, which I assume is expected since the backend hasn't been deployed yet, right?

@rsproule

> Yoo @rsproule
>
> I've addressed all the feedback Followed the image and text testing strategy but the tests fail with 422 "model not supported" errors, which I assume is expected since the backend hasn't been deployed yet right?

You should be able to point the tests at a local version of the backend which has your changes in it. You will need to fill in some portion of the .env (we are making this easier to run locally, I know it's a pain rn).

