Add OpenAI Whisper Audio Transcription Support #584
base: master
Conversation
@Trynax is attempting to deploy a commit to the Merit Systems Team on Vercel. A member of the Team first needs to authorize it.
High level:
- Try to use the external providers as much as possible for all types.
- Should test that this works with the SDKs directly instead of doing raw fetch.
- The examples look like they have updated the frontend, but I do not see the backend portion for this.
- Document your testing strategy (I know running this locally is a pain; let me know if you run into problems here, we are working on making this easier).
```ts
}>;
}

export class OpenAIAudioClient {
```
We should not be rebuilding our own client here. We can use the actual OpenAI client, like here:

echo/packages/app/server/src/clients/openai-image-formdata-client.ts, lines 11 to 15 in ca3ea92:

```ts
const client = new OpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});
```
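For reference, a minimal sketch of what the transcription call could look like when the official SDK is reused this way (the file path is illustrative):

```ts
import fs from 'fs';
import OpenAI from 'openai';

// Point the official client at the Echo router, as in the image client above
const client = new OpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});

// The SDK handles the multipart upload; no hand-rolled fetch/FormData needed
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('sample.mp3'),
  model: 'whisper-1',
});
console.log(transcription.text);
```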
```ts
import { BaseProvider } from './BaseProvider';
import { ProviderType } from './ProviderType';
import logger from '../logger';
import { OpenAIAudioClient, AudioTranscriptionResponse } from '../clients/openai-audio-client';
```
Use OpenAI's types directly.
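For example, a sketch of pulling the response types from the SDK instead of redefining `AudioTranscriptionResponse` (exact export names and paths depend on the installed `openai` version):

```ts
// Types shipped with the official SDK; verbose_json responses are typed
// separately and include fields like duration and segments
import type {
  Transcription,
  TranscriptionVerbose,
} from 'openai/resources/audio/transcriptions';
```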
```ts
providerId: 'openai-audio',
provider: 'openai',
model: this.getModel(),
durationSeconds: durationMinutes * 60,
```
`audioResponse.duration` already exists.
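A sketch of the corrected hunk, assuming `audioResponse` is a `verbose_json` transcription result:

```ts
providerId: 'openai-audio',
provider: 'openai',
model: this.getModel(),
// verbose_json already reports duration in seconds; no minutes conversion
durationSeconds: audioResponse.duration,
```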
```ts
}

override supportsStream(): boolean {
  return false; // Audio transcription doesn't support streaming
```
Streaming does have some support: https://platform.openai.com/docs/guides/speech-to-text#streaming
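Per the linked docs, streamed transcription looks roughly like this with the official SDK (note the docs tie streaming to the `gpt-4o-transcribe` family rather than `whisper-1`, so `supportsStream()` could be made model-dependent instead of a blanket false):

```ts
import fs from 'fs';
import OpenAI from 'openai';

const client = new OpenAI();

// stream: true yields transcript.text.delta events as text is produced
const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('sample.mp3'),
  model: 'gpt-4o-mini-transcribe',
  response_format: 'text',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta);
  }
}
```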
```ts
formData.append('response_format', 'json');

const token = await getToken();
const response = await fetch(`${baseRouterUrl}/audio/transcriptions`, {
```
Use the ai-sdk for all of this stuff in here, like the other smoke tests.
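A sketch of what the smoke test could use instead of raw fetch, via the AI SDK's (still experimental) transcription API; the base URL mirrors the local-router setup quoted earlier:

```ts
import { readFile } from 'node:fs/promises';
import { experimental_transcribe as transcribe } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

// Route through the local Echo backend rather than api.openai.com
const openai = createOpenAI({
  apiKey: process.env.ECHO_API_KEY || '',
  baseURL: 'http://localhost:3070',
});

const result = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('sample.mp3'),
});
console.log(result.text, result.durationInSeconds);
```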
```ts
  }
});

it('whisper-1 transcription', async () => {
```
We can just iterate over these like the other tests; no need to write each manually.
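For instance, a sketch of the parameterized form (the model list and helper are illustrative):

```ts
const AUDIO_MODELS = ['whisper-1', 'whisper-large-v3'];

for (const model of AUDIO_MODELS) {
  it(`${model} transcription`, async () => {
    // transcribeWithModel is a hypothetical shared helper for these tests
    const result = await transcribeWithModel(model);
    expect(result.text.length).toBeGreaterThan(0);
  });
}
```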
```ts
method: 'POST',
headers: {
  'Authorization': `Bearer ${token}`,
  'X-App-Id': ECHO_APP_ID!,
```
This header is not a thing; I think this was slopped in.
Yoo @rsproule, I've addressed all the feedback.
You should be able to point the tests at a local version of the backend which has your changes in it. You will need to fill in some portion of the .env (we are making this easier to run locally; I know it's a pain right now).
Related to #376 - Support for Voice Models
Implementation
Adds comprehensive OpenAI Whisper audio transcription support to Echo with `whisper-1` and `whisper-large-v3` models.

Key Features

Architecture

Files Added

- `OpenAIAudioProvider.ts` - Server-side audio processing

Tests fail (model not found) against the production API (expected - needs server deployment).

Provides a clean foundation for future voice features (#376) and resolves previous merge conflict issues (#406).