# OpenedAI Whisper

An OpenAI API compatible speech-to-text server for audio transcription and translation, a.k.a. Whisper.

- Compatible with the OpenAI `audio/transcriptions` and `audio/translations` APIs
- Does not connect to the OpenAI API and does not require an OpenAI API key
- Not affiliated with OpenAI in any way

API Compatibility:

- `/v1/audio/transcriptions`
- `/v1/audio/translations`

Parameter Support:

- `file`
- `model` (only `whisper-1` exists, so this is ignored)
- `language`
- `prompt` (not yet supported)
- `temperature`
- `response_format`:
  - `json`
  - `text`
  - `srt`
  - `vtt`
  - `verbose_json` (partial support; some fields missing)
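For example, subtitles can be requested directly by setting `response_format`. A minimal sketch, assuming the server is running locally on port 8000 and an `audio.mp3` file exists in the current directory:

```shell
# Request SRT subtitles instead of the default JSON response
curl -s http://localhost:8000/v1/audio/transcriptions \
  -F model="whisper-1" \
  -F file="@audio.mp3" \
  -F response_format=srt
```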

Details:

- CUDA or CPU support (automatically detected)
- float32, float16, or bfloat16 support (automatically detected)

Tested whisper models:

- `openai/whisper-large-v2` (the default)
- `openai/whisper-large-v3`
- `distil-whisper/distil-medium.en`
- `openai/whisper-tiny.en`
- ...

Version: 0.1.0, Last update: 2024-03-15

## API Documentation

## Usage

### Installation instructions

If you want to use CUDA, you will need to install CUDA for your operating system first.

```shell
# Install the Python requirements
pip install -r requirements.txt
# Install ffmpeg
sudo apt install ffmpeg
```

### Usage

```
Usage: whisper.py [-m <model_name>] [-d <device>] [-t <dtype>] [-P <port>] [-H <host>] [--preload]

Description:
OpenedAI Whisper API Server

Options:
  -h, --help            Show this help message and exit.
  -m MODEL, --model MODEL
                        The model to use for transcription.
                        Ex. distil-whisper/distil-medium.en (default: openai/whisper-large-v2)
  -d DEVICE, --device DEVICE
                        Set the torch device for the model. Ex. cuda:1 (default: auto)
  -t DTYPE, --dtype DTYPE
                        Set the torch data type for processing (float32, float16, bfloat16) (default: auto)
  -P PORT, --port PORT  Server tcp port (default: 8000)
  -H HOST, --host HOST  Host to listen on, Ex. 0.0.0.0 (default: localhost)
  --preload             Preload model and exit. (default: False)
```
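For instance, to serve a smaller English-only model from the tested list on all network interfaces (a sketch; the model is downloaded on first use):

```shell
# Serve distil-medium.en on all interfaces, default port 8000
python whisper.py -m distil-whisper/distil-medium.en -H 0.0.0.0 -P 8000
```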

## Sample API Usage

You can use it like this:

```shell
curl -s http://localhost:8000/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F model="whisper-1" -F file="@audio.mp3" -F response_format=text
```

Or just like this:

```shell
curl -s http://localhost:8000/v1/audio/transcriptions -F model="whisper-1" -F file="@audio.mp3"
```

Or like this example from the OpenAI Speech to text guide Quickstart:

```python
from openai import OpenAI
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')

audio_file = open("/path/to/file/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcription.text)
```
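The translations endpoint works the same way through the SDK. A sketch, assuming the server above is running and the audio contains non-English speech to be translated into English:

```python
from openai import OpenAI

# Any API key is accepted; the server does not validate it
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')

# Translate non-English speech to English text via /v1/audio/translations
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(model="whisper-1", file=audio_file)
print(translation.text)
```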

## Docker support

You can run the server via Docker like so:

```shell
docker compose build
docker compose up
```

Options can be set via `whisper.env`.