
2.2.14 Backend: Speaches


Handle: speaches
URL: http://localhost:34331

Speaches (formerly faster-whisper-server) is an OpenAI API-compatible server that aims to be the Ollama for TTS/STT models.

  • GPU and CPU support.
  • Easily deployable using Docker.
  • OpenAI API compatible.

Starting

# [Optional] pre-pull the image
harbor pull speaches

# Start the service
harbor up speaches

# [Optional] Download TTS model
# This command is sub-optimal as it downloads all weights,
# whereas speaches only needs ONNX
harbor hf download $(harbor speaches tts_model)

# [Optional] observe logs
harbor logs speaches

Out of the box, Harbor will:

  • Share global caches (from HuggingFace, Ollama, Llama.cpp and vLLM)
  • Pre-connect speaches as the default STT and TTS backend for Open WebUI, when the two are run together
    • ℹ️ Due to uni-directional config bindings, Harbor will override the Speech-to-Text and Text-to-Speech configuration in Open WebUI when speaches is started after Open WebUI. If needed, you can provide additional overrides manually via open-webui/configs/config.override.json
  • If Nvidia Docker Toolkit support is detected on the host - use the -cuda version of the image
    • ℹ️ speaches requires CUDA 12.6 or above to function properly. Use nvidia-smi to check your version and update if possible. Otherwise, you can enforce the CPU version by deleting the compose.x.speaches.nvidia.yml file in Harbor's workspace, as shown below
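As a rough sketch of the CPU fallback described above (this assumes harbor home prints your Harbor workspace path; adjust if your setup differs):

# Check the CUDA version reported by the driver
nvidia-smi

# If CUDA is below 12.6, remove the Nvidia compose override
# so Harbor falls back to the CPU image on the next "harbor up"
rm "$(harbor home)/compose.x.speaches.nvidia.yml"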

Note

If you're seeing any kind of file system permission errors, you'll need to ensure that files written from within the container are accessible to your user.
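A minimal sketch of one possible fix, assuming the container wrote to the shared HuggingFace cache as root (the path is an assumption - adjust to your actual cache location):

# Reclaim ownership of the shared HuggingFace cache for your user
sudo chown -R "$USER" ~/.cache/huggingface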

Configuration

Tip

When transcribing for the very first time, the service will download the model weights, which may take some time. Track progress in the service logs via:

harbor logs speaches

The following configuration options are available:

# Get/set STT model to use
harbor speaches stt_model
harbor speaches stt_model Systran/faster-distil-whisper-large-v3

# Get/set TTS model to use
harbor speaches tts_model
harbor speaches tts_model hexgrad/Kokoro-82M
# Get/set TTS voice to use
# For default TTS (Kokoro, see https://github.com/thewh1teagle/kokoro-onnx?tab=readme-ov-file#voices)
# For other TTS models, see the model's documentation
harbor speaches tts_voice
harbor speaches tts_voice af

# Get/set docker label to use
harbor speaches version
harbor speaches version latest

All of the above options can also be set via the harbor config command.
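For example, a sketch assuming the dotted field names map directly to the options above:

# Same effect as "harbor speaches stt_model ..."
harbor config set speaches.stt_model Systran/faster-distil-whisper-large-v3

# Read a single value back
harbor config get speaches.tts_voice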

Run the following command to see all available config options:

harbor config ls | grep SPEACHES

# Port on the host where OpenAI-compatible API will be exposed
SPEACHES_HOST_PORT             34331
# Docker tag to use for the image
SPEACHES_VERSION               latest

# STT Model in user/repo format
SPEACHES_STT_MODEL             Systran/faster-distil-whisper-large-v3
# TTS Model in user/repo format
SPEACHES_TTS_MODEL             hexgrad/Kokoro-82M
# TTS Voice
SPEACHES_TTS_VOICE             af

See Harbor's environment configuration guide to set arbitrary environment variables for the service.
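A hedged sketch of what that looks like, assuming the harbor env command follows the <service> <key> <value> form described in that guide (the variable name below is a purely hypothetical placeholder, not a real speaches setting):

# Set an arbitrary environment variable for the speaches container
harbor env speaches SOME_SPEACHES_OPTION value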

Models

You can hit the /v1/models endpoint while the service is running to get a list of supported models.

curl $(harbor url speaches)/v1/models
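Since the API is OpenAI-compatible, transcription and speech requests follow the standard OpenAI audio endpoints. A rough sketch (sample.wav and speech.mp3 are placeholders; the model and voice values assume the defaults listed above):

# Transcribe a local audio file with the default STT model
curl "$(harbor url speaches)/v1/audio/transcriptions" \
  -F "file=@sample.wav" \
  -F "model=Systran/faster-distil-whisper-large-v3"

# Synthesize speech with the default TTS model and voice
curl "$(harbor url speaches)/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "hexgrad/Kokoro-82M", "input": "Hello from speaches", "voice": "af"}' \
  -o speech.mp3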

See more examples of the API in the http catalog.
