-
-
Notifications
You must be signed in to change notification settings - Fork 76
2.2.14 Backend: Speaches
Handle:
speaches
URL: http://localhost:34331
Speaches (formerly faster-whisper-server
) is an OpenAI API compatible server that aims to be Ollama for TTS/STT models.
- GPU and CPU support.
- Easily deployable using Docker.
- OpenAI API compatible.
# [Optional] pre-pull the image
harbor pull speaches
# Start the service
harbor up speaches
# [Optional] Download TTS model
# This command is sub-optimal as it downloads all weights,
# whereas speaches only needs ONNX
harbor hf download $(harbor speaches tts_model)
# [Optional] observe logs
harbor logs speaches
Out of the box, Harbor will:
- Share global caches (from HuggingFace, Ollama, Llama.cpp and vLLM)
- Pre-connect speaches as default STT and TTS backend for Open WebUI, when run together
- ℹ️ Due to uni-directional config bindings - Harbor will override the Speech-to-Text and Text-to-Speech configuration in Open WebUI, when
speaches
is started after Open WebUI when runningspeeches
, you can provide additional overrides viaopen-webui/configs/config.override.json
manually if needed
- ℹ️ Due to uni-directional config bindings - Harbor will override the Speech-to-Text and Text-to-Speech configuration in Open WebUI, when
- If Nvidia Docker Toolkit support is detected on the host - Use
-cuda
version of the image- ℹ️
speaches
requires CUDA 12.6 and above to function properly. Usenvidia-smi
to check your version and update if possible. Otherwise, you can enforce CPU version by deleting thecompose.x.speaches.nvidia.yml
file in Harbor's workspace
- ℹ️
Note
If you're seeing any kind of file system permission errors you'll need to ensure that files written from within a container are accessible to your user.
Tip
When transcribing for the very first time, the service will download the model weights, which may take some time. Track in service logs via:
harbor logs speaches
Following configuration options are available:
# Get/set STT model to use
harbor speaches stt_model
harbor speaches stt_model Systran/faster-distil-whisper-large-v3
# Get/set TTS model to use
harbor speaches tts_model
harbor speaches tts_model hexgrad/Kokoro-82M
# Get/set TTS voice to use
# For default TTS (Kokoro, see https://github.com/thewh1teagle/kokoro-onnx?tab=readme-ov-file#voices)
# For other TTS models, see the model's documentation
harbor speaches tts_voice
harbor speaches tts_voice af
# Get/set docker label to use
harbor speaches version
harbor speaches version latest
All of the above options can also be set via the harbor config
command.
Run the following command to see all available config options:
harbor config ls | grep SPEACHES
# Port on the host where OpenAI-compatible API will be exposed
SPEACHES_HOST_PORT 34331
# Docker tag to use for the image
SPEACHES_VERSION latest
# STT Model in user/repo format
SPEACHES_STT_MODEL Systran/faster-distil-whisper-large-v3
# TTS Model in user/repo format
SPEACHES_TTS_MODEL hexgrad/Kokoro-82M
# TTS Voice
SPEACHES_TTS_VOICE af
See Harbor's environment configuration guide to set arbitrary environment variables for the service.
You can hit the /models
endpoint when the service is running to get a list of supported models.
curl $(harbor url speaches)/v1/models
See more examples of the API in the http catalog.