feat: Add Vosk STT plugin for offline speech-to-text #4466
base: main
Changes from all commits
e37f717
b800e4a
f6487b0
b9f2638
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,202 @@ | ||||||
| # Vosk Plugin for LiveKit Agents | ||||||
|
|
||||||
| Offline speech-to-text plugin using [Vosk](https://alphacephei.com/vosk/) for the LiveKit Agents framework. | ||||||
|
|
||||||
| ## Features | ||||||
|
|
||||||
| - **Offline Processing**: Runs entirely locally without internet connection | ||||||
| - **No API Keys Required**: Completely free, no cloud service costs | ||||||
| - **Multi-language Support**: 20+ languages including English, Spanish, French, German, Chinese, Russian, and more | ||||||
| - **Streaming Recognition**: Real-time transcription with interim results | ||||||
| - **Word-level Timestamps**: Precise timing information for each word | ||||||
| - **Speaker Diarization**: Optional speaker identification (requires speaker model) | ||||||
| - **Privacy-focused**: All processing happens on your device | ||||||
|
|
||||||
| ## Installation | ||||||
|
|
||||||
| ```bash | ||||||
| pip install livekit-plugins-vosk | ||||||
| ``` | ||||||
|
|
||||||
| ## Download Models | ||||||
|
|
||||||
| Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models | ||||||
|
|
||||||
| ### Quick Start - Small English Model (~40MB) | ||||||
|
|
||||||
| ```bash | ||||||
| mkdir -p ~/.cache/vosk/models | ||||||
| cd ~/.cache/vosk/models | ||||||
| wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip | ||||||
| unzip vosk-model-small-en-us-0.15.zip | ||||||
| ``` | ||||||
|
|
||||||
| ### Available Models | ||||||
|
|
||||||
| - **English**: `vosk-model-en-us-0.22` (large), `vosk-model-small-en-us-0.15` (small) | ||||||
| - **Spanish**: `vosk-model-es-0.42` | ||||||
| - **French**: `vosk-model-fr-0.22` | ||||||
| - **German**: `vosk-model-de-0.21` | ||||||
| - **Chinese**: `vosk-model-cn-0.22` | ||||||
| - **Russian**: `vosk-model-ru-0.42` | ||||||
| - **And many more...** | ||||||
|
|
||||||
| See the [full model list](https://alphacephei.com/vosk/models). | ||||||
|
|
||||||
| ## Usage | ||||||
|
|
||||||
| ### Basic Example | ||||||
|
|
||||||
| ```python | ||||||
| from livekit.agents import JobContext, cli, WorkerOptions | ||||||
| from livekit.plugins import vosk | ||||||
| import os | ||||||
|
|
||||||
| async def entrypoint(ctx: JobContext): | ||||||
| await ctx.connect() | ||||||
|
|
||||||
| # Path to your downloaded Vosk model | ||||||
| model_path = os.path.expanduser("~/.cache/vosk/models/vosk-model-small-en-us-0.15") | ||||||
|
|
||||||
| # Create STT instance | ||||||
| stt_instance = vosk.STT( | ||||||
| model_path=model_path, | ||||||
| language="en", | ||||||
| sample_rate=16000, | ||||||
| enable_words=True, | ||||||
| ) | ||||||
|
|
||||||
| # Use in streaming mode | ||||||
| stream = stt_instance.stream() | ||||||
|
|
||||||
| # Process audio frames... | ||||||
| # stream.push_frame(audio_frame) | ||||||
|
|
||||||
| # Get transcription events | ||||||
| async for event in stream: | ||||||
| if event.type == "final_transcript": | ||||||
| print(f"Final: {event.alternatives[0].text}") | ||||||
| elif event.type == "interim_transcript": | ||||||
| print(f"Interim: {event.alternatives[0].text}") | ||||||
|
|
||||||
| if __name__ == "__main__": | ||||||
| cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) | ||||||
| ``` | ||||||
|
|
||||||
| ### With AgentSession | ||||||
|
|
||||||
| ```python | ||||||
| from livekit.agents import Agent, AgentSession, JobContext, cli, WorkerOptions | ||||||
| from livekit.plugins import vosk, silero | ||||||
| import os | ||||||
|
|
||||||
| async def entrypoint(ctx: JobContext): | ||||||
| await ctx.connect() | ||||||
|
|
||||||
| model_path = os.path.expanduser("~/.cache/vosk/models/vosk-model-en-us-0.22") | ||||||
|
|
||||||
| agent = Agent( | ||||||
| instructions="You are a helpful voice assistant.", | ||||||
| ) | ||||||
|
|
||||||
| session = AgentSession( | ||||||
| vad=silero.VAD.load(), | ||||||
| stt=vosk.STT( | ||||||
| model_path=model_path, | ||||||
| language="en", | ||||||
| enable_words=True, | ||||||
| ), | ||||||
| llm="openai/gpt-4o", | ||||||
| tts="cartesia/sonic-2", | ||||||
| ) | ||||||
|
|
||||||
| await session.start(agent=agent, room=ctx.room) | ||||||
| await session.generate_reply(instructions="greet the user") | ||||||
|
|
||||||
| if __name__ == "__main__": | ||||||
| cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) | ||||||
| ``` | ||||||
|
|
||||||
| ### With Speaker Diarization | ||||||
|
|
||||||
| ```python | ||||||
| stt_instance = vosk.STT( | ||||||
| model_path="/path/to/vosk-model-en-us-0.22", | ||||||
| speaker_model_path="/path/to/vosk-model-spk-0.4", | ||||||
| language="en", | ||||||
| enable_words=True, | ||||||
| ) | ||||||
| ``` | ||||||
|
|
||||||
| ## Configuration Options | ||||||
|
|
||||||
| ### STT Constructor | ||||||
|
|
||||||
| - **`model_path`** (required): Path to the Vosk model directory | ||||||
| - **`language`** (default: `"en"`): Language code for metadata | ||||||
| - **`sample_rate`** (default: `16000`): Audio sample rate in Hz | ||||||
| - **`enable_words`** (default: `True`): Include word-level timestamps | ||||||
| - **`max_alternatives`** (default: `0`): Number of alternative transcriptions (0 = disabled) | ||||||
| - **`speaker_model_path`** (optional): Path to speaker identification model | ||||||
|
|
||||||
| ## Supported Languages | ||||||
|
|
||||||
| Vosk supports 20+ languages: | ||||||
|
|
||||||
| - English (US, Indian) | ||||||
| - Spanish | ||||||
| - French | ||||||
| - German | ||||||
| - Italian | ||||||
| - Portuguese | ||||||
| - Chinese | ||||||
| - Russian | ||||||
| - Japanese | ||||||
| - Turkish | ||||||
| - Vietnamese | ||||||
| - Dutch | ||||||
| - Catalan | ||||||
| - Arabic | ||||||
| - Greek | ||||||
| - Farsi | ||||||
| - Filipino | ||||||
| - Ukrainian | ||||||
| - Kazakh | ||||||
| - Swedish | ||||||
| - And more... | ||||||
|
|
||||||
| See https://alphacephei.com/vosk/models for the complete list. | ||||||
|
Contributor
Replace bare URL with a Markdown link in the supported-languages section.
This keeps the README compliant with MD034.
♻️ Suggested fix
-See https://alphacephei.com/vosk/models for the complete list.
+See the [Vosk models list](https://alphacephei.com/vosk/models) for the complete list.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
168-168: Bare URL used (MD034, no-bare-urls)
|
||||||
|
|
||||||
| ## Performance Tips | ||||||
|
|
||||||
| 1. **Model Size**: Smaller models (~50MB) are faster but less accurate. Larger models (~1GB) provide better accuracy. | ||||||
| 2. **Sample Rate**: Vosk works best with 16kHz audio. The plugin automatically resamples if needed. | ||||||
| 3. **CPU Usage**: Vosk runs on CPU. For production, use a server with adequate CPU resources. | ||||||
| 4. **Memory**: Load models once and reuse them across multiple streams to save memory. The plugin automatically caches loaded models globally. | ||||||
| 5. **Prewarming**: Call `stt_instance.prewarm()` at startup to load models into memory before the first request, reducing initial latency. | ||||||
|
|
||||||
| ## Advantages | ||||||
|
|
||||||
| - ✅ **Privacy**: All processing is local, no data sent to cloud | ||||||
| - ✅ **Cost**: Completely free, no API fees | ||||||
| - ✅ **Latency**: Lower latency without network round-trip | ||||||
| - ✅ **Reliability**: Works offline, no internet dependency | ||||||
| - ✅ **Compliance**: Easier to meet data residency requirements | ||||||
|
|
||||||
| ## Limitations | ||||||
|
|
||||||
| - Requires pre-downloaded models (50MB - 1GB) | ||||||
| - Accuracy may be lower than the latest cloud models for some languages | ||||||
| - Requires local compute resources (CPU/memory) | ||||||
| - Model updates require manual download | ||||||
|
|
||||||
| ## License | ||||||
|
|
||||||
| Apache 2.0 | ||||||
|
|
||||||
| ## Links | ||||||
|
|
||||||
| - [Vosk Website](https://alphacephei.com/vosk/) | ||||||
| - [Vosk GitHub](https://github.com/alphacep/vosk-api) | ||||||
| - [Vosk Models](https://alphacephei.com/vosk/models) | ||||||
| - [LiveKit Agents](https://docs.livekit.io/agents/) | ||||||
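Performance tips 4 and 5 in the README above describe loading each model once and reusing it across streams. A minimal sketch of that idea, assuming a process-wide dictionary keyed by the resolved model path; `get_cached_model`, `_model_cache`, and the `loader` callback are hypothetical names used for illustration, not the plugin's actual implementation:

```python
import threading
from pathlib import Path
from typing import Any, Callable, Dict

# Hypothetical process-wide cache: resolved model path -> loaded model object.
_model_cache: Dict[str, Any] = {}
_cache_lock = threading.Lock()


def get_cached_model(model_path: str, loader: Callable[[str], Any]) -> Any:
    """Load the model at model_path once and return the same object afterwards."""
    # Normalize the path so "~/x" and its absolute form share one cache entry.
    key = str(Path(model_path).expanduser().resolve())
    # The lock keeps two threads from loading the same model concurrently.
    with _cache_lock:
        if key not in _model_cache:
            _model_cache[key] = loader(key)
        return _model_cache[key]
```

In practice `loader` could presumably be the `Model` class from the `vosk` package, so a prewarm step would amount to calling `get_cached_model(model_path, Model)` once at startup.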
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| # Copyright 2025 LiveKit, Inc. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """Vosk plugin for LiveKit Agents | ||
|
|
||
| Offline speech-to-text with [Vosk](https://alphacephei.com/vosk/). | ||
| """ | ||
|
|
||
| from .stt import STT, SpeechStream | ||
| from .version import __version__ | ||
|
|
||
| __all__ = ["STT", "SpeechStream", "__version__"] | ||
|
|
||
|
|
||
| from livekit.agents import Plugin | ||
|
|
||
| from .log import logger | ||
|
|
||
|
|
||
| class VoskPlugin(Plugin): | ||
| def __init__(self) -> None: | ||
| super().__init__(__name__, __version__, __package__, logger) | ||
|
|
||
|
|
||
| Plugin.register_plugin(VoskPlugin()) | ||
|
|
||
| # Cleanup docs of unexported modules | ||
| _module = dir() | ||
| NOT_IN_ALL = [m for m in _module if m not in __all__] | ||
|
|
||
| __pdoc__ = {} | ||
|
|
||
| for n in NOT_IN_ALL: | ||
| __pdoc__[n] = False |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Copyright 2025 LiveKit, Inc. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| import logging | ||
|
|
||
| logger = logging.getLogger(__name__) |
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,87 @@ | ||||||||||
| # Copyright 2025 LiveKit, Inc. | ||||||||||
| # | ||||||||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||||||||||
| # you may not use this file except in compliance with the License. | ||||||||||
| # You may obtain a copy of the License at | ||||||||||
| # | ||||||||||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||||||||||
| # | ||||||||||
| # Unless required by applicable law or agreed to in writing, software | ||||||||||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||||||||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||||
| # See the License for the specific language governing permissions and | ||||||||||
| # limitations under the License. | ||||||||||
|
|
||||||||||
| """Vosk model management utilities.""" | ||||||||||
|
|
||||||||||
| from pathlib import Path | ||||||||||
|
|
||||||||||
|
|
||||||||||
| class VoskModels: | ||||||||||
| """ | ||||||||||
| Pre-defined Vosk model identifiers. | ||||||||||
|
|
||||||||||
| Download models from: https://alphacephei.com/vosk/models | ||||||||||
| """ | ||||||||||
|
|
||||||||||
| # English models | ||||||||||
| EN_US_SMALL = "vosk-model-small-en-us-0.15" | ||||||||||
| EN_US = "vosk-model-en-us-0.22" | ||||||||||
| EN_US_LARGE = "vosk-model-en-us-0.22-lgraph"  # dynamic-graph (lgraph) build of the 0.22 model | ||||||||||
|
|
||||||||||
| # Other languages | ||||||||||
| CN = "vosk-model-cn-0.22" | ||||||||||
| DE = "vosk-model-de-0.21" | ||||||||||
| ES = "vosk-model-es-0.42" | ||||||||||
| FR = "vosk-model-fr-0.22" | ||||||||||
| IT = "vosk-model-it-0.22" | ||||||||||
| JA = "vosk-model-ja-0.22" | ||||||||||
| PT = "vosk-model-pt-0.3" | ||||||||||
| RU = "vosk-model-ru-0.42" | ||||||||||
| TR = "vosk-model-tr-0.3" | ||||||||||
| VI = "vosk-model-vi-0.4" | ||||||||||
|
|
||||||||||
| # Speaker identification model | ||||||||||
| SPEAKER_MODEL = "vosk-model-spk-0.4" | ||||||||||
|
|
||||||||||
|
|
||||||||||
| DEFAULT_MODEL_DIR = Path.home() / ".cache" / "vosk" / "models" | ||||||||||
|
|
||||||||||
|
|
||||||||||
| def validate_model_path(model_path: str | Path) -> Path: | ||||||||||
| """ | ||||||||||
|
Comment on lines +51 to +52
Contributor
🧩 Analysis chain
💡 Result: Python 3.9 does not support PEP 604's union operator.
What to use on Python 3.9:
from pathlib import Path
from typing import Union
StrOrPath = Union[str, Path]
|
||||||||||
Suggested change
-def validate_model_path(model_path: str | Path) -> Path:
+def validate_model_path(model_path: Union[str, Path]) -> Path:
     """
🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py` around
lines 51 - 52, The type annotation on validate_model_path uses the Python 3.10
union operator (str | Path) which breaks Python 3.9; update the function
signature for validate_model_path to use typing.Union[str, Path] and add the
corresponding import from typing (Union) at the top of the module so the
annotation is compatible with Python 3.9.
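
The suggested signature can be sketched as a complete, Python 3.9-compatible function. The diff truncates the body of `validate_model_path`, so the checks below (existence and directory test) are an assumption about what such a validator might do, not the PR's implementation; only the signature and `DEFAULT_MODEL_DIR` come from the diff:

```python
from pathlib import Path
from typing import Union

# Same default location as DEFAULT_MODEL_DIR in the diff above.
DEFAULT_MODEL_DIR = Path.home() / ".cache" / "vosk" / "models"


def validate_model_path(model_path: Union[str, Path]) -> Path:
    """Return model_path as a Path, raising if it is not an existing directory.

    Assumed behavior: the real function body is not shown in the diff.
    """
    path = Path(model_path).expanduser()
    if not path.exists():
        raise FileNotFoundError(f"Vosk model not found at {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Vosk model path is not a directory: {path}")
    return path
```

A caller might pass `DEFAULT_MODEL_DIR / "vosk-model-small-en-us-0.15"` after downloading the model. Another option on Python 3.9 is to keep `str | Path` and add `from __future__ import annotations` at the top of the module, which defers evaluation of annotations; that only helps as long as nothing evaluates the annotation at runtime.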
Replace bare model-download URL with a Markdown link.
Markdownlint MD034 disallows bare URLs in prose.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
23-23: Bare URL used
(MD034, no-bare-urls)
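
Both MD034 comments in this review flag the same pattern, so it can help to scan a README for it before pushing. The following is a rough approximation of the rule, not markdownlint's actual logic: it treats any `http(s)` URL that is not directly preceded by `(`, `<`, or `[` as bare, and it does not skip fenced code blocks. `find_bare_urls` is a hypothetical helper name:

```python
import re
from typing import List, Tuple

# http(s) URLs not immediately preceded by "(", "<" or "[" (a heuristic, not MD034 itself).
_BARE_URL = re.compile(r"(?<![(<\[])https?://[^\s)>\]]+")


def find_bare_urls(markdown: str) -> List[Tuple[int, str]]:
    """Return (line_number, url) pairs for URLs that look bare."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in _BARE_URL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Run against the two versions of the line quoted in the first comment, this flags the original sentence and leaves the Markdown-link version alone.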