feat: Add Vosk STT plugin for offline speech-to-text #4466

darshankparmar · 2026-01-07T12:47:52Z

This PR introduces livekit-plugins-vosk, a new offline Speech-to-Text (STT) plugin for the LiveKit Agents framework. This plugin enables fully local, privacy-focused speech recognition without relying on external API dependencies.

Dependencies

vosk>=0.3.50
numpy>=1.26

Supported Languages

Vosk supports 20+ languages:

English (US, Indian)
Spanish
French
German
Italian
Portuguese
Chinese
Russian
Japanese
Turkish
And more...

See https://alphacephei.com/vosk/models for the complete list.

Summary by CodeRabbit

New Features
- Added Vosk plugin for offline, multilingual speech-to-text with streaming recognition, word-level timestamps, and optional speaker diarization.
Documentation
- Added comprehensive README with installation, model download, configuration, and usage examples.
Chores
- Made the Vosk plugin available as an optional install and included it in the workspace for easier installation and discovery.


darshankparmar · 2026-01-08T03:10:21Z

Hi @davidzhao 👋
This PR is ready for review.
Please let me know if you’d like any changes or additional tests.
Thanks!

coderabbitai · 2026-01-24T08:02:17Z

📝 Walkthrough

Walkthrough

This PR adds a new Vosk-based LiveKit Agents plugin (livekit-plugins-vosk) with packaging, optional dependency wiring, model management, a streaming STT implementation (with speaker diarization support), documentation, and automatic plugin registration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `CONTRIBUTING.md`, `livekit-plugins/livekit-plugins-vosk/README.md` | CONTRIBUTING example pip command updated to include `-p livekit.plugins.vosk`. Added a comprehensive README for the Vosk plugin describing install, model download, usage, config, languages, and limitations. |
| Workspace & Packaging `pyproject.toml`, `livekit-plugins/livekit-plugins-vosk/pyproject.toml`, `livekit-agents/pyproject.toml` | Added `livekit-plugins-vosk` as a workspace source. Created pyproject.toml for `livekit-plugins-vosk` (Hatch build), declaring dependencies including `vosk>=0.3.45`, `numpy>=1.26`, and `livekit-agents>=1.3.12`. Added optional dependency `vosk = ["livekit-plugins-vosk>=1.3.12"]` to `livekit-agents`. |
| Plugin bootstrap & metadata `livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py`, `.../version.py`, `.../log.py` | Added `VoskPlugin` class with automatic registration on import, module-level `__version__ = "1.3.12"`, and a module logger. Exports STT, SpeechStream, and version. |
| Model management `livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py` | Added `VoskModels` identifiers for multiple languages, `DEFAULT_MODEL_DIR`, and `validate_model_path()` which checks model directory and required files (`am/final.mdl`, `conf/model.conf`). |
| STT & streaming logic `livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py` | Implemented `STT` and `SpeechStream` classes with model caching, optional speaker model support, prewarm/recognize/stream APIs, PCM16 conversion, resampling, async frame processing, and `_parse_vosk_result()` to emit SpeechEvent data (final/interim, word timestamps, diarization). |
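
As a rough illustration of the model-management row above, a check of this kind might look like the following. This is a minimal sketch, not the plugin's actual code: only the function name `validate_model_path` and the two required files (`am/final.mdl`, `conf/model.conf`) come from the summary; the constant `REQUIRED_MODEL_FILES`, the error messages, and the use of `expanduser()` are assumptions.

```python
from pathlib import Path
from typing import Union

# Files the walkthrough lists as required inside a Vosk acoustic model directory.
# The constant name is hypothetical.
REQUIRED_MODEL_FILES = ("am/final.mdl", "conf/model.conf")


def validate_model_path(model_path: Union[str, Path]) -> Path:
    """Return the model directory as a Path, or raise if it looks incomplete."""
    path = Path(model_path).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Vosk model directory not found: {path}")
    # Collect every missing file so the error names all of them at once.
    missing = [name for name in REQUIRED_MODEL_FILES if not (path / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Vosk model at {path} is missing: {', '.join(missing)}")
    return path
```

Failing early with the list of missing files makes a half-extracted model archive easier to diagnose than a later error from inside the recognizer.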

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant STT as STT Instance
    participant Stream as SpeechStream
    participant Audio as Audio Processing
    participant Recognizer as KaldiRecognizer
    participant Parser as Result Parser

    App->>STT: stream()
    activate STT
    STT->>Stream: create SpeechStream
    deactivate STT
    activate Stream

    loop Frame Flow
        App->>Stream: push_frame(AudioFrame)
        activate Audio
        Stream->>Audio: convert_frame_to_pcm16()
        Audio-->>Stream: pcm16 bytes
        Stream->>Audio: resample if needed
        Audio-->>Stream: resampled bytes
        deactivate Audio

        Stream->>Recognizer: accept_waveform(pcm_data)
        activate Recognizer
        Recognizer-->>Stream: JSON result
        deactivate Recognizer

        Stream->>Parser: parse_vosk_result(json)
        activate Parser
        Parser-->>Stream: SpeechEvent
        deactivate Parser

        Stream-->>App: emit SpeechEvent
    end

    App->>Stream: flush()
    Stream->>Recognizer: get_result()
    Recognizer-->>Stream: final JSON
    Stream->>Parser: parse_vosk_result(json, is_final=true)
    Parser-->>Stream: final SpeechEvent
    Stream-->>App: emit final SpeechEvent
    deactivate Stream
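
The parsing step in the diagram can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PR's `_parse_vosk_result()`: it relies on the JSON shapes Vosk recognizers normally emit (`{"partial": ...}` for interim results, `{"text": ..., "result": [...]}` for completed utterances when word output is enabled), and the `Word` and `ParsedResult` dataclasses are hypothetical stand-ins for the framework's SpeechEvent types.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float
    end: float
    confidence: float


@dataclass
class ParsedResult:  # hypothetical stand-in for a SpeechEvent
    text: str
    is_final: bool
    words: List[Word] = field(default_factory=list)


def parse_vosk_result(raw: str, is_final: bool = False) -> Optional[ParsedResult]:
    """Map one Vosk JSON payload to a ParsedResult, or None when it holds no text."""
    data = json.loads(raw)
    # Interim payloads carry "partial"; completed utterances carry "text".
    text = (data.get("text") if is_final else data.get("partial")) or ""
    text = text.strip()
    if not text:
        return None
    # Word-level entries are only present on final results with word output enabled.
    words = [
        Word(w["word"], float(w["start"]), float(w["end"]), float(w.get("conf", 0.0)))
        for w in data.get("result", [])
    ]
    return ParsedResult(text=text, is_final=is_final, words=words)
```

Returning `None` for empty payloads lets the caller skip emitting events during silence.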

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

add missing plugins to dependencies #4593: Adds plugin optional-dependency entries to livekit-agents/pyproject.toml (similar location and intent to this PR's addition of vosk).

Suggested reviewers

davidzhao
tinalenguyen
longcw

Poem

🐰 I hop on bytes and models fine,

Offline whispers turned to line by line.
Frames to words, with speakers clear —
A tiny plugin, bringing sound near.
Hooray for Vosk, hop, and cheer! 🎙️🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a Vosk STT plugin for offline speech-to-text. It is clear, concise, and directly reflects the primary objective of the pull request. |



📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6487b0 and b9f2638.

📒 Files selected for processing (3)

livekit-agents/pyproject.toml
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
livekit-plugins/livekit-plugins-vosk/pyproject.toml

🚧 Files skipped from review as they are similar to previous changes (2)

livekit-agents/pyproject.toml
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class

📚 Learning: 2026-01-16T07:44:56.353Z

Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class

Applied to files:

livekit-plugins/livekit-plugins-vosk/pyproject.toml

🔇 Additional comments (2)

livekit-plugins/livekit-plugins-vosk/pyproject.toml (2)

1-24: Solid project/build metadata setup.
Everything looks consistent with Hatch usage and standard packaging metadata.

25-29: Keep the current vosk version requirement unchanged—0.3.50 is not yet available on PyPI.

The current vosk>=0.3.45 dependency is correct. PyPI only publishes up to version 0.3.45 (released Dec 14, 2022); version 0.3.50 exists only in the upstream repository and is not available on the package index. Bumping to vosk>=0.3.50 would break installations. Align the PR objectives with the actual available version.

Likely an incorrect or invalid review comment.


coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py`:
- Around line 51-52: The type annotation on validate_model_path uses the Python
3.10 union operator (str | Path) which breaks Python 3.9; update the function
signature for validate_model_path to use typing.Union[str, Path] and add the
corresponding import from typing (Union) at the top of the module so the
annotation is compatible with Python 3.9.

In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py`:
- Around line 410-428: In `_convert_audio_buffer_to_pcm16`, the current
resampling logic only uses the first element of AudioResampler.push()
(frames[0]), which drops any additional resampled frames; update the function to
iterate through all frames returned by resampler.push(merged_frame), concatenate
them (or merge/append into a single combined frame replacing merged_frame)
before calling _convert_frame_to_pcm16, ensuring no resampled audio is lost—use
the same approach as the earlier resampling handling that loops over frames to
build the complete merged_frame.
- Around line 82-129: The constructor (__init__) currently calls
validate_model_path for both acoustic models and speaker models, but Vosk
speaker models use a flat layout and will fail that strict check; modify
initialization to use a lighter validation for speaker models: either create a
new validate_speaker_model_path that only verifies that the provided
speaker_model_path exists and is a directory/file (or None), or conditionally
call a simpler check when setting self._speaker_model_path instead of
validate_model_path; update the assignment in __init__ (the line assigning
self._speaker_model_path and the STTOptions speaker_model_path handling) to use
the new lighter validator so diarization initialization no longer raises
FileNotFoundError for valid Vosk speaker model packages.

In `@livekit-plugins/livekit-plugins-vosk/README.md`:
- Line 23: Replace the bare URL in the README sentence "Vosk requires
pre-downloaded models. Download from: https://alphacephei.com/vosk/models" with
a Markdown link to satisfy MD034; edit that line so the URL is formatted as link
text (e.g., "Download the Vosk models" linking to
https://alphacephei.com/vosk/models) to remove the bare URL while preserving the
same destination and meaning.
- Line 168: Replace the bare URL "See https://alphacephei.com/vosk/models for
the complete list." in the supported-languages section of README.md with a
Markdown link (e.g., "See the VOSK models page
(https://alphacephei.com/vosk/models)" replaced by "See the VOSK models page:
[VOSK models](https://alphacephei.com/vosk/models)") to satisfy MD034; update
the exact line containing that sentence so the link is in Markdown form rather
than a raw URL.

🧹 Nitpick comments (1)

livekit-plugins/livekit-plugins-vosk/pyproject.toml (1)
14-24: Missing Python version classifiers.

The classifiers only list Python 3.9 and 3.10, but the requires-python = ">=3.9.0" constraint and livekit-agents support suggest Python 3.11, 3.12, and 3.13 should also be listed for consistency with the parent project.
Suggested classifiers update
 classifiers = [
     "Intended Audience :: Developers",
     "License :: OSI Approved :: Apache Software License",
     "Topic :: Multimedia :: Sound/Audio",
     "Topic :: Multimedia :: Video",
     "Topic :: Scientific/Engineering :: Artificial Intelligence",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
     "Programming Language :: Python :: 3 :: Only",
 ]

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and f6487b0.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (11)

CONTRIBUTING.md
livekit-agents/pyproject.toml
livekit-plugins/livekit-plugins-vosk/README.md
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/log.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/py.typed
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
livekit-plugins/livekit-plugins-vosk/pyproject.toml
pyproject.toml

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/log.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py

🧠 Learnings (2)

📚 Learning: 2026-01-16T07:44:56.353Z

Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class

Applied to files:

livekit-plugins/livekit-plugins-vosk/pyproject.toml
CONTRIBUTING.md

📚 Learning: 2026-01-16T07:44:56.353Z

Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns

Applied to files:

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

🧬 Code graph analysis (1)

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py (1)

livekit-agents/livekit/agents/plugin.py (2)

Plugin (13-56)

register_plugin (31-36)

🪛 markdownlint-cli2 (0.18.1)

livekit-plugins/livekit-plugins-vosk/README.md

23-23: Bare URL used

(MD034, no-bare-urls)

168-168: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: type-check (3.13)
GitHub Check: type-check (3.9)
GitHub Check: unit-tests

🔇 Additional comments (18)

pyproject.toml (1)

55-55: LGTM!

The workspace member addition follows the established pattern for plugin registration.

CONTRIBUTING.md (1)

96-97: LGTM!

Adding the Vosk plugin to the mypy typechecking command ensures CI coverage for the new plugin.

livekit-agents/pyproject.toml (1)

111-111: Verify version alignment with other plugins.

The Vosk optional dependency uses >=1.3.10 while most other plugins in this file use >=1.3.12. Based on the PR commit message mentioning "update vosk plugin version to align with monorepo synchronization," verify this version is intentional or if it should be >=1.3.12 for consistency.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py (1)

1-15: LGTM!

Standard version module following the established plugin pattern with proper license header.

livekit-plugins/livekit-plugins-vosk/pyproject.toml (1)

25-29: Verify Vosk version requirement.

The PR description states vosk>=0.3.50 as a dependency, but this file declares vosk>=0.3.45. Please verify which version is the intended minimum requirement to ensure documentation and code are aligned.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/log.py (1)

15-17: LGTM!

Standard logging module setup following Python best practices with __name__ for proper logger hierarchy.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py (1)

20-49: Model identifiers and default cache directory are well organized.

Nice centralization of constants for reuse.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py (10)

48-69: Thread-safe model caching looks solid.

Cache lock and lazy loading are straightforward.

71-78: STTOptions keeps configuration tidy.

Clear defaults and focused surface area.

131-142: Prewarm path is clean and safe.

Graceful logging on failure is fine for a best-effort warmup.

143-149: Model/provider accessors are clear.

151-204: Batch recognition flow is clean and offloaded appropriately.

205-215: Stream constructor wiring looks correct.

218-234: Stream initialization matches parent expectations.

235-387: Streaming loop and flush handling are well structured.

389-407: PCM16 conversion and downmixing are handled cleanly.

431-519: Result parsing maps Vosk output to SpeechEvent cleanly.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py (1)

20-45: Plugin registration and exports align with the LiveKit plugin pattern.


coderabbitai · 2026-01-24T08:13:29Z

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py

+def validate_model_path(model_path: str | Path) -> Path:
+    """


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Python 3.9 PEP 604 union operator str | Path support

💡 Result:

Python 3.9 does not support PEP 604’s union operator (X | Y) at runtime—it was introduced in Python 3.10. So str | Path is only supported on 3.10+. [1], [2], [3]

What to use on Python 3.9

Use typing.Union (or Optional when None is involved):

from pathlib import Path
from typing import Union

StrOrPath = Union[str, Path]

Optional[T] is equivalent to Union[T, None] (and T | None in 3.10+). [4]

About from __future__ import annotations on 3.9

It can defer evaluation of annotations (so the code may import/define without immediately erroring), but Python 3.9 still can’t evaluate str | Path as a type union when annotations are later resolved (e.g., via typing.get_type_hints). The portable solution remains Union[str, Path]. [3], [4]

Sources: PEP 604 (Python-Version 3.10) [1]; Python 3.10 “What’s New” (PEP 604) [2]; typing spec note “Before Python 3.10 … did not support |” [3]; Python 3.9 typing.Union/Optional docs [4].
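
To make the point above concrete, here is a small sketch showing that the `Union` spelling resolves cleanly when annotations are evaluated at runtime. The helper `to_path` is hypothetical and exists only for illustration; it is not part of the plugin.

```python
from pathlib import Path
from typing import Union, get_type_hints

# Evaluates on Python 3.9; the `str | Path` spelling needs 3.10+ when evaluated.
StrOrPath = Union[str, Path]


def to_path(value: StrOrPath) -> Path:  # hypothetical helper, for illustration only
    """Normalize a string or Path argument to a Path."""
    return Path(value)


# Tools that resolve annotations at runtime see a proper Union on every supported version.
hints = get_type_hints(to_path)
```

Here `hints["value"]` is the `Union[str, Path]` type, which is the lookup that would raise on Python 3.9 if the annotation were written as `str | Path` and deferred with `from __future__ import annotations`.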

🏁 Script executed:

fd "models.py" --path "*/vosk/*" -x cat -n {}

Repository: livekit/agents

Length of output: 273

🏁 Script executed:

find . -name "models.py" -path "*/vosk/*" | head -5

Repository: livekit/agents

Length of output: 127

🏁 Script executed:

cat -n ./livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | sed -n '45,60p'

Repository: livekit/agents

Length of output: 514

🏁 Script executed:

cat -n ./livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | head -30

Repository: livekit/agents

Length of output: 1204

Use Union[str, Path] instead of str | Path for Python 3.9 compatibility.

PEP 604's union operator (|) was introduced in Python 3.10; Python 3.9 does not support this syntax at runtime. The coding guidelines require Python 3.9+ compatibility.

Suggested fix

-from pathlib import Path
+from pathlib import Path
+from typing import Union
@@
-def validate_model_path(model_path: str | Path) -> Path:
+def validate_model_path(model_path: Union[str, Path]) -> Path:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def validate_model_path(model_path: str | Path) -> Path:

"""

def validate_model_path(model_path: Union[str, Path]) -> Path:

"""

🤖 Prompt for AI Agents

In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py` around lines 51 - 52, The type annotation on validate_model_path uses the Python 3.10 union operator (str | Path) which breaks Python 3.9; update the function signature for validate_model_path to use typing.Union[str, Path] and add the corresponding import from typing (Union) at the top of the module so the annotation is compatible with Python 3.9.

coderabbitai · 2026-01-24T08:13:29Z

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

+    def __init__(
+        self,
+        *,
+        model_path: str,
+        language: str = "en",
+        sample_rate: int = 16000,
+        enable_words: bool = True,
+        max_alternatives: int = 0,
+        speaker_model_path: str | None = None,
+    ):
+        """
+        Create a new instance of Vosk STT.
+
+        Args:
+            model_path: Path to the Vosk model directory. Download models from
+                https://alphacephei.com/vosk/models
+            language: Language code for metadata (e.g., "en", "es", "fr")
+            sample_rate: Audio sample rate in Hz. Vosk typically uses 16000.
+            enable_words: Whether to include word-level timestamps in results
+            max_alternatives: Number of alternative transcriptions to return (0 = disabled)
+            speaker_model_path: Optional path to speaker identification model for diarization
+        """
+        # Validate model path exists
+        self._model_path = validate_model_path(model_path)
+        self._speaker_model_path = (
+            validate_model_path(speaker_model_path) if speaker_model_path else None
+        )
+
+        super().__init__(
+            capabilities=stt.STTCapabilities(
+                streaming=True,
+                interim_results=True,
+                diarization=bool(speaker_model_path),
+                aligned_transcript="word" if enable_words else False,
+                offline_recognize=True,
+            )
+        )
+
+        self._opts = STTOptions(
+            model_path=str(self._model_path),
+            sample_rate=sample_rate,
+            language=language,
+            enable_words=enable_words,
+            max_alternatives=max_alternatives,
+            speaker_model_path=str(self._speaker_model_path) if self._speaker_model_path else None,
+        )
+
+        self._label = f"vosk-{language}"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the validate_model_path function definition
find . -name "*.py" -type f | xargs grep -l "def validate_model_path" | head -5

Repository: livekit/agents

Length of output: 127

🏁 Script executed:

#!/bin/bash
# Look for imports and usage of validate_model_path in the STT file
grep -n "validate_model_path" livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

Repository: livekit/agents

Length of output: 251

🏁 Script executed:

#!/bin/bash
# Find files related to validation or model handling
fd -e py . livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/ | head -20

Repository: livekit/agents

Length of output: 394

🏁 Script executed:

#!/bin/bash
# Read the models.py file to see validate_model_path implementation
cat -n livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | head -100

Repository: livekit/agents

Length of output: 3264

🌐 Web query:

Vosk speaker model vosk-model-spk-0.4 directory structure contents files

💡 Result:

vosk-model-spk-0.4 (speaker ID model) unzips to a single folder (often renamed to model-spk/) that contains no subdirectories—just these files at the top level:

model-spk/
    final.ext.raw
    mean.vec
    mfcc.conf
    README.txt
    transform.mat

This model is published as the “Speaker identification model” vosk-model-spk-0.4 on the official Vosk models page. [1] The above on-disk file set is shown directly in an example project’s model-spk folder listing. [2]

Sources: [1] [2]

Speaker model validation fails with validate_model_path.

The validate_model_path function enforces acoustic model structure (checking for am/final.mdl and conf/model.conf). However, Vosk speaker models (vosk-model-spk-0.4) have a flat directory structure with files like final.ext.raw, mean.vec, and mfcc.conf—no am/ or conf/ subdirectories. Passing a speaker model path to validate_model_path will raise FileNotFoundError at init, breaking diarization.

Implement a dedicated validation function for speaker models that only checks for path existence and directory status, or conditionally apply the lighter validation when speaker_model_path is provided.
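
A lighter validator along the lines suggested above could look like this. It is a minimal sketch, not the PR's code: the name `validate_speaker_model_path` follows the suggestion in this review, while the error message, the `expanduser()` call, and treating an empty string like `None` (mirroring the `if speaker_model_path else None` guard in the constructor) are assumptions.

```python
from pathlib import Path
from typing import Optional, Union


def validate_speaker_model_path(
    speaker_model_path: Optional[Union[str, Path]],
) -> Optional[Path]:
    """Lightweight check: a speaker model only has to exist as a directory.

    Unlike the acoustic-model check, no am/ or conf/ subdirectories are required,
    because speaker models use a flat layout.
    """
    if not speaker_model_path:
        return None  # diarization disabled
    path = Path(speaker_model_path).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Vosk speaker model directory not found: {path}")
    return path
```

In the constructor, the assignment to `self._speaker_model_path` would then call this function instead of `validate_model_path`, and the acoustic-model check stays unchanged.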

🤖 Prompt for AI Agents

In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py` around lines 82 - 129, The constructor (__init__) currently calls validate_model_path for both acoustic models and speaker models, but Vosk speaker models use a flat layout and will fail that strict check; modify initialization to use a lighter validation for speaker models: either create a new validate_speaker_model_path that only verifies that the provided speaker_model_path exists and is a directory/file (or None), or conditionally call a simpler check when setting self._speaker_model_path instead of validate_model_path; update the assignment in __init__ (the line assigning self._speaker_model_path and the STTOptions speaker_model_path handling) to use the new lighter validator so diarization initialization no longer raises FileNotFoundError for valid Vosk speaker model packages.

coderabbitai · 2026-01-24T08:13:29Z

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

+def _convert_audio_buffer_to_pcm16(buffer: AudioBuffer, target_sample_rate: int) -> bytes:
+    """
+    Convert AudioBuffer to PCM16 format for Vosk.
+    """
+    # Merge all frames in the buffer
+    merged_frame = buffer.merge()  # type: ignore
+
+    # Resample if needed
+    if merged_frame.sample_rate != target_sample_rate:
+        resampler = rtc.AudioResampler(
+            merged_frame.sample_rate,
+            target_sample_rate,
+            quality=rtc.AudioResamplerQuality.HIGH,
+        )
+        frames = resampler.push(merged_frame)
+        if frames:
+            merged_frame = frames[0]
+
+    return _convert_frame_to_pcm16(merged_frame)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the file and understand the imports
find . -name "stt.py" -path "*/vosk/*" | head -5

Repository: livekit/agents

Length of output: 124

🏁 Script executed:

#!/bin/bash
# Search for AudioResampler definition and usage
rg -n "class AudioResampler|def push" -S -A 3 | head -50

Repository: livekit/agents

Length of output: 3884

🏁 Script executed:

#!/bin/bash
# Look for type stubs (.pyi files) that might define AudioResampler
fd -e pyi | xargs rg -l "AudioResampler" 2>/dev/null | head -10

Repository: livekit/agents

Length of output: 40

🏁 Script executed:

#!/bin/bash
# Check imports in the stt.py file to see where rtc comes from
head -50 livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py | rg -n "import|from"

Repository: livekit/agents

Length of output: 514

🏁 Script executed:

#!/bin/bash
# Search for AudioResampler usage throughout the codebase
rg "AudioResampler" -n -B 2 -A 5

Repository: livekit/agents

Length of output: 35142

🏁 Script executed:

#!/bin/bash
# Check if there are Python stubs or type hints in the codebase
find . -name "*.pyi" | xargs grep -l "AudioResampler" 2>/dev/null

Repository: livekit/agents

Length of output: 42

🏁 Script executed:

#!/bin/bash
# Look at the full context of the function in question
sed -n '410,428p' livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

Repository: livekit/agents

Length of output: 706

🏁 Script executed:

#!/bin/bash
# Check if there are any other usages of resampler.push() pattern in codebase
rg "resampler\.push\(" -n -B 2 -A 5

Repository: livekit/agents

Length of output: 17680

Handle all frames returned by AudioResampler.push() to prevent audio loss.

AudioResampler.push() returns a list of frames. Using only frames[0] drops remaining resampled audio. The same file handles this correctly at lines 320-325 by iterating through all frames. Concatenate all resampled frames:

Suggested fix

-        frames = resampler.push(merged_frame)
-        if frames:
-            merged_frame = frames[0]
-
-    return _convert_frame_to_pcm16(merged_frame)
+        frames = resampler.push(merged_frame)
+        if frames:
+            return b"".join(_convert_frame_to_pcm16(f) for f in frames)
+
+    return _convert_frame_to_pcm16(merged_frame)
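
One detail the suggested fix does not cover: a streaming resampler may hold back a few samples until it is flushed, so a one-shot conversion probably wants to drain it as well. The sketch below is written against a duck-typed resampler so it can run without the LiveKit SDK. It assumes the resampler exposes `push()` and `flush()` methods that each return a list of frames (this should be checked against the `rtc.AudioResampler` version in use); `resample_all` and the `frame_to_pcm16` parameter are hypothetical names.

```python
from typing import Any, Callable, List


def resample_all(
    resampler: Any,
    frame: Any,
    frame_to_pcm16: Callable[[Any], bytes],
) -> bytes:
    """Push one frame through a resampler and return every resampled byte.

    Assumes `resampler.push(frame)` and `resampler.flush()` each return a list of frames.
    """
    frames: List[Any] = list(resampler.push(frame))
    # Drain whatever the resampler is still buffering so the tail of the audio is kept.
    frames.extend(resampler.flush())
    return b"".join(frame_to_pcm16(f) for f in frames)
```

In `_convert_audio_buffer_to_pcm16`, the converter passed in would be `_convert_frame_to_pcm16`, and the function would still return the unresampled frame when the sample rates already match.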

🤖 Prompt for AI Agents

In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py` around lines 410 - 428, In `_convert_audio_buffer_to_pcm16`, the current resampling logic only uses the first element of AudioResampler.push() (frames[0]), which drops any additional resampled frames; update the function to iterate through all frames returned by resampler.push(merged_frame), concatenate them (or merge/append into a single combined frame replacing merged_frame) before calling _convert_frame_to_pcm16, ensuring no resampled audio is lost; use the same approach as the earlier resampling handling that loops over frames to build the complete merged_frame.

coderabbitai · 2026-01-24T08:13:29Z

livekit-plugins/livekit-plugins-vosk/README.md

+
+## Download Models
+
+Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models


⚠️ Potential issue | 🟡 Minor

Replace bare model-download URL with a Markdown link.

Markdownlint MD034 disallows bare URLs in prose.

♻️ Suggested fix

-Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models
+Vosk requires pre-downloaded models. Download from: [Vosk models](https://alphacephei.com/vosk/models)

Based on static analysis hints, please address MD034.

📝 Committable suggestion


Suggested change

Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models

Vosk requires pre-downloaded models. Download from: [Vosk models](https://alphacephei.com/vosk/models)

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

23-23: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents

In `@livekit-plugins/livekit-plugins-vosk/README.md` at line 23, Replace the bare URL in the README sentence "Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models" with a Markdown link to satisfy MD034; edit that line so the URL is formatted as link text (e.g., "Download the Vosk models" linking to https://alphacephei.com/vosk/models) to remove the bare URL while preserving the same destination and meaning.

coderabbitai · 2026-01-24T08:13:29Z

livekit-plugins/livekit-plugins-vosk/README.md

+- Swedish
+- And more...
+
+See https://alphacephei.com/vosk/models for the complete list.


⚠️ Potential issue | 🟡 Minor

Replace bare URL with a Markdown link in the supported-languages section.

This keeps the README compliant with MD034.

♻️ Suggested fix

-See https://alphacephei.com/vosk/models for the complete list.
+See the [Vosk models list](https://alphacephei.com/vosk/models) for the complete list.

Based on static analysis hints, please address MD034.

📝 Committable suggestion


Suggested change

See https://alphacephei.com/vosk/models for the complete list.

See the [Vosk models list](https://alphacephei.com/vosk/models) for the complete list.

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

168-168: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents

In `@livekit-plugins/livekit-plugins-vosk/README.md` at line 168, Replace the bare URL "See https://alphacephei.com/vosk/models for the complete list." in the supported-languages section of README.md with a Markdown link (e.g., "See the VOSK models page (https://alphacephei.com/vosk/models)" replaced by "See the VOSK models page: [VOSK models](https://alphacephei.com/vosk/models)") to satisfy MD034; update the exact line containing that sentence so the link is in Markdown form rather than a raw URL.

darshankparmar added 2 commits January 7, 2026 12:35

feat: Add Vosk STT plugin for offline speech-to-text

e37f717

fix: update vosk plugin version to align with monorepo synchronization

b800e4a

Merge branch 'main' into feat/vosk-stt-plugin

f6487b0

update version vosk - 1.3.12

b9f2638
