@darshankparmar darshankparmar commented Jan 7, 2026

This PR introduces livekit-plugins-vosk, a new offline Speech-to-Text (STT) plugin for the LiveKit Agents framework. This plugin enables fully local, privacy-focused speech recognition without relying on external API dependencies.

Dependencies

  • vosk>=0.3.50
  • numpy>=1.26

Supported Languages

Vosk supports 20+ languages:

  • English (US, Indian)
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • Chinese
  • Russian
  • Japanese
  • Turkish
  • And more...

See https://alphacephei.com/vosk/models for the complete list.

Summary by CodeRabbit

  • New Features

    • Added Vosk plugin for offline, multilingual speech-to-text with streaming recognition, word-level timestamps, and optional speaker diarization.
  • Documentation

    • Added comprehensive README with installation, model download, configuration, and usage examples.
  • Chores

    • Made the Vosk plugin available as an optional install and included it in the workspace for easier installation and discovery.


@darshankparmar
Contributor Author

Hi @davidzhao 👋
This PR is ready for review.
Please let me know if you’d like any changes or additional tests.
Thanks!

@coderabbitai
Contributor

coderabbitai bot commented Jan 24, 2026

📝 Walkthrough

This PR adds a new Vosk-based LiveKit Agents plugin (livekit-plugins-vosk) with packaging, optional dependency wiring, model management, a streaming STT implementation (with speaker diarization support), documentation, and automatic plugin registration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: CONTRIBUTING.md, livekit-plugins/livekit-plugins-vosk/README.md | CONTRIBUTING example mypy command updated to include -p livekit.plugins.vosk. Added a comprehensive README for the Vosk plugin describing install, model download, usage, config, languages, and limitations. |
| Workspace & Packaging: pyproject.toml, livekit-plugins/livekit-plugins-vosk/pyproject.toml, livekit-agents/pyproject.toml | Added livekit-plugins-vosk as a workspace source. Created pyproject.toml for livekit-plugins-vosk (Hatch build), declaring dependencies including vosk>=0.3.45, numpy>=1.26, and livekit-agents>=1.3.12. Added optional dependency vosk = ["livekit-plugins-vosk>=1.3.12"] to livekit-agents. |
| Plugin bootstrap & metadata: livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py, .../version.py, .../log.py | Added VoskPlugin class with automatic registration on import, module-level __version__ = "1.3.12", and a module logger. Exports STT, SpeechStream, and version. |
| Model management: livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | Added VoskModels identifiers for multiple languages, DEFAULT_MODEL_DIR, and validate_model_path(), which checks the model directory and required files (am/final.mdl, conf/model.conf). |
| STT & streaming logic: livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py | Implemented STT and SpeechStream classes with model caching, optional speaker model support, prewarm/recognize/stream APIs, PCM16 conversion, resampling, async frame processing, and _parse_vosk_result() to emit SpeechEvent data (final/interim, word timestamps, diarization). |

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant STT as STT Instance
    participant Stream as SpeechStream
    participant Audio as Audio Processing
    participant Recognizer as KaldiRecognizer
    participant Parser as Result Parser

    App->>STT: stream()
    activate STT
    STT->>Stream: create SpeechStream
    deactivate STT
    activate Stream

    loop Frame Flow
        App->>Stream: push_frame(AudioFrame)
        activate Audio
        Stream->>Audio: convert_frame_to_pcm16()
        Audio-->>Stream: pcm16 bytes
        Stream->>Audio: resample if needed
        Audio-->>Stream: resampled bytes
        deactivate Audio

        Stream->>Recognizer: accept_waveform(pcm_data)
        activate Recognizer
        Recognizer-->>Stream: JSON result
        deactivate Recognizer

        Stream->>Parser: parse_vosk_result(json)
        activate Parser
        Parser-->>Stream: SpeechEvent
        deactivate Parser

        Stream-->>App: emit SpeechEvent
    end

    App->>Stream: flush()
    Stream->>Recognizer: get_result()
    Recognizer-->>Stream: final JSON
    Stream->>Parser: parse_vosk_result(json, is_final=true)
    Parser-->>Stream: final SpeechEvent
    Stream-->>App: emit final SpeechEvent
    deactivate Stream
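
The parsing step in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the PR's `_parse_vosk_result()`: it assumes the JSON shapes Vosk's recognizer commonly returns (`{"partial": ...}` for interim hypotheses, `{"text": ..., "result": [...]}` with per-word `word`/`start`/`end`/`conf` entries when word output is enabled), and it returns a plain dict as a hypothetical stand-in for `SpeechEvent`.

```python
import json
from typing import Any, Dict, List


def parse_vosk_result(raw: str, is_final: bool) -> Dict[str, Any]:
    """Map one Vosk JSON payload to a plain dict (stand-in for SpeechEvent)."""
    payload = json.loads(raw)

    if not is_final:
        # Assumption: interim payloads carry the running hypothesis under "partial".
        return {"type": "interim", "text": payload.get("partial", ""), "words": []}

    # Assumption: final payloads carry "text" plus an optional per-word "result" list.
    words: List[Dict[str, Any]] = [
        {
            "word": w["word"],
            "start": w["start"],
            "end": w["end"],
            "conf": w.get("conf"),
        }
        for w in payload.get("result", [])
    ]
    return {"type": "final", "text": payload.get("text", ""), "words": words}
```

Keeping the parser a pure function of the JSON string makes it easy to unit-test without loading a model.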

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • davidzhao
  • tinalenguyen
  • longcw

Poem

🐰 I hop on bytes and models fine,

Offline whispers turned to line by line.
Frames to words, with speakers clear —
A tiny plugin, bringing sound near.
Hooray for Vosk, hop, and cheer! 🎙️🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a Vosk STT plugin for offline speech-to-text. It is clear, concise, and directly reflects the primary objective of the pull request. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6487b0 and b9f2638.

📒 Files selected for processing (3)
  • livekit-agents/pyproject.toml
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
  • livekit-plugins/livekit-plugins-vosk/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
  • livekit-agents/pyproject.toml
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class

Applied to files:

  • livekit-plugins/livekit-plugins-vosk/pyproject.toml
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-vosk/pyproject.toml (2)

1-24: Solid project/build metadata setup.
Everything looks consistent with Hatch usage and standard packaging metadata.


25-29: Keep the current vosk version requirement unchanged—0.3.50 is not yet available on PyPI.

The current vosk>=0.3.45 dependency is correct. PyPI only publishes up to version 0.3.45 (released Dec 14, 2022); version 0.3.50 exists only in the upstream repository and is not available on the package index. Bumping to vosk>=0.3.50 would break installations. Align the PR objectives with the actual available version.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py`:
- Around line 51-52: The type annotation on validate_model_path uses the Python
3.10 union operator (str | Path) which breaks Python 3.9; update the function
signature for validate_model_path to use typing.Union[str, Path] and add the
corresponding import from typing (Union) at the top of the module so the
annotation is compatible with Python 3.9.

In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py`:
- Around line 410-428: In _convert_audio_buffer_to_pcm16, the current
resampling logic only uses the first element of AudioResampler.push()
(frames[0]), which drops any additional resampled frames; update the function to
iterate through all frames returned by resampler.push(merged_frame), concatenate
them (or merge/append into a single combined frame replacing merged_frame)
before calling _convert_frame_to_pcm16, ensuring no resampled audio is lost. Use
the same approach as the earlier resampling handling that loops over frames to
build the complete merged_frame.
- Around line 82-129: The constructor (__init__) currently calls
validate_model_path for both acoustic models and speaker models, but Vosk
speaker models use a flat layout and will fail that strict check; modify
initialization to use a lighter validation for speaker models: either create a
new validate_speaker_model_path that only verifies that the provided
speaker_model_path exists and is a directory/file (or None), or conditionally
call a simpler check when setting self._speaker_model_path instead of
validate_model_path; update the assignment in __init__ (the line assigning
self._speaker_model_path and the STTOptions speaker_model_path handling) to use
the new lighter validator so diarization initialization no longer raises
FileNotFoundError for valid Vosk speaker model packages.

In `@livekit-plugins/livekit-plugins-vosk/README.md`:
- Line 23: Replace the bare URL in the README sentence "Vosk requires
pre-downloaded models. Download from: https://alphacephei.com/vosk/models" with
a Markdown link to satisfy MD034; edit that line so the URL is formatted as link
text (e.g., "Download the Vosk models" linking to
https://alphacephei.com/vosk/models) to remove the bare URL while preserving the
same destination and meaning.
- Line 168: Replace the bare URL in "See https://alphacephei.com/vosk/models for
the complete list." in the supported-languages section of README.md with a
Markdown link (e.g., "See the [Vosk models
list](https://alphacephei.com/vosk/models) for the complete list.") to satisfy
MD034; update the exact line containing that sentence so the link is in Markdown
form rather than a raw URL.
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-vosk/pyproject.toml (1)

14-24: Missing Python version classifiers.

The classifiers only list Python 3.9 and 3.10, but the requires-python = ">=3.9.0" constraint and livekit-agents support suggest Python 3.11, 3.12, and 3.13 should also be listed for consistency with the parent project.

Suggested classifiers update
 classifiers = [
     "Intended Audience :: Developers",
     "License :: OSI Approved :: Apache Software License",
     "Topic :: Multimedia :: Sound/Audio",
     "Topic :: Multimedia :: Video",
     "Topic :: Scientific/Engineering :: Artificial Intelligence",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
     "Programming Language :: Python :: 3 :: Only",
 ]
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and f6487b0.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • CONTRIBUTING.md
  • livekit-agents/pyproject.toml
  • livekit-plugins/livekit-plugins-vosk/README.md
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/log.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/py.typed
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
  • livekit-plugins/livekit-plugins-vosk/pyproject.toml
  • pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/log.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py
  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class

Applied to files:

  • livekit-plugins/livekit-plugins-vosk/pyproject.toml
  • CONTRIBUTING.md
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns

Applied to files:

  • livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py (1)
livekit-agents/livekit/agents/plugin.py (2)
  • Plugin (13-56)
  • register_plugin (31-36)
🪛 markdownlint-cli2 (0.18.1)
livekit-plugins/livekit-plugins-vosk/README.md

23-23: Bare URL used

(MD034, no-bare-urls)


168-168: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)
  • GitHub Check: unit-tests
🔇 Additional comments (18)
pyproject.toml (1)

55-55: LGTM!

The workspace member addition follows the established pattern for plugin registration.

CONTRIBUTING.md (1)

96-97: LGTM!

Adding the Vosk plugin to the mypy typechecking command ensures CI coverage for the new plugin.

livekit-agents/pyproject.toml (1)

111-111: Verify version alignment with other plugins.

The Vosk optional dependency uses >=1.3.10 while most other plugins in this file use >=1.3.12. Based on the PR commit message mentioning "update vosk plugin version to align with monorepo synchronization," verify this version is intentional or if it should be >=1.3.12 for consistency.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/version.py (1)

1-15: LGTM!

Standard version module following the established plugin pattern with proper license header.

livekit-plugins/livekit-plugins-vosk/pyproject.toml (1)

25-29: Verify Vosk version requirement.

The PR description states vosk>=0.3.50 as a dependency, but this file declares vosk>=0.3.45. Please verify which version is the intended minimum requirement to ensure documentation and code are aligned.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/log.py (1)

15-17: LGTM!

Standard logging module setup following Python best practices with __name__ for proper logger hierarchy.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py (1)

20-49: Model identifiers and default cache directory are well organized.

Nice centralization of constants for reuse.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py (10)

48-69: Thread-safe model caching looks solid.

Cache lock and lazy loading are straightforward.


71-78: STTOptions keeps configuration tidy.

Clear defaults and focused surface area.


131-142: Prewarm path is clean and safe.

Graceful logging on failure is fine for a best-effort warmup.


143-149: Model/provider accessors are clear.


151-204: Batch recognition flow is clean and offloaded appropriately.


205-215: Stream constructor wiring looks correct.


218-234: Stream initialization matches parent expectations.


235-387: Streaming loop and flush handling are well structured.


389-407: PCM16 conversion and downmixing are handled cleanly.


431-519: Result parsing maps Vosk output to SpeechEvent cleanly.

livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/__init__.py (1)

20-45: Plugin registration and exports align with the LiveKit plugin pattern.


Comment on lines +51 to +52
def validate_model_path(model_path: str | Path) -> Path:
"""

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Python 3.9 PEP 604 union operator str | Path support

💡 Result:

Python 3.9 does not support PEP 604’s union operator (X | Y) at runtime—it was introduced in Python 3.10. So str | Path is only supported on 3.10+. [1], [2], [3]

What to use on Python 3.9

Use typing.Union (or Optional when None is involved):

from pathlib import Path
from typing import Union

StrOrPath = Union[str, Path]

Optional[T] is equivalent to Union[T, None] (and T | None in 3.10+). [4]

About from __future__ import annotations on 3.9

It can defer evaluation of annotations (so the code may import/define without immediately erroring), but Python 3.9 still can’t evaluate str | Path as a type union when annotations are later resolved (e.g., via typing.get_type_hints). The portable solution remains Union[str, Path]. [3], [4]

Sources: PEP 604 (Python-Version 3.10) [1]; Python 3.10 “What’s New” (PEP 604) [2]; typing spec note “Before Python 3.10 … did not support |” [3]; Python 3.9 typing.Union/Optional docs [4].


🏁 Script executed:

fd "models.py" --path "*/vosk/*" -x cat -n {}

Repository: livekit/agents

Length of output: 273


🏁 Script executed:

find . -name "models.py" -path "*/vosk/*" | head -5

Repository: livekit/agents

Length of output: 127


🏁 Script executed:

cat -n ./livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | sed -n '45,60p'

Repository: livekit/agents

Length of output: 514


🏁 Script executed:

cat -n ./livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | head -30

Repository: livekit/agents

Length of output: 1204


Use Union[str, Path] instead of str | Path for Python 3.9 compatibility.

PEP 604's union operator (|) was introduced in Python 3.10; Python 3.9 does not support this syntax at runtime. The coding guidelines require Python 3.9+ compatibility.

Suggested fix
-from pathlib import Path
+from pathlib import Path
+from typing import Union
@@
-def validate_model_path(model_path: str | Path) -> Path:
+def validate_model_path(model_path: Union[str, Path]) -> Path:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def validate_model_path(model_path: str | Path) -> Path:
"""
def validate_model_path(model_path: Union[str, Path]) -> Path:
"""
🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py` around
lines 51 - 52, The type annotation on validate_model_path uses the Python 3.10
union operator (str | Path) which breaks Python 3.9; update the function
signature for validate_model_path to use typing.Union[str, Path] and add the
corresponding import from typing (Union) at the top of the module so the
annotation is compatible with Python 3.9.
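
A Python 3.9-compatible version of the validator might look like the sketch below. Only the signature and the two required files (`am/final.mdl`, `conf/model.conf`) come from this review; the function body, the `REQUIRED_MODEL_FILES` name, and the error messages are assumptions for illustration, since the PR's actual implementation is not reproduced here.

```python
from pathlib import Path
from typing import Union

# Files the walkthrough says the validator checks for.
REQUIRED_MODEL_FILES = ("am/final.mdl", "conf/model.conf")


def validate_model_path(model_path: Union[str, Path]) -> Path:
    """Return the model directory as a Path, or raise if it looks incomplete."""
    path = Path(model_path)
    if not path.is_dir():
        raise FileNotFoundError(f"Vosk model directory not found: {path}")

    # Collect every missing file so the error names all of them at once.
    missing = [rel for rel in REQUIRED_MODEL_FILES if not (path / rel).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Vosk model at {path} is missing: {', '.join(missing)}"
        )
    return path
```

`Union[str, Path]` is evaluated when the function is defined, so it works on 3.9 without relying on `from __future__ import annotations`.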

Comment on lines +82 to +129
def __init__(
    self,
    *,
    model_path: str,
    language: str = "en",
    sample_rate: int = 16000,
    enable_words: bool = True,
    max_alternatives: int = 0,
    speaker_model_path: str | None = None,
):
    """
    Create a new instance of Vosk STT.

    Args:
        model_path: Path to the Vosk model directory. Download models from
            https://alphacephei.com/vosk/models
        language: Language code for metadata (e.g., "en", "es", "fr")
        sample_rate: Audio sample rate in Hz. Vosk typically uses 16000.
        enable_words: Whether to include word-level timestamps in results
        max_alternatives: Number of alternative transcriptions to return (0 = disabled)
        speaker_model_path: Optional path to speaker identification model for diarization
    """
    # Validate model path exists
    self._model_path = validate_model_path(model_path)
    self._speaker_model_path = (
        validate_model_path(speaker_model_path) if speaker_model_path else None
    )

    super().__init__(
        capabilities=stt.STTCapabilities(
            streaming=True,
            interim_results=True,
            diarization=bool(speaker_model_path),
            aligned_transcript="word" if enable_words else False,
            offline_recognize=True,
        )
    )

    self._opts = STTOptions(
        model_path=str(self._model_path),
        sample_rate=sample_rate,
        language=language,
        enable_words=enable_words,
        max_alternatives=max_alternatives,
        speaker_model_path=str(self._speaker_model_path) if self._speaker_model_path else None,
    )

    self._label = f"vosk-{language}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the validate_model_path function definition
find . -name "*.py" -type f | xargs grep -l "def validate_model_path" | head -5

Repository: livekit/agents

Length of output: 127


🏁 Script executed:

#!/bin/bash
# Look for imports and usage of validate_model_path in the STT file
grep -n "validate_model_path" livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

Repository: livekit/agents

Length of output: 251


🏁 Script executed:

#!/bin/bash
# Find files related to validation or model handling
fd -e py . livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/ | head -20

Repository: livekit/agents

Length of output: 394


🏁 Script executed:

#!/bin/bash
# Read the models.py file to see validate_model_path implementation
cat -n livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/models.py | head -100

Repository: livekit/agents

Length of output: 3264


🌐 Web query:

Vosk speaker model vosk-model-spk-0.4 directory structure contents files

💡 Result:

vosk-model-spk-0.4 (speaker ID model) unzips to a single folder (often renamed to model-spk/) that contains no subdirectories—just these files at the top level:

model-spk/
  final.ext.raw
  mean.vec
  mfcc.conf
  README.txt
  transform.mat

This model is published as the “Speaker identification model” vosk-model-spk-0.4 on the official Vosk models page. [1] The above on-disk file set is shown directly in an example project’s model-spk folder listing. [2]

Sources: [1] [2]


Speaker model validation fails with validate_model_path.

The validate_model_path function enforces acoustic model structure (checking for am/final.mdl and conf/model.conf). However, Vosk speaker models (vosk-model-spk-0.4) have a flat directory structure with files like final.ext.raw, mean.vec, and mfcc.conf—no am/ or conf/ subdirectories. Passing a speaker model path to validate_model_path will raise FileNotFoundError at init, breaking diarization.

Implement a dedicated validation function for speaker models that only checks for path existence and directory status, or conditionally apply the lighter validation when speaker_model_path is provided.
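
One possible shape for the lighter check is sketched below. The name `validate_speaker_model_path` follows the suggestion in this comment but is hypothetical, as are the error messages; the sketch only verifies that the path exists and is a directory, and deliberately does not assert the presence of specific files such as `final.ext.raw` or `mean.vec`.

```python
from pathlib import Path
from typing import Optional, Union


def validate_speaker_model_path(
    speaker_model_path: Optional[Union[str, Path]],
) -> Optional[Path]:
    """Lightweight check for a flat-layout speaker model directory.

    Hypothetical helper: returns None when no speaker model is configured,
    and otherwise only confirms that the directory exists.
    """
    if not speaker_model_path:
        return None

    path = Path(speaker_model_path)
    if not path.exists():
        raise FileNotFoundError(f"Speaker model path not found: {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Speaker model path is not a directory: {path}")
    return path
```

In `__init__`, this would replace the conditional `validate_model_path(speaker_model_path)` call, so the acoustic-model layout check is applied only to `model_path`.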

🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py` around
lines 82 - 129, The constructor (__init__) currently calls validate_model_path
for both acoustic models and speaker models, but Vosk speaker models use a flat
layout and will fail that strict check; modify initialization to use a lighter
validation for speaker models: either create a new validate_speaker_model_path
that only verifies that the provided speaker_model_path exists and is a
directory/file (or None), or conditionally call a simpler check when setting
self._speaker_model_path instead of validate_model_path; update the assignment
in __init__ (the line assigning self._speaker_model_path and the STTOptions
speaker_model_path handling) to use the new lighter validator so diarization
initialization no longer raises FileNotFoundError for valid Vosk speaker model
packages.

Comment on lines +410 to +428
def _convert_audio_buffer_to_pcm16(buffer: AudioBuffer, target_sample_rate: int) -> bytes:
    """
    Convert AudioBuffer to PCM16 format for Vosk.
    """
    # Merge all frames in the buffer
    merged_frame = buffer.merge()  # type: ignore

    # Resample if needed
    if merged_frame.sample_rate != target_sample_rate:
        resampler = rtc.AudioResampler(
            merged_frame.sample_rate,
            target_sample_rate,
            quality=rtc.AudioResamplerQuality.HIGH,
        )
        frames = resampler.push(merged_frame)
        if frames:
            merged_frame = frames[0]

    return _convert_frame_to_pcm16(merged_frame)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the file and understand the imports
find . -name "stt.py" -path "*/vosk/*" | head -5

Repository: livekit/agents

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Search for AudioResampler definition and usage
rg -n "class AudioResampler|def push" -S -A 3 | head -50

Repository: livekit/agents

Length of output: 3884


🏁 Script executed:

#!/bin/bash
# Look for type stubs (.pyi files) that might define AudioResampler
fd -e pyi | xargs rg -l "AudioResampler" 2>/dev/null | head -10

Repository: livekit/agents

Length of output: 40


🏁 Script executed:

#!/bin/bash
# Check imports in the stt.py file to see where rtc comes from
head -50 livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py | rg -n "import|from"

Repository: livekit/agents

Length of output: 514


🏁 Script executed:

#!/bin/bash
# Search for AudioResampler usage throughout the codebase
rg "AudioResampler" -n -B 2 -A 5

Repository: livekit/agents

Length of output: 35142


🏁 Script executed:

#!/bin/bash
# Check if there are Python stubs or type hints in the codebase
find . -name "*.pyi" | xargs grep -l "AudioResampler" 2>/dev/null

Repository: livekit/agents

Length of output: 42


🏁 Script executed:

#!/bin/bash
# Look at the full context of the function in question
sed -n '410,428p' livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py

Repository: livekit/agents

Length of output: 706


🏁 Script executed:

#!/bin/bash
# Check if there are any other usages of resampler.push() pattern in codebase
rg "resampler\.push\(" -n -B 2 -A 5

Repository: livekit/agents

Length of output: 17680


Handle all frames returned by AudioResampler.push() to prevent audio loss.

AudioResampler.push() returns a list of frames. Using only frames[0] drops remaining resampled audio. The same file handles this correctly at lines 320-325 by iterating through all frames. Concatenate all resampled frames:

Suggested fix
-        frames = resampler.push(merged_frame)
-        if frames:
-            merged_frame = frames[0]
-
-    return _convert_frame_to_pcm16(merged_frame)
+        frames = resampler.push(merged_frame)
+        if frames:
+            return b"".join(_convert_frame_to_pcm16(f) for f in frames)
+
+    return _convert_frame_to_pcm16(merged_frame)
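
The difference between taking `frames[0]` and joining every frame can be shown with a small stand-alone sketch. `FakeFrame` and `FakeResampler` are hypothetical stand-ins so the example runs without LiveKit installed; they are not `rtc.AudioFrame` or `rtc.AudioResampler`. The extra `flush()` call goes beyond the suggested fix and assumes the real resampler may hold back a buffered tail after `push()`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeFrame:
    """Hypothetical stand-in for an audio frame carrying PCM16 bytes."""

    data: bytes


class FakeResampler:
    """Hypothetical stand-in that emits its output in fixed-size chunks."""

    def __init__(self, chunk_size: int) -> None:
        self._chunk_size = chunk_size
        self._pending = b""

    def push(self, frame: FakeFrame) -> List[FakeFrame]:
        # Emit as many full chunks as possible; keep the remainder buffered.
        self._pending += frame.data
        out: List[FakeFrame] = []
        while len(self._pending) >= self._chunk_size:
            out.append(FakeFrame(self._pending[: self._chunk_size]))
            self._pending = self._pending[self._chunk_size :]
        return out

    def flush(self) -> List[FakeFrame]:
        # Emit whatever is still buffered, then reset.
        out = [FakeFrame(self._pending)] if self._pending else []
        self._pending = b""
        return out


def resample_all(resampler: FakeResampler, frame: FakeFrame) -> bytes:
    """Concatenate every frame from push() and flush() so no audio is dropped."""
    frames = list(resampler.push(frame))
    frames.extend(resampler.flush())
    return b"".join(f.data for f in frames)
```

With a 10-byte input and 4-byte chunks, `push()` alone returns two frames and leaves two bytes buffered, so using only `frames[0]` would keep 4 of the 10 bytes.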
🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-vosk/livekit/plugins/vosk/stt.py` around
lines 410 - 428, in _convert_audio_buffer_to_pcm16 the current resampling
logic only uses the first element of AudioResampler.push() (frames[0]), which
drops any additional resampled frames; update the function to iterate through
all frames returned by resampler.push(merged_frame), concatenate them (or
merge/append into a single combined frame replacing merged_frame) before calling
_convert_frame_to_pcm16, ensuring no resampled audio is lost. Use the same
approach as the earlier resampling handling that loops over frames to build the
complete merged_frame.


## Download Models

Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models

⚠️ Potential issue | 🟡 Minor

Replace bare model-download URL with a Markdown link.

Markdownlint MD034 disallows bare URLs in prose.

♻️ Suggested fix
-Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models
+Vosk requires pre-downloaded models. Download from: [Vosk models](https://alphacephei.com/vosk/models)
Based on static analysis hints, please address MD034.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Vosk requires pre-downloaded models. Download from: https://alphacephei.com/vosk/models
Vosk requires pre-downloaded models. Download from: [Vosk models](https://alphacephei.com/vosk/models)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

23-23: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-vosk/README.md` at line 23, Replace the bare
URL in the README sentence "Vosk requires pre-downloaded models. Download from:
https://alphacephei.com/vosk/models" with a Markdown link to satisfy MD034; edit
that line so the URL is formatted as link text (e.g., "Download the Vosk models"
linking to https://alphacephei.com/vosk/models) to remove the bare URL while
preserving the same destination and meaning.

- Swedish
- And more...

See https://alphacephei.com/vosk/models for the complete list.

⚠️ Potential issue | 🟡 Minor

Replace bare URL with a Markdown link in the supported-languages section.

This keeps the README compliant with MD034.

♻️ Suggested fix
-See https://alphacephei.com/vosk/models for the complete list.
+See the [Vosk models list](https://alphacephei.com/vosk/models) for the complete list.
Based on static analysis hints, please address MD034.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
See https://alphacephei.com/vosk/models for the complete list.
See the [Vosk models list](https://alphacephei.com/vosk/models) for the complete list.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

168-168: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-vosk/README.md` at line 168, Replace the bare
URL in "See https://alphacephei.com/vosk/models for the complete list." in the
supported-languages section of README.md with a Markdown link (e.g., "See the
[Vosk models list](https://alphacephei.com/vosk/models) for the complete list.")
to satisfy MD034; update the exact line containing that sentence so the link is
in Markdown form rather than a raw URL.
