@kimdwkimdw
Contributor

@kimdwkimdw kimdwkimdw commented Sep 8, 2025

RTZR(ReturnZero) STT Plugin for LiveKit Agents

This PR introduces a new ReturnZero (RTZR) plugin for the LiveKit Agents framework, enabling real-time speech recognition through the RTZR Open API.

ReturnZero is a leading speech AI provider in Korea, widely recognized for its superior Korean STT accuracy, as highlighted in Awesome Korean Speech Recognition. This integration allows developers to build high-quality, real-time Korean conversational agents on top of LiveKit.

Components

  • stt.py – Primary implementation of the RTZR STT class, built on RTZROpenAPIClient. Handles WebSocket connection management, PCM (LINEAR16) audio streaming, endpoint detection, and transcript event emission.
  • SpeechStream – Streaming recognition pipeline with interim/final transcript events, error handling, and server-side endpoint detection support.
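
As a rough illustration of the PCM (LINEAR16) streaming step mentioned above, the sketch below converts float samples to little-endian 16-bit PCM and splits them into fixed-duration chunks, as one might do before sending them over a WebSocket. It is not taken from the plugin's stt.py: the function names, the 16 kHz sample rate, and the 100 ms chunk size are assumptions chosen for the example.

```python
import struct
from typing import Iterator, Sequence


def float_to_linear16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM.

    Hypothetical helper for illustration; out-of-range values are clipped.
    """
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration chunks of mono 16-bit PCM.

    The sample rate and chunk duration are assumed defaults, not values
    confirmed by the PR. The final chunk may be shorter than the rest.
    """
    bytes_per_chunk = (sample_rate * chunk_ms // 1000) * 2  # 2 bytes per sample
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]
```

Each yielded chunk could then be passed to whatever send call the transport exposes; the chunk size mainly trades latency against per-message overhead.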

Example Usage

We adapted the simplest voice AI example to use RTZR for STT instead of Azure (the LLM and TTS in this example still run on Azure OpenAI):

from dotenv import dotenv_values, load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import (
    noise_cancellation,
    openai,
    rtzr,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")
config = dotenv_values(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        llm=openai.LLM.with_azure(
            model="gpt-4.1-mini",
            azure_deployment=config["AZURE_LLM_DEPLOYMENT"],
            azure_endpoint=config["AZURE_OPENAI_ENDPOINT"],
            api_version="2024-12-01-preview",
        ),
        stt=rtzr.STT(model="sommers_ko", language="ko"),
        tts=openai.TTS.with_azure(
            model="gpt-4o-mini-tts",
            voice="ash",
            azure_deployment=config["AZURE_TTS_DEPLOYMENT"],
            azure_endpoint=config["AZURE_OPENAI_ENDPOINT"],
            api_version="2025-03-01-preview",
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await session.generate_reply(instructions="Greet the user and offer your assistance.")


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

API Authentication

To use RTZR STT, you need to obtain client credentials from ReturnZero. Set them in your environment:

export RTZR_CLIENT_ID="your_client_id_here"
export RTZR_CLIENT_SECRET="your_client_secret_here"

Make sure to also include them in your .env.local file for local development.
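
A small pre-flight check can catch missing credentials before the agent starts. The sketch below only uses the two variable names given above; the helper name `missing_rtzr_credentials` is hypothetical and not part of the plugin.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the PR description.
REQUIRED_VARS = ("RTZR_CLIENT_ID", "RTZR_CLIENT_SECRET")


def missing_rtzr_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: pass a mapping to check it instead of os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_rtzr_credentials()
    if missing:
        raise SystemExit(f"Missing RTZR credentials: {', '.join(missing)}")
```

If load_dotenv(".env.local") is called first, as in the example agent, values from that file are picked up by the same check.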

Manual Tests Done

  • Verified connection establishment with RTZR WebSocket API.
  • Confirmed interim and final transcript events for Korean speech.
  • Confirmed EOS handling and proper cleanup after session end.
  • Compared results with Azure STT baseline – RTZR showed higher accuracy in Korean speech transcription, particularly in colloquial and conversational contexts.
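
For readers unfamiliar with the interim/final distinction checked above, here is a dependency-free sketch of how a consumer might keep only finalized text. `TranscriptEvent` is a hypothetical stand-in and not the LiveKit Agents event type; it only models the idea that interim results are provisional and final results replace them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TranscriptEvent:
    """Hypothetical stand-in for a streaming STT event."""
    text: str
    is_final: bool


def collect_final_utterances(events: Iterable[TranscriptEvent]) -> List[str]:
    """Keep only non-empty final transcripts, in arrival order.

    Interim events are skipped because a later final event supersedes them.
    """
    finals: List[str] = []
    for ev in events:
        text = ev.text.strip()
        if ev.is_final and text:
            finals.append(text)
    return finals
```

In a manual test, interim events would typically be shown live in a UI, while only the final ones are passed on to the LLM.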


@CLAassistant

CLAassistant commented Sep 8, 2025

CLA assistant check
All committers have signed the CLA.

@kimdwkimdw
Contributor Author

@longcw @theomonnom Could you review this PR?

Contributor

@longcw longcw left a comment

LGTM! Just a small nit.

@kimdwkimdw
Copy link
Contributor Author

@longcw Simplified the logging by using the @utils.log_exceptions(logger=logger) decorator.

@longcw longcw merged commit f54447c into livekit:main Oct 10, 2025
9 checks passed
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Oct 28, 2025
meetakshay99 added a commit to meetakshay99/agents that referenced this pull request Oct 30, 2025
* staging: (114 commits)
  Added min_confidence_threshold for deepgram flux.
  livekit-agents 1.2.15 (livekit#3658)
  fix livekit#3650 cartesia version backward compatibility (livekit#3651)
  Unprompted STT Reconnection at startup (livekit#3649)
  enable zero retention mode in elevenlabs (livekit#3647)
  fix: heartbeat (livekit#3648)
  feat: Integrate streaming endpoints for Sarvam APIs (livekit#3498)
  turn_detection: reduce max_endpointing_delay to 3s (livekit#3640)
  fix: exclude temperature parameter for gpt-5 and similar models (livekit#3573)
  add backwards compatibility for google's realtime model (livekit#3630)
  Align Google STT plugin with official documentation (livekit#3628)
  feat(speechmatics): add max_speakers parameter for speaker diarization (livekit#3524)
  fix(deepgram): send CloseStream message before closing TTS WebSocket (livekit#3608)
  chore: Remove duplicate docstring for `preemptive_generation` parameter in AgentSession (livekit#3624)
  Add RTZR(ReturnZero) STT Plugin for LiveKit Agents (livekit#3376)
  feat(telemetry/utils): add ttft reporting to LangFuse (livekit#3594)
  catch delete_room errors and disable delete_room_on_close by default (livekit#3600)
  lift google realtime api out of beta (livekit#3614)
  fix: lock pyav to <16 due to build issue (livekit#3593)
  Updating Cartesia Version (livekit#3570)
  ...
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 3, 2025