Align Google STT plugin with official documentation #3628

mrkowalski · 2025-10-12T13:23:16Z

This PR aligns LiveKit's Google plugin with the STT API documentation for StreamingRecognizeResponse: https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.types.StreamingRecognizeResponse

Especially this:

results: This repeated list contains zero or more results that correspond to consecutive portions of the audio currently being processed. It contains zero or one is_final=true result (the newly settled portion), followed by zero or more is_final=false results (the interim results).

With this change, min_confidence_threshold is effectively ignored for final results, which closes #3495.
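
To illustrate the documented ordering and the threshold behaviour described above, here is a minimal sketch. It does not use the real Google client types: `Result` and `split_results` are hypothetical names invented for this example, and applying the threshold per interim result is an assumption, not necessarily how the plugin is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Result:
    """Hypothetical stand-in for one entry of the `results` list."""

    transcript: str
    confidence: float
    is_final: bool


def split_results(
    results: List[Result], min_confidence_threshold: float
) -> Tuple[Optional[Result], List[Result]]:
    """Separate the (at most one) final result from the interim results.

    The final result is always kept, regardless of its confidence.
    Interim results are kept only if they reach the threshold
    (an assumption made for this sketch).
    """
    final: Optional[Result] = None
    interims: List[Result] = []
    for result in results:
        if result.is_final:
            # Per the documentation there is zero or one final result.
            final = result
        elif result.confidence >= min_confidence_threshold:
            interims.append(result)
    return final, interims
```

With a threshold of 0.5, a final result with confidence 0.1 is still returned, while an interim result with confidence 0.2 is dropped.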

I have smoke-tested it with the following models:

latest_long
chirp_2
chirp_3

The corresponding unit tests are in tests/test_plugin_google_stt.py.

davidzhao

awesome work @mrkowalski!

benorama · 2025-10-21T21:14:13Z

Do you have a running example of STT using Google Chirp 3?

When trying the following basic agent with ADK v1.16.0, I get either a 500 timeout error or a 400 error: "Chirp 3 does not currently support word timestamps".
The same code works fine when I simply switch the STT to Deepgram.

from dotenv import load_dotenv

# ruff: noqa: E402
load_dotenv()

from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import google, noise_cancellation, silero
from livekit.plugins.turn_detector.english import EnglishModel
from livekit.plugins.turn_detector.multilingual import MultilingualModel

LANGUAGE = "en-US"
LANGUAGE_CODE = "en"
ASSISTANT_PROMPT = "You are a helpful voice AI assistant."

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions=ASSISTANT_PROMPT)


async def entrypoint(ctx: JobContext) -> None:
    session: AgentSession = AgentSession(
        vad=silero.VAD.load(),
        # stt=f"deepgram/nova-3:{LANGUAGE_CODE}",
        stt=google.STT(
            languages=[LANGUAGE],
            location="us",
            model="chirp_3",
            min_confidence_threshold=0.0,
            spoken_punctuation=False,
        ),
        llm="google/gemini-2.5-flash",
        tts=google.TTS(language=LANGUAGE, voice_name=f"{LANGUAGE}-chirp3-HD-Achernar"),
        # preemptive_generation=True,
        turn_detection=EnglishModel() if LANGUAGE_CODE == "en" else MultilingualModel(),
    )

    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Have the agent greet the user once the session has started
    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Thanks.

heyitskim191296 · 2025-10-22T09:08:46Z

@benorama I set enable_word_time_offsets to False and the above error disappears.

Here is a full example:

stt=google.STT(
    model="chirp_3",
    location="us",
    languages=["fil-PH"],
    min_confidence_threshold=0.0,
    enable_word_time_offsets=False,
)
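
If the same agent code has to switch between models, the workaround above could be wrapped in a small helper. This is only a sketch: `build_stt_kwargs` and `MODELS_WITHOUT_WORD_OFFSETS` are hypothetical names, and the set contains only chirp_3 because that is the only model this thread reports as rejecting word timestamps.

```python
from typing import Any, Dict, List

# Assumption: based solely on the 400 error reported in this thread.
MODELS_WITHOUT_WORD_OFFSETS = {"chirp_3"}


def build_stt_kwargs(
    model: str, languages: List[str], location: str = "us"
) -> Dict[str, Any]:
    """Build keyword arguments for google.STT, disabling word time
    offsets for models that are known to reject them."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "languages": languages,
        "location": location,
    }
    if model in MODELS_WITHOUT_WORD_OFFSETS:
        kwargs["enable_word_time_offsets"] = False
    return kwargs


# Intended usage (requires the livekit google plugin to be installed):
# stt = google.STT(**build_stt_kwargs("chirp_3", ["fil-PH"]))
```

For other models the helper leaves enable_word_time_offsets unset, so the plugin's own default applies.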

benorama · 2025-10-22T12:16:33Z

Thanks @heyitskim191296.
It works now, although I'm still facing a gRPC connection timeout: locally, the agent only starts working after an initial 60s timeout. But that's probably a separate issue.

* staging: (114 commits)
  Added min_confidence_threshold for deepgram flux.
  livekit-agents 1.2.15 (livekit#3658)
  fix livekit#3650 cartesia version backward compatibility (livekit#3651)
  Unprompted STT Reconnection at startup (livekit#3649)
  enable zero retention mode in elevenlabs (livekit#3647)
  fix: heartbeat (livekit#3648)
  feat: Integrate streaming endpoints for Sarvam APIs (livekit#3498)
  turn_detection: reduce max_endpointing_delay to 3s (livekit#3640)
  fix: exclude temperature parameter for gpt-5 and similar models (livekit#3573)
  add backwards compatibility for google's realtime model (livekit#3630)
  Align Google STT plugin with official documentation (livekit#3628)
  feat(speechmatics): add max_speakers parameter for speaker diarization (livekit#3524)
  fix(deepgram): send CloseStream message before closing TTS WebSocket (livekit#3608)
  chore: Remove duplicate docstring for `preemptive_generation` parameter in AgentSession (livekit#3624)
  Add RTZR(ReturnZero) STT Plugin for LiveKit Agents (livekit#3376)
  feat(telemetry/utils): add ttft reporting to LangFuse (livekit#3594)
  catch delete_room errors and disable delete_room_on_close by default (livekit#3600)
  lift google realtime api out of beta (livekit#3614)
  fix: lock pyav to <16 due to build issue (livekit#3593)
  Updating Cartesia Version (livekit#3570)
  ...

mrkowalski added 2 commits October 12, 2025 14:43

fixes livekit#3495

21ba58f

livekit#3495 - tests and a following fix

ceb9bc4

mrkowalski mentioned this pull request Oct 12, 2025

add google stt chirp_3 #3499

Closed

davidzhao approved these changes Oct 12, 2025

davidzhao merged commit 68a3617 into livekit:main Oct 12, 2025
9 checks passed

akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Oct 28, 2025

Align Google STT plugin with official documentation (livekit#3628)

700bf79

akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Oct 28, 2025

Align Google STT plugin with official documentation (livekit#3628)

d80665d

akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 3, 2025

Align Google STT plugin with official documentation (livekit#3628)

943c809
