@mrkowalski
Contributor

@mrkowalski mrkowalski commented Oct 12, 2025

This PR aligns LiveKit's Google plugin with the Google Cloud STT API documentation: https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.types.StreamingRecognizeResponse

Especially this:

results: This repeated list contains zero or more results that correspond to consecutive portions of the audio currently being processed. It contains zero or one is_final=true result (the newly settled portion), followed by zero or more is_final=false results (the interim results).

It effectively ignores min_confidence_threshold for final results, which closes #3495.
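
The rule quoted above can be illustrated with a minimal, self-contained sketch. It is not the plugin's actual code: the `Result` class and `split_results` function are hypothetical stand-ins for entries of `StreamingRecognizeResponse.results`, and it assumes the threshold is still applied to interim results while final results are always kept.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for one entry of StreamingRecognizeResponse.results."""

    transcript: str
    confidence: float
    is_final: bool


def split_results(results: list[Result], min_confidence_threshold: float):
    """Return (final_text, interim_text) for one streaming response.

    Final results are always kept, regardless of confidence. Interim
    results are filtered by confidence (an assumption of this sketch).
    """
    final_parts = [r.transcript for r in results if r.is_final]
    interim_parts = [
        r.transcript
        for r in results
        if not r.is_final and r.confidence >= min_confidence_threshold
    ]
    return "".join(final_parts), "".join(interim_parts)


# One response: a settled portion followed by two interim portions.
response = [
    Result("hello world", confidence=0.0, is_final=True),
    Result(" how are", confidence=0.9, is_final=False),
    Result(" yu", confidence=0.1, is_final=False),
]
final_text, interim_text = split_results(response, min_confidence_threshold=0.65)
print(repr(final_text), repr(interim_text))
```

In this sketch the final result survives even though its confidence is 0.0, while the low-confidence interim fragment is dropped.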

I have smoke-tested it with the following models:

  • latest_long
  • chirp_2
  • chirp_3

The corresponding unit tests are in tests/test_plugin_google_stt.py.

Member

@davidzhao davidzhao left a comment

awesome work @mrkowalski!

@davidzhao davidzhao merged commit 68a3617 into livekit:main Oct 12, 2025
9 checks passed
@benorama

Do you have a working example of STT using Google Chirp 3?

When I run the following basic agent with ADK v1.16.0, I get either a 500 timeout error or a 400 error: "Chirp 3 does not currently support word timestamps".
The same code works fine when I simply switch the STT to Deepgram.

from dotenv import load_dotenv

# ruff: noqa: E402
load_dotenv()

from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import google, noise_cancellation, silero
from livekit.plugins.turn_detector.english import EnglishModel
from livekit.plugins.turn_detector.multilingual import MultilingualModel

LANGUAGE = "en-US"
LANGUAGE_CODE = "en"
ASSISTANT_PROMPT = "You are a helpful voice AI assistant."

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions=ASSISTANT_PROMPT)


async def entrypoint(ctx: JobContext) -> None:
    session: AgentSession = AgentSession(
        vad=silero.VAD.load(),
        # stt=f"deepgram/nova-3:{LANGUAGE_CODE}",
        stt=google.STT(languages=[LANGUAGE], location="us", model="chirp_3", min_confidence_threshold=0.0, spoken_punctuation=False),
        llm="google/gemini-2.5-flash",
        tts=google.TTS(language=LANGUAGE, voice_name=f"{LANGUAGE}-chirp3-HD-Achernar"),
        # preemptive_generation=True,
        turn_detection=EnglishModel() if LANGUAGE_CODE == "en" else MultilingualModel(),
    )

    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Join the room and connect to the user
    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Thanks.

@heyitskim191296

@benorama I set enable_word_time_offsets to False and the above error disappears.

Here is a full example:

stt=google.STT(
    model="chirp_3",
    location="us",
    languages=["fil-PH"],
    min_confidence_threshold=0.0,
    enable_word_time_offsets=False,
)
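
If the same agent has to switch between models, the workaround above could be wrapped in a small helper. This is only a sketch: `google_stt_kwargs` and `MODELS_WITHOUT_WORD_TIMESTAMPS` are hypothetical names, and the only model known from this thread to reject word timestamps is `chirp_3`.

```python
# Hypothetical set: only chirp_3 is confirmed by the error message in this thread.
MODELS_WITHOUT_WORD_TIMESTAMPS = {"chirp_3"}


def google_stt_kwargs(model: str, language: str, location: str = "us") -> dict:
    """Build keyword arguments for google.STT (helper name is hypothetical)."""
    return {
        "model": model,
        "location": location,
        "languages": [language],
        "min_confidence_threshold": 0.0,
        # Disable word timestamps only for models assumed not to support them.
        "enable_word_time_offsets": model not in MODELS_WITHOUT_WORD_TIMESTAMPS,
    }


kwargs = google_stt_kwargs("chirp_3", "fil-PH")
print(kwargs)

# Usage, assuming livekit-plugins-google is installed:
#   from livekit.plugins import google
#   stt = google.STT(**kwargs)
```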

@benorama

Thanks @heyitskim191296.
It works now, although I still hit a gRPC connection timeout: locally, the agent only starts working after an initial 60 s timeout. That is probably a separate issue.

akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Oct 28, 2025
meetakshay99 added a commit to meetakshay99/agents that referenced this pull request Oct 30, 2025
* staging: (114 commits)
  Added min_confidence_threshold for deepgram flux.
  livekit-agents 1.2.15 (livekit#3658)
  fix livekit#3650 cartesia version backward compatibility (livekit#3651)
  Unprompted STT Reconnection at startup (livekit#3649)
  enable zero retention mode in elevenlabs (livekit#3647)
  fix: heartbeat (livekit#3648)
  feat: Integrate streaming endpoints for Sarvam APIs (livekit#3498)
  turn_detection: reduce max_endpointing_delay to 3s (livekit#3640)
  fix: exclude temperature parameter for gpt-5 and similar models (livekit#3573)
  add backwards compatibility for google's realtime model (livekit#3630)
  Align Google STT plugin with official documentation (livekit#3628)
  feat(speechmatics): add max_speakers parameter for speaker diarization (livekit#3524)
  fix(deepgram): send CloseStream message before closing TTS WebSocket (livekit#3608)
  chore: Remove duplicate docstring for `preemptive_generation` parameter in AgentSession (livekit#3624)
  Add RTZR(ReturnZero) STT Plugin for LiveKit Agents (livekit#3376)
  feat(telemetry/utils): add ttft reporting to LangFuse (livekit#3594)
  catch delete_room errors and disable delete_room_on_close by default (livekit#3600)
  lift google realtime api out of beta (livekit#3614)
  fix: lock pyav to <16 due to build issue (livekit#3593)
  Updating Cartesia Version (livekit#3570)
  ...
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 3, 2025

Development

Successfully merging this pull request may close these issues.

Chirp 3 Transcription not supported in Google Cloud STT Service

4 participants