Align Google STT plugin with official documentation #3628
awesome work @mrkowalski!
Do you have a running example of STT using Google Chirp 3? When trying the following basic agent with ADK v1.16.0, I get either a 500 timeout error or a 400 error: "Chirp 3 does not currently support word timestamps".

from dotenv import load_dotenv

# ruff: noqa: E402
load_dotenv()

from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import google, noise_cancellation, silero
from livekit.plugins.turn_detector.english import EnglishModel
from livekit.plugins.turn_detector.multilingual import MultilingualModel

LANGUAGE = "en-US"
LANGUAGE_CODE = "en"
ASSISTANT_PROMPT = "You are a helpful voice AI assistant."


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions=ASSISTANT_PROMPT)


async def entrypoint(ctx: JobContext) -> None:
    session: AgentSession = AgentSession(
        vad=silero.VAD.load(),
        # stt=f"deepgram/nova-3:{LANGUAGE_CODE}",
        stt=google.STT(
            languages=[LANGUAGE],
            location="us",
            model="chirp_3",
            min_confidence_threshold=0.0,
            spoken_punctuation=False,
        ),
        llm="google/gemini-2.5-flash",
        tts=google.TTS(language=LANGUAGE, voice_name=f"{LANGUAGE}-chirp3-HD-Achernar"),
        # preemptive_generation=True,
        turn_detection=EnglishModel() if LANGUAGE_CODE == "en" else MultilingualModel(),
    )
    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )
    # Join the room and connect to the user
    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Thanks.
@benorama I set enable_word_time_offsets to False and the above error disappears. Here is a full example:
stt=google.STT(model="chirp_3",
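For readers who want to try the workaround described in this reply, here is a minimal sketch that combines it with the arguments from the question earlier in the thread. It is not the truncated example from the comment. The helper name `chirp3_stt_kwargs` is hypothetical, and the sketch assumes `google.STT` accepts `enable_word_time_offsets` as a keyword argument, as the reply implies.

```python
from typing import Any


def chirp3_stt_kwargs(language: str = "en-US", location: str = "us") -> dict[str, Any]:
    """Build keyword arguments for google.STT when targeting chirp_3.

    Hypothetical helper for illustration only; it is not part of the plugin.
    """
    return {
        "languages": [language],
        "location": location,
        "model": "chirp_3",
        # Per the thread, Chirp 3 answers with a 400 error when word
        # timestamps are requested, so they are switched off explicitly.
        "enable_word_time_offsets": False,
    }


if __name__ == "__main__":
    print(chirp3_stt_kwargs())
    # With livekit-plugins-google installed, the dict could be passed on as:
    #   from livekit.plugins import google
    #   stt = google.STT(**chirp3_stt_kwargs())
```

Keeping the arguments in a plain dict means the settings can be inspected without Google credentials or the plugin installed.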
Thanks @heyitskim191296.
* staging: (114 commits)
  Added min_confidence_threshold for deepgram flux.
  livekit-agents 1.2.15 (livekit#3658)
  fix livekit#3650 cartesia version backward compatibility (livekit#3651)
  Unprompted STT Reconnection at startup (livekit#3649)
  enable zero retention mode in elevenlabs (livekit#3647)
  fix: heartbeat (livekit#3648)
  feat: Integrate streaming endpoints for Sarvam APIs (livekit#3498)
  turn_detection: reduce max_endpointing_delay to 3s (livekit#3640)
  fix: exclude temperature parameter for gpt-5 and similar models (livekit#3573)
  add backwards compatibility for google's realtime model (livekit#3630)
  Align Google STT plugin with official documentation (livekit#3628)
  feat(speechmatics): add max_speakers parameter for speaker diarization (livekit#3524)
  fix(deepgram): send CloseStream message before closing TTS WebSocket (livekit#3608)
  chore: Remove duplicate docstring for `preemptive_generation` parameter in AgentSession (livekit#3624)
  Add RTZR(ReturnZero) STT Plugin for LiveKit Agents (livekit#3376)
  feat(telemetry/utils): add ttft reporting to LangFuse (livekit#3594)
  catch delete_room errors and disable delete_room_on_close by default (livekit#3600)
  lift google realtime api out of beta (livekit#3614)
  fix: lock pyav to <16 due to build issue (livekit#3593)
  Updating Cartesia Version (livekit#3570)
  ...
This PR aligns LiveKit's Google plugin with the STT API documentation: https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.types.StreamingRecognizeResponse
Especially this:
It effectively ignores min_confidence_threshold for final results, which closes #3495.
I have smoke-tested it with the following models:
The corresponding unit tests are in tests/test_plugin_google_stt.py
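As a rough illustration of the min_confidence_threshold behaviour described above, a filter of this kind could be sketched as follows. This is not the plugin's code: `SpeechResult` and `should_emit` are hypothetical names, and the sketch assumes interim results are still compared against the threshold while final results always pass.

```python
from dataclasses import dataclass


@dataclass
class SpeechResult:
    """Hypothetical stand-in for one streaming recognition result."""

    text: str
    confidence: float
    is_final: bool


def should_emit(result: SpeechResult, min_confidence_threshold: float) -> bool:
    """Decide whether a result is forwarded to the caller.

    Assumption: the threshold is applied to interim results only, so a
    final result is never dropped because of its confidence value.
    """
    if result.is_final:
        return True
    return result.confidence >= min_confidence_threshold


if __name__ == "__main__":
    final = SpeechResult(text="hello", confidence=0.0, is_final=True)
    interim = SpeechResult(text="hel", confidence=0.2, is_final=False)
    print(should_emit(final, 0.65), should_emit(interim, 0.65))
```

Under this assumption, a final result reported with a confidence of 0.0 still reaches the agent instead of being silently discarded.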