@simllll
Contributor

@simllll simllll commented Nov 14, 2025

Description

Implements a multi-stream WebSocket connection for ElevenLabs TTS.

Based on the work of #828
fix #824

Changes Made

  • implement the connection handler
  • implement alignment
  • implement CPU-friendly audio and transcript processing

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
  • Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

  • Automated tests added/updated (if applicable)
  • All tests pass
  • Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes

I'm not sure how to test the alignment part, but I followed the Python implementation. We already use the audio part of this implementation in production.


Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

@changeset-bot

changeset-bot bot commented Nov 14, 2025

🦋 Changeset detected

Latest commit: a4c1eff

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@livekit/agents-plugin-elevenlabs Patch


@simllll simllll changed the title from "Feat/elevenlabs tts" to "multi stream websocket connection for ElevenLabs TTS" Nov 14, 2025
@Devesh36
Contributor

LG!! please review it sir @toubatbrian :)

}

return [timedWords, text.substring(end)];
}
Contributor

LG !!

@simllll
Contributor Author

simllll commented Nov 19, 2025

There are still some issues with shutting down the websocket manager. Right now I'm not even sure we can actually solve this with the current architecture, but let's see what else I can find out.

I'm experiencing similar issues with the current Deepgram implementation (on the STT side).

@simllll
Contributor Author

simllll commented Nov 19, 2025

Okay, I don't have much time right now to dig into this or fix it, but here are my findings so far:

  • In general, no real cleanup happens in the TypeScript implementation (maybe this is by design, due to the task fork approach?).
  • TTS is designed per request (one text per request, short-lived); STT is designed per session (one audio stream, long-running).
  • Cleanup in Deepgram's STT implementation is also broken: the wsMonitor promise is never resolved (fairly easy to fix). So even though the process only needs to be cleaned up at the end, that cleanup doesn't really happen, and the subprocess either gets killed or keeps running endlessly (or at least much longer than needed).
  • The main issue for the TTS multi-stream WebSocket case: because of the per-request design, I can establish the WebSocket at the beginning, but I have no idea when I should close it again. There is no close() hook like Python's aclose(): https://github.com/livekit/agents/blob/a9e43fec7fc2a752658cf80506d901d0af622e38/livekit-agents/livekit/agents/tts/tts.py#L140
  • Python also has a close hook for STT. That is less relevant here, because the STT implementation at least has the AbortHandler, which fulfills a similar goal.
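To illustrate the "monitor promise is never resolved" point, here is a minimal sketch of a monitor that always settles. It is not taken from the Deepgram plugin: `monitorSocket` is a hypothetical helper, and it assumes an EventEmitter-style socket that emits `close` and `error` events (as the `ws` package's WebSocket does).

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical helper (not the plugin's actual code): returns a promise that
// settles as soon as the socket closes, errors, or the abort signal fires.
function monitorSocket(socket: EventEmitter, signal: AbortSignal): Promise<string> {
  return new Promise((resolve) => {
    // Remove every listener before resolving so nothing keeps the socket alive.
    const finish = (reason: string): void => {
      socket.off('close', onClose);
      socket.off('error', onError);
      signal.removeEventListener('abort', onAbort);
      resolve(reason);
    };
    const onClose = (): void => finish('closed');
    const onError = (): void => finish('error');
    const onAbort = (): void => finish('aborted');

    // If the session was already aborted, settle immediately.
    if (signal.aborted) {
      finish('aborted');
      return;
    }
    socket.once('close', onClose);
    socket.once('error', onError);
    signal.addEventListener('abort', onAbort, { once: true });
  });
}
```

Because every path calls `finish`, anything awaiting the monitor can proceed to its own cleanup instead of hanging until the subprocess is killed.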

So to get this clean, we need either:

  1. An AbortController in the base TTS that gets aborted at the end of the session, or
  2. A close() hook in the base TTS that gets called when the session ends.
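A rough sketch of how the two options could fit together. All names here (`BaseTTS`, `MultiStreamTTS`, `ClosableSocket`, `onClose`) are hypothetical and only illustrate the idea; this is not the actual @livekit/agents base class.

```typescript
// Hypothetical types and classes for illustration only.
interface ClosableSocket {
  close(): void;
}

abstract class BaseTTS {
  // Option 1: a controller that is aborted once, when the session ends.
  private readonly abortController = new AbortController();
  private closed = false;

  get abortSignal(): AbortSignal {
    return this.abortController.signal;
  }

  // Option 2: an explicit hook the session calls when it ends.
  async close(): Promise<void> {
    if (this.closed) return; // calling close() twice is harmless
    this.closed = true;
    this.abortController.abort();
    await this.onClose();
  }

  protected abstract onClose(): Promise<void>;
}

class MultiStreamTTS extends BaseTTS {
  private socket: ClosableSocket | undefined;
  private readonly connect: () => ClosableSocket;

  constructor(connect: () => ClosableSocket) {
    super();
    this.connect = connect;
  }

  // Each request reuses the same long-lived socket instead of opening a new one.
  synthesize(text: string): string {
    if (this.abortSignal.aborted) throw new Error('TTS is closed');
    if (this.socket === undefined) this.socket = this.connect();
    return `queued:${text}`;
  }

  // The socket is torn down only when the session ends.
  protected async onClose(): Promise<void> {
    this.socket?.close();
    this.socket = undefined;
  }
}
```

With something like this, the session would call `close()` once at shutdown, and per-request code would only ever check `abortSignal` rather than deciding on its own when to drop the shared connection.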

@simllll simllll mentioned this pull request Nov 19, 2025
8 tasks
@simllll simllll force-pushed the feat/elevenlabs-tts branch from 57a73d6 to a4c1eff Compare November 26, 2025 12:40
@simllll
Contributor Author

simllll commented Nov 26, 2025

This one should be ready now, @toubatbrian.
I also added the changes from #861.

@toubatbrian
Contributor

Thanks @simllll, will take a look!


Development

Successfully merging this pull request may close these issues.

elevenlabs new websocket connection on each turn

3 participants