[Cartesia] Use up-to-date opts in _sentence_stream_task #3500
    token_pkt = base_pkt.copy()
    # The opts may have changed between the time this class was instantiated and the time we start receiving
    # sentences to synthesize. We use the latest options here by doing self._tts._opts instead of self._opts.
    token_pkt = _to_cartesia_options(self._tts._opts, streaming=True)
Could you explain in what case you want to update the options after the tts_node has started?
Series of Events:
1. User speaks ("I want to talk to Katie")
2. llm_node_1 starts, calls update_options(voice=KATIE)
3. tts_node_1 starts with voice=KATIE
4. User interrupts the agent ("actually I want to speak to Max") -> llm_node_1 cancels, but tts_node continues
5. llm_node_2 starts, calls update_options(voice=MAX)
6. tts_node_1 synthesizes the LLM response, but in the KATIE voice instead of the MAX voice

Desired Behavior:
- At step 6, we want the TTS to synthesize in the MAX voice, not the KATIE voice
Please let me know if this is reasonable and/or you plan to allow this functionality.
I think it is reasonable to expect the TTS to synthesize with the most up-to-date options.
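The sequence above can be reproduced in a toy model. This is a minimal sketch, not the plugin's code: `Opts`, `ToyTTS`, and `ToyStream` are hypothetical stand-ins. The only thing carried over from the PR is the assumption that a stream keeps its own `_opts` captured at creation, while the parent TTS object holds the latest `_opts`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Opts:
    """Hypothetical stand-in for the plugin's options object."""
    voice: str


class ToyTTS:
    """Hypothetical stand-in for a TTS instance that owns the current options."""

    def __init__(self, voice: str) -> None:
        self._opts = Opts(voice)

    def update_options(self, *, voice: str) -> None:
        # Build a new options object; snapshots taken earlier are unaffected.
        self._opts = replace(self._opts, voice=voice)

    def stream(self) -> "ToyStream":
        return ToyStream(self)


class ToyStream:
    def __init__(self, tts: ToyTTS) -> None:
        self._tts = tts
        self._opts = tts._opts  # snapshot taken when the stream is created

    def synthesize_with_snapshot(self, text: str) -> str:
        # Mirrors reading self._opts: the voice from stream creation time.
        return f"[{self._opts.voice}] {text}"

    def synthesize_with_latest(self, text: str) -> str:
        # Mirrors reading self._tts._opts: the voice at synthesis time.
        return f"[{self._tts._opts.voice}] {text}"


tts = ToyTTS("KATIE")
stream = tts.stream()            # stream created while the voice is KATIE
tts.update_options(voice="MAX")  # options change before any sentence arrives

print(stream.synthesize_with_snapshot("hello"))  # [KATIE] hello
print(stream.synthesize_with_latest("hello"))    # [MAX] hello
```

Reading from the parent at synthesis time picks up the new voice, at the cost of the stream's options no longer being fixed for its lifetime.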
> llm_node_2 starts, calls update_options(voice=MAX)
> tts_node_1 synthesizes the LLM response, but in the KATIE voice instead of the MAX voice
Does this actually happen? A new generation will create a new TTS stream; ideally there should be a tts_node_2 for llm_node_2.
Perhaps only one LLM node persists.
The behavior can be replicated, though, by doing something like this:
- In the llm_node, call update_options with the new voice.
- This new voice is NOT reflected by the time we get to synthesizing. Only in the next turn is it updated.
If you make the change in this PR, the new voice will be reflected.
We need this by EOD, so will be hacking a version of the Cartesia.TTS() plugin in the meantime.
I see, it's not applied because the tts_node is created in parallel with the llm_node, before update_options is called in the llm_node.
Instead of using options from the TTS instance, we may still want each TTS stream to have a copy of the options. Maybe we should allow creating a new tts_node in the llm_node with the updated options; this would fix the issue for all TTS plugins.
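The ordering described here can be sketched with plain `asyncio`. This is an illustrative model under stated assumptions, not the framework's scheduler: `run_turn`, `ToyTTS`, and `Opts` are hypothetical names, and the single `asyncio.sleep(0)` stands in for whatever work the llm_node does before it reaches `update_options`.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Opts:
    """Hypothetical stand-in for the plugin's options object."""
    voice: str


class ToyTTS:
    def __init__(self, voice: str) -> None:
        self._opts = Opts(voice)

    def update_options(self, *, voice: str) -> None:
        self._opts = replace(self._opts, voice=voice)

    def stream_options(self) -> Opts:
        # Models "each stream has a copy of the options": the frozen
        # object handed out here never changes afterwards.
        return self._opts


async def run_turn(tts: ToyTTS, new_voice: str) -> tuple[list[str], str]:
    events: list[str] = []
    text_ready = asyncio.Event()

    async def llm_node() -> None:
        await asyncio.sleep(0)  # assumed: some work happens before the update
        tts.update_options(voice=new_voice)
        events.append("update_options")
        text_ready.set()  # LLM text is now available to synthesize

    async def tts_node() -> str:
        stream_opts = tts.stream_options()  # copy taken at node start
        events.append("stream_created")
        await text_ready.wait()
        return stream_opts.voice  # voice actually used for synthesis

    # Both nodes start in parallel, as described in the comment above.
    _, voice_used = await asyncio.gather(llm_node(), tts_node())
    return events, voice_used


events, voice_used = asyncio.run(run_turn(ToyTTS("KATIE"), "MAX"))
print(events)      # ['stream_created', 'update_options']
print(voice_used)  # KATIE
```

Because the stream's copy is taken before the update lands, the turn is synthesized with the old voice even though the update happened before any text was ready.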
> Instead of using options from the TTS instance, we may still want each TTS stream to have a copy of the options.
I agree with this. It makes sense for stream options to be immutable once instantiated.
> Maybe we should allow creating a new tts_node in the llm_node with the updated options.
What about a tts_node.restart() or tts_node.refresh() of some sort? I could also create a new tts_node from within the llm_node, but it is less clear to me how I would do that. I will take a look later this week.
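One possible shape for such a refresh, as a rough sketch only: `ToyTTSNode.refresh()` below is hypothetical and does not exist in the framework, and `ToyTTS`, `ToyStream`, and `Opts` are stand-ins. The idea is that each stream keeps an immutable copy of the options, and picking up new options means replacing the stream rather than mutating it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Opts:
    """Hypothetical stand-in for the plugin's options object."""
    voice: str


class ToyStream:
    def __init__(self, opts: Opts) -> None:
        self.opts = opts  # immutable for the lifetime of this stream
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def synthesize(self, text: str) -> str:
        if self.closed:
            raise RuntimeError("stream is closed")
        return f"[{self.opts.voice}] {text}"


class ToyTTS:
    def __init__(self, voice: str) -> None:
        self._opts = Opts(voice)

    def update_options(self, *, voice: str) -> None:
        self._opts = replace(self._opts, voice=voice)

    def stream(self) -> ToyStream:
        return ToyStream(self._opts)


class ToyTTSNode:
    """Hypothetical node wrapper that owns one stream at a time."""

    def __init__(self, tts: ToyTTS) -> None:
        self._tts = tts
        self._stream = tts.stream()

    def refresh(self) -> None:
        # Hypothetical API: drop the old stream and open a new one that
        # copies whatever options the TTS instance holds right now.
        self._stream.close()
        self._stream = self._tts.stream()

    def synthesize(self, text: str) -> str:
        return self._stream.synthesize(text)


tts = ToyTTS("KATIE")
node = ToyTTSNode(tts)
tts.update_options(voice="MAX")

print(node.synthesize("hello"))  # [KATIE] hello
node.refresh()
print(node.synthesize("hello"))  # [MAX] hello
```

This keeps per-stream options immutable, as agreed above, and leaves the decision of when to pick up new options to the caller.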
Desired Behavior
TTS.update_options()
Actual Behavior
Approach
Cartesia Docs:
https://docs.cartesia.ai/api-reference/tts/tts