Conversation

@seanmuirhead (Contributor) commented Sep 25, 2025

Desired Behavior

  • The agent takes in user input, and is able to update the voice before responding via TTS.update_options()
  • Every utterance afterwards will be in the updated voice, effective immediately

Actual Behavior

  • The agent takes in user input and is able to update the voice
  • However, that voice is only reflected in the next turn, not the current one

Approach

  • I am open to other approaches; this seemed like the easiest one.

Cartesia Docs:
https://docs.cartesia.ai/api-reference/tts/tts

```python
# The opts may have changed between the time this class was instantiated
# and the time we start receiving sentences to synthesize. Use the latest
# options by reading self._tts._opts instead of self._opts.
token_pkt = _to_cartesia_options(self._tts._opts, streaming=True)
```
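The effect of reading through `self._tts._opts` can be shown with a self-contained toy model (class names echo the plugin, but this is an illustration, not the real implementation):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class _TTSOptions:
    voice: str

class TTS:
    def __init__(self, voice: str) -> None:
        self._opts = _TTSOptions(voice=voice)

    def update_options(self, *, voice: str) -> None:
        # Swap in a new frozen snapshot; each snapshot stays immutable,
        # but the TTS-level reference always points at the latest one.
        self._opts = replace(self._opts, voice=voice)

class SynthesizeStream:
    def __init__(self, tts: TTS) -> None:
        self._tts = tts
        self._opts = tts._opts  # snapshot taken when the stream is created

    def voice_for_next_sentence(self) -> str:
        # Before the patch: self._opts.voice (stale snapshot).
        # After the patch: read through the TTS instance for fresh options.
        return self._tts._opts.voice

tts = TTS(voice="KATIE")
stream = SynthesizeStream(tts)
tts.update_options(voice="MAX")          # options change mid-stream
print(stream.voice_for_next_sentence())  # -> MAX
```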
Contributor:

Could you explain in what case you want to update the options after the tts_node has started?

Contributor Author:

Series of Events:

  1. User Speaks ("I want to talk to Katie")
  2. llm_node_1 starts, calls update_options(voice=KATIE)
  3. tts_node_1 starts with voice=KATIE
  4. User interrupts the agent ("actually I want to speak to Max") -> llm_node_1 cancels, but tts_node_1 continues
  5. llm_node_2 starts, calls update_options(voice=MAX)
  6. tts_node_1 synthesizes the LLM response, but in the KATIE voice instead of the MAX voice

Desired Behavior:

  • At step 6, we want the TTS to synthesize in the MAX voice, not the KATIE voice

Please let me know whether this is reasonable and/or whether you plan to allow this functionality.
I think it is reasonable to expect the TTS to synthesize with the most up-to-date options.
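The six steps above can be replayed with a toy pipeline; `live_opts` toggles between the current snapshot behavior and the proposed read-latest behavior (all names are illustrative, not the real framework):

```python
class TTS:
    def __init__(self, voice):
        self.opts = {"voice": voice}
    def update_options(self, *, voice):
        self.opts = {"voice": voice}
    def stream(self):
        return Stream(self)

class Stream:
    def __init__(self, tts):
        self.tts = tts
        self.snapshot = dict(tts.opts)  # copied when the stream starts
    def synthesize(self, text, *, live_opts):
        # live_opts=False: current behavior (stale snapshot)
        # live_opts=True: the behavior this PR asks for (latest options)
        opts = self.tts.opts if live_opts else self.snapshot
        return f'{opts["voice"]}: {text}'

tts = TTS(voice="DEFAULT")
tts.update_options(voice="KATIE")   # 2. llm_node_1 switches to KATIE
tts_node_1 = tts.stream()           # 3. tts_node_1 starts with KATIE
# 4. user interrupts; llm_node_1 cancels but tts_node_1 keeps running
tts.update_options(voice="MAX")     # 5. llm_node_2 switches to MAX
print(tts_node_1.synthesize("Hi!", live_opts=False))  # KATIE: Hi! (actual)
print(tts_node_1.synthesize("Hi!", live_opts=True))   # MAX: Hi!   (desired)
```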

Contributor:

> llm_node_2 starts, calls update_options(voice=MAX)
> tts_node_1 synthesizes the LLM response, but in the KATIE voice instead of the MAX voice

Does this actually happen? A new generation will create a new TTS stream; ideally there should be a tts_node_2 for llm_node_2.

Contributor Author:

Perhaps only one LLM node persists.

The behavior can be replicated, though, by doing something like this:

  1. In the llm_node, call update_options with the new voice.
  2. This new voice is NOT reflected by the time we get to synthesizing. Only in the next turn is it updated.

If you make the change in this PR, the new voice is reflected immediately.
We need this by EOD, so we will be hacking a patched version of the Cartesia.TTS() plugin in the meantime.

Contributor:

I see, it's not applied because the tts_node is created in parallel with the llm_node, before update_options is called in the llm_node.

Instead of reading options from the TTS instance, we may still want each TTS stream to keep its own copy of the options. Maybe we should allow creating a new tts_node in the llm_node with the updated options; this would fix the issue for all TTS plugins.
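A sketch of that suggestion, with each stream holding an immutable copy of the options and a voice change applied by creating a fresh stream (toy classes, not the real framework):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamOptions:
    voice: str

class TTS:
    def __init__(self, voice: str) -> None:
        self.voice = voice
    def update_options(self, *, voice: str) -> None:
        self.voice = voice
    def stream(self) -> "Stream":
        # every stream gets its own frozen snapshot of the options
        return Stream(StreamOptions(voice=self.voice))

class Stream:
    def __init__(self, opts: StreamOptions) -> None:
        self.opts = opts
    def synthesize(self, text: str) -> str:
        return f"{self.opts.voice}: {text}"

tts = TTS(voice="KATIE")
old = tts.stream()
tts.update_options(voice="MAX")
new = tts.stream()            # re-created tts_node picks up MAX
print(old.synthesize("hi"))   # KATIE: hi  (old stream stays unchanged)
print(new.synthesize("hi"))   # MAX: hi
```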

Contributor Author:

> Instead of reading options from the TTS instance, we may still want each TTS stream to keep its own copy of the options.

I agree with this. It makes sense for stream options to be immutable once instantiated.

> Maybe we should allow creating a new tts_node in the llm_node with the updated options.

What about a tts_node.restart() or tts_node.refresh() of some sort? I could also create a new tts_node from within the llm_node, but it's less clear how I would do that. Will take a look later this week.
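A hypothetical shape for such a restart(): drop the current stream and open a new one so the latest options are re-snapshotted. `TTSNode` and its methods are made up for illustration; a real implementation would also have to flush and close the old stream cleanly.

```python
class TTS:
    def __init__(self, voice):
        self.voice = voice
    def update_options(self, *, voice):
        self.voice = voice
    def stream(self):
        return Stream(self.voice)

class Stream:
    def __init__(self, voice):
        self.voice = voice  # per-stream copy, frozen at creation
    def synthesize(self, text):
        return f"{self.voice}: {text}"

class TTSNode:
    def __init__(self, tts):
        self._tts = tts
        self._stream = tts.stream()
    def restart(self):
        # open a fresh stream so the latest options take effect;
        # a real implementation would also flush/close the old stream
        self._stream = self._tts.stream()
    def synthesize(self, text):
        return self._stream.synthesize(text)

tts = TTS(voice="KATIE")
node = TTSNode(tts)
tts.update_options(voice="MAX")
print(node.synthesize("hi"))   # KATIE: hi  (stale until restarted)
node.restart()
print(node.synthesize("hi"))   # MAX: hi
```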

@seanmuirhead seanmuirhead requested a review from longcw September 26, 2025 18:23