Simplify TTS plugin and audio utils #123
Conversation
Walkthrough

Audio handling architecture refactored to use PCM-centric data streams with a new …

Changes

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant TTS
    participant TTSProvider
    participant PcmData
    participant OutputAudioTrack
    
    Agent->>TTS: set_output_format(sample_rate, channels)
    activate TTS
    TTS->>TTS: store desired format
    deactivate TTS
    
    Agent->>TTS: send(text)
    activate TTS
    TTS->>TTSProvider: synthesize_speech(text)
    activate TTSProvider
    TTSProvider-->>TTS: audio stream (bytes/chunks)
    deactivate TTSProvider
    
    loop for each chunk
        TTS->>TTS: _iter_pcm(chunk)
        TTS->>PcmData: from_bytes(chunk, ...)
        TTS->>TTS: resample to output_format
        TTS->>TTS: _emit_chunk(pcm)
        TTS->>TTS: emit TTSAudioEvent
        TTS->>OutputAudioTrack: write(pcm_bytes)
        activate OutputAudioTrack
        OutputAudioTrack-->>Agent: audio routed to WebRTC
        deactivate OutputAudioTrack
    end
    
    TTS->>TTS: emit TTSSynthesisCompleteEvent
    deactivate TTS
sequenceDiagram
    participant Test
    participant TTSSession
    participant TTS
    participant EventBus
    
    Test->>TTS: set_output_format(sample_rate, channels)
    Test->>TTSSession: new TTSSession(tts)
    activate TTSSession
    TTSSession->>EventBus: subscribe(TTSSynthesisStartEvent, ...)
    TTSSession->>EventBus: subscribe(TTSAudioEvent, ...)
    TTSSession->>EventBus: subscribe(TTSErrorEvent, ...)
    TTSSession->>EventBus: subscribe(TTSSynthesisCompleteEvent, ...)
    deactivate TTSSession
    
    Test->>TTS: send(text)
    activate TTS
    TTS->>EventBus: emit TTSSynthesisStartEvent
    TTS->>EventBus: emit TTSAudioEvent (multiple)
    TTS->>EventBus: emit TTSSynthesisCompleteEvent
    deactivate TTS
    
    Test->>TTSSession: wait_for_result(timeout)
    activate TTSSession
    TTSSession->>TTSSession: await first relevant event or timeout
    TTSSession-->>Test: TTSResult(speeches, errors, started, completed)
    deactivate TTSSession
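The test-session flow in the second diagram can be sketched as a minimal event-collecting helper. This is an illustrative stand-in: the `Session` class, `TTSResultSketch`, and the `on_*` callbacks are hypothetical simplifications of the real `TTSSession`, which subscribes to the event bus instead of being called directly.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class TTSResultSketch:
    # hypothetical stand-in for TTSResult(speeches, errors, started, completed)
    speeches: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    started: bool = False
    completed: bool = False

class Session:
    """Hypothetical mini TTSSession: collects events, unblocks on first audio/error."""
    def __init__(self) -> None:
        self._first_event = asyncio.Event()
        self._result = TTSResultSketch()

    # In the real code these are event-bus subscriptions, not direct calls
    def on_start(self) -> None:
        self._result.started = True

    def on_audio(self, chunk: bytes) -> None:
        self._result.speeches.append(chunk)
        self._first_event.set()

    def on_error(self, err: Exception) -> None:
        self._result.errors.append(err)
        self._first_event.set()

    def on_complete(self) -> None:
        self._result.completed = True

    async def wait_for_result(self, timeout: float = 10.0) -> TTSResultSketch:
        try:
            await asyncio.wait_for(self._first_event.wait(), timeout=timeout)
        except asyncio.TimeoutError:
            pass  # return whatever was collected so far
        return self._result

async def demo() -> TTSResultSketch:
    s = Session()
    s.on_start()             # TTSSynthesisStartEvent
    s.on_audio(b"\x00\x01")  # TTSAudioEvent
    s.on_complete()          # TTSSynthesisCompleteEvent
    return await s.wait_for_result(timeout=0.1)

result = asyncio.run(demo())
```

The key design point mirrored here is that `wait_for_result` returns early on the first audio or error rather than waiting for completion, which is exactly the trade-off discussed in the review below.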
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
 Possibly related PRs
 Suggested reviewers
 Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
 ✅ Passed checks (2 passed)
 ✨ Finishing touches
 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (4)
plugins/aws/tests/test_aws.py (1)
126-146: Apply consistent credential checking to all tests.

These tests create their own LLM instances and don't use the `llm` fixture, so they bypass the skip logic on line 41. Without `AWS_BEARER_TOKEN_BEDROCK`, they will attempt to run and likely fail, creating inconsistent test behavior. Consider one of these solutions:
Solution 1: Add skip check to each test
 @pytest.mark.integration
 async def test_image_description(self, golf_swing_image):
+    if not os.environ.get("AWS_BEARER_TOKEN_BEDROCK"):
+        pytest.skip("AWS_BEARER_TOKEN_BEDROCK not set – skipping Bedrock tests")
     # Use a vision-capable model (Claude 3 Haiku supports images and is widely available)
     vision_llm = BedrockLLM(

Solution 2: Use the fixture and modify as needed

 @pytest.mark.integration
-async def test_image_description(self, golf_swing_image):
+async def test_image_description(self, llm: BedrockLLM, golf_swing_image):
     # Use a vision-capable model (Claude 3 Haiku supports images and is widely available)
-    vision_llm = BedrockLLM(
+    llm._model = "anthropic.claude-3-haiku-20240307-v1:0"
+    vision_llm = llm
-        model="anthropic.claude-3-haiku-20240307-v1:0", region_name="us-east-1"
-    )

Apply similar changes to test_instruction_following.

Also applies to: 149-161
agents-core/vision_agents/core/observability/metrics.py (1)
77-81: Do not emit spans at import time.

Creating spans during module import causes global side effects and unexpected traffic. Remove these calls; expose helpers to start spans in calling code instead.

-with tracer.start_as_current_span("stt.request", kind=trace.SpanKind.CLIENT) as span:
-    pass
-
-span = tracer.start_span("stt.request")
-span.end()

agents-core/vision_agents/core/agents/agents.py (1)
991-1004: Realtime warning condition is inconsistent with the message.

The second branch warns about "STT, TTS and Turn Detection" but only checks `self.stt or self.turn_detection`. Include `self.tts` for consistency.

-        if self.stt or self.turn_detection:
+        if self.stt or self.tts or self.turn_detection:
             self.logger.warning(
                 "Realtime mode detected: STT, TTS and Turn Detection services will be ignored. "
                 "The Realtime model handles both speech-to-text, text-to-speech and turn detection internally."
             )

plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
39-62: Update docstring to reflect actual return type; fix typo in `__init__` docstring.

The SDK usage is correct: `output_format="pcm_16000"` is the proper format string for PCM at 16 kHz. However, two issues remain:

- Line 47 (`stream_audio` docstring): Change "An async iterator of audio chunks as bytes" to describe the actual return type `PcmData | Iterator[PcmData] | AsyncIterator[PcmData]`.
- Line 23 (`__init__` docstring): Fix "ElvenLabs Client" → "ElevenLabs Client".
🧹 Nitpick comments (30)
plugins/aws/tests/test_aws.py (1)
43-43: Consider public API for test setup.

The tests access private attributes (`_conversation`) and methods (`_set_instructions`) directly. While common in testing, this couples tests to implementation details.

If `BedrockLLM` provides public methods to configure conversation state and instructions, prefer those. If not, consider adding public test helpers:

    # In BedrockLLM class
    def configure_for_testing(self, instructions: str = None, conversation=None):
        """Configure LLM for testing purposes."""
        if instructions:
            self._set_instructions(instructions)
        if conversation:
            self._conversation = conversation

Then in tests:

    llm.configure_for_testing(conversation=InMemoryConversation("be friendly", []))

Also applies to: 154-154
docs/ai/instructions/ai-tts.md (3)
15-17: Clarifystream_audioreturn contract.Current text says “return a single PcmData,” but plugins may return a PcmData or an (async) iterator of PcmData. Please update the guidance to accept both to match the base class behavior and existing plugins.
36-38: Avoid recommending buffering entire streams.“Buffer streaming SDK audio into a single byte string” risks high memory usage for long utterances. Prefer emitting multiple PcmData chunks (or returning an iterator) and let the Agent handle resampling/assembly.
80-84: Safer assertion in example.

Use `assert result.speeches` (or `assert len(result.speeches) > 0`) instead of indexing `result.speeches[0]`, to avoid an IndexError in edge cases.

agents-core/vision_agents/core/tts/manual_test.py (2)
25-31: Fix docstring inaccuracies (Google style).

The function receives a TTS instance; it does not create one via `tts_factory()`. Please remove that bullet to avoid confusion.

-    - Creates the TTS instance via `tts_factory()`.
     - Sets desired output format via `set_output_format(sample_rate, channels)`.
66-81: Ensure subprocess cleanup on timeout.

After `proc.kill()`, also `await proc.wait()` to reap the process.

     try:
         await asyncio.wait_for(proc.wait(), timeout=30.0)
     except asyncio.TimeoutError:
-        proc.kill()
+        proc.kill()
+        try:
+            await proc.wait()
+        except Exception:
+            pass

agents-core/vision_agents/core/observability/metrics.py (3)
35-39: Duplicate meter assignment.

`meter` is assigned twice (`__name__`, then `"voice-agent.latency"`). Keep one to avoid confusion.

-meter = metrics.get_meter(__name__)
-
-
-meter = metrics.get_meter("voice-agent.latency")
+meter = metrics.get_meter("voice-agent.latency")
12-13: Hard-coded OTLP endpoint.

Make `OTLP_ENDPOINT` configurable via env (e.g., `OTLP_ENDPOINT = os.getenv("OTLP_ENDPOINT", "http://localhost:4317")`) to work across environments.
69-75: Remove unused sample attrs or mark as example.

`CALL_ATTRS` appears unused; consider deleting it or moving it into examples to avoid dead code.

plugins/cartesia/tests/test_tts.py (1)
15-20: Avoid `type: ignore` by importing the symbol.

Import the concrete class for typing and return it from `tts()`.

-from vision_agents.plugins import cartesia
+from vision_agents.plugins import cartesia
+from vision_agents.plugins.cartesia import TTS as CartesiaTTS
@@
-    def tts(self) -> cartesia.TTS:  # type: ignore[name-defined]
+    def tts(self) -> CartesiaTTS:
@@
-        return cartesia.TTS(api_key=api_key)
+        return CartesiaTTS(api_key=api_key)

plugins/kokoro/tests/test_tts.py (1)
16-18: LGTM overall; add a sanity assertion and optional cleanup.

Capture the returned path and assert it exists; optionally remove it to avoid temp buildup.

-    async def test_kokoro_tts_convert_text_to_audio_manual_test(self, tts):
-        await manual_tts_to_wav(tts, sample_rate=24000, channels=1)
+    async def test_kokoro_tts_convert_text_to_audio_manual_test(self, tts):
+        path = await manual_tts_to_wav(tts, sample_rate=24000, channels=1)
+        assert path and os.path.exists(path)
+        try:
+            os.remove(path)
+        except OSError:
+            pass

agents-core/vision_agents/core/agents/agents.py (3)
306-317: Guard against format mismatches when writing to the audio track.

You assume TTS honored set_output_format, but if a plugin misbehaves, bytes at the wrong rate/channels could hit the track. Log (or drop) mismatched chunks to prevent artifacts.

     async def _on_tts_audio(event: TTSAudioEvent):
         try:
-            if self._audio_track and event.audio_data:
-                from typing import Any, cast
-
-                track_any = cast(Any, self._audio_track)
-                await track_any.write(event.audio_data)
+            if self._audio_track and event.audio_data:
+                from typing import Any, cast
+                # Optional: verify negotiated format
+                try:
+                    expected_rate = getattr(self._audio_track, "framerate", None)
+                    expected_channels = 2 if getattr(self._audio_track, "stereo", False) else 1
+                    if (expected_rate and event.sample_rate != expected_rate) or (
+                        expected_channels and event.channels != expected_channels
+                    ):
+                        self.logger.warning(
+                            "Dropping TTS audio: format mismatch (got %s Hz/%sch, expected %s Hz/%sch)",
+                            event.sample_rate, event.channels, expected_rate, expected_channels,
+                        )
+                        return
+                except Exception:
+                    # If track doesn't expose props, proceed optimistically
+                    pass
+                track_any = cast(Any, self._audio_track)
+                await track_any.write(event.audio_data)
         except Exception as e:
             self.logger.error(f"Error writing TTS audio to track: {e}")
1032-1047: Make 48k/stereo defaults configurable and reuse them for validation.

Expose `framerate`/`stereo` as Agent init kwargs or class constants, and store them on `self` for reuse (e.g., in _on_tts_audio validation). This keeps behavior flexible across environments.

-        framerate = 48000
-        stereo = True
+        framerate = getattr(self, "_audio_out_rate", 48000)
+        stereo = getattr(self, "_audio_out_stereo", True)
         self._audio_track = self.edge.create_audio_track(
             framerate=framerate, stereo=stereo
         )
         # Inform TTS of desired output format so it can resample accordingly
         if self.tts:
             channels = 2 if stereo else 1
311-314: Tiny nit: avoid re-importing typing inside the handler.

Import `cast` at module top to reduce per-call overhead and keep imports centralized.

plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (2)
37-38: Consider honoring desired sample rate to reduce resampling.

If the agent negotiates 48 kHz stereo, you'll resample 16 kHz mono to match. If ElevenLabs supports multiple PCM rates, map `self._desired_sample_rate` to a supported `output_format` to minimize CPU work. Otherwise, keep current behavior.

Also applies to: 60-62
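To picture the resampling cost being discussed, here is a toy linear-interpolation resampler for mono s16 samples. It is only a sketch: the real code path uses `PcmData.resample`, and `resample_s16` is a hypothetical name, not part of the codebase.

```python
import array

def resample_s16(samples: array.array, src_rate: int, dst_rate: int) -> array.array:
    """Linear-interpolation resample of mono s16 samples (toy version)."""
    if src_rate == dst_rate or len(samples) == 0:
        return samples
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    out = array.array("h")
    for i in range(n_out):
        # map each output index back to a fractional source position
        pos = i * (len(samples) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out

# 16 kHz provider audio upsampled to a 48 kHz negotiated track: 3x the samples
pcm_16k = array.array("h", [0, 1000, 2000, 3000])
pcm_48k = resample_s16(pcm_16k, 16000, 48000)
```

Every chunk pays this per-sample cost, which is why requesting a provider-native rate that matches the negotiated track (when available) is cheaper than resampling after the fact.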
64-72: Doc says "Clears the queue and stops playing audio" but it's a no-op.

Either implement actual cancellation if the SDK supports it or update the docstring to reflect no-op behavior.

-        """
-        Clears the queue and stops playing audio.
-        This method can be used manually or under the hood in response to turn events.
-        ...
-        """
+        """
+        Stop request hook. ElevenLabs SDK streaming is pull-based here; there is no internal
+        playback/queue to flush, so this is a no-op by design.
+        """

plugins/elevenlabs/tests/test_tts.py (2)
29-30: Strengthen assertions to catch regressions early.

Also assert session start to ensure events flow.

-        assert not result.errors
-        assert len(result.speeches) > 0
+        assert not result.errors
+        assert result.started is True
+        assert len(result.speeches) > 0
33-35: Avoid print in tests; prefer logging or an assertion on the output.

Printing paths is noisy in CI. Use logging or silence by default; optionally assert the file exists.

-        path = await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
-        print("ElevenLabs TTS audio written to:", path)
+        path = await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
+        assert os.path.exists(path)

Note: Consider enhancing manual_tts_to_wav to wait until synthesis completes before writing, otherwise it may write only the first chunk. Based on relevant helpers.
agents-core/vision_agents/core/tts/testing.py (2)
70-81: Provide an option to wait until synthesis completes.

wait_for_result returns after the first audio/error, which is great for smoke checks but truncates longer audio for callers like manual_tts_to_wav. Add a mode to wait for TTSSynthesisCompleteEvent (with timeout).

-    async def wait_for_result(self, timeout: float = 10.0) -> TTSResult:
+    async def wait_for_result(
+        self, timeout: float = 10.0, until_complete: bool = False
+    ) -> TTSResult:
         try:
-            await asyncio.wait_for(self._first_event.wait(), timeout=timeout)
+            if until_complete:
+                async def _wait_complete():
+                    # Fast-path if already completed
+                    if self._completed:
+                        return
+                    # Wait until completion toggles (events update the flag)
+                    while not self._completed and not self._errors and not self._speeches:
+                        await asyncio.sleep(0.01)
+                await asyncio.wait_for(_wait_complete(), timeout=timeout)
+            else:
+                await asyncio.wait_for(self._first_event.wait(), timeout=timeout)
         except asyncio.TimeoutError:
             # Return whatever we have so far
             pass
         return TTSResult(
             speeches=list(self._speeches),
             errors=list(self._errors),
             started=self._started,
             completed=self._completed,
         )
42-61: Add a simple teardown to avoid subscriber leaks in long-lived tests.

Store unsubscribe handles (if supported) or expose a close() to deregister callbacks.

 class TTSSession:
@@
-        @tts.events.subscribe
-        async def _on_start(ev: TTSSynthesisStartEvent):  # type: ignore[name-defined]
+        self._subs = []
+        @tts.events.subscribe
+        async def _on_start(ev: TTSSynthesisStartEvent):  # type: ignore[name-defined]
             self._started = True
+        self._subs.append(_on_start)
@@
-        @tts.events.subscribe
+        @tts.events.subscribe
         async def _on_complete(ev: TTSSynthesisCompleteEvent):  # type: ignore[name-defined]
             self._completed = True
+        self._subs.append(_on_complete)
+
+    def close(self) -> None:
+        for cb in getattr(self, "_subs", []):
+            try:
+                self._tts.events.unsubscribe(cb)  # if supported by EventManager
+            except Exception:
+                pass

If EventManager lacks unsubscribe, consider a no-op close() for API consistency. As per coding guidelines.
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (2)
54-58: Docstring: clarify return shapes and native format.

Mention that the response may be an async iterator and that PcmData is s16 mono at self.sample_rate, to match base expectations.

-    ) -> PcmData | Iterator[PcmData] | AsyncIterator[PcmData]  # noqa: D401
-        """Generate speech and return a stream of PcmData."""
+    ) -> PcmData | Iterator[PcmData] | AsyncIterator[PcmData]:  # noqa: D401
+        """Generate speech and return PcmData stream (s16 mono at sample_rate)."""
80-82: Honor desired channel count if agent requests stereo.

If upstream calls set_output_format(..., channels=2), consider threading that into from_response so downstream resampling has correct provenance.

     return PcmData.from_response(
         response, sample_rate=self.sample_rate, channels=1, format="s16"
     )

Alternatively, set self._native_channels = 1 in init for clarity; the base class will rechannel to the desired format on emit.
plugins/fish/vision_agents/plugins/fish/tts.py (2)
25-26: Avoid hard-coding a reference voice by default.

A baked-in reference_id can break for users lacking access to that voice. Default to None and document how to set it via config/env.

-        reference_id: Optional[str] = "03397b4c4be74759b72533b663fbd001",
+        reference_id: Optional[str] = None,
86-90: Explicitly declare native format/channel for clarity.

Not required, but setting the provider-native format helps future maintainers.

-        return PcmData.from_response(
-            stream, sample_rate=16000, channels=1, format="s16"
-        )
+        # Provider-native is 16kHz mono s16
+        return PcmData.from_response(stream, sample_rate=16000, channels=1, format="s16")

plugins/kokoro/vision_agents/plugins/kokoro/tts.py (2)
47-53: Use get_running_loop() in async context.

get_event_loop() is deprecated when a loop is running; prefer get_running_loop() to avoid warnings on Python 3.11+.

-        loop = asyncio.get_event_loop()
+        loop = asyncio.get_running_loop()
55-60: Minor: annotate generator return and keep PCM metadata close.

Inline the format/sample_rate once to avoid repetition.

         async def _aiter():
             for chunk in chunks:
-                yield PcmData.from_bytes(
-                    chunk, sample_rate=self.sample_rate, channels=1, format="s16"
-                )
+                yield PcmData.from_bytes(chunk, sample_rate=self.sample_rate, channels=1, format="s16")

agents-core/vision_agents/core/tts/tts.py (3)
125-142: Deduplicate normalization: delegate to PcmData.from_response; also handle memoryview correctly.

Re-implementing chunk normalization invites edge bugs. Use PcmData.from_response, which already aligns/aggregates and supports bytes/PcmData/iterators.

Apply this refactor:

-    async def _iter_pcm(self, resp: Any) -> AsyncGenerator[PcmData, None]:
-        """Yield PcmData chunks from a provider response of various shapes."""
-        # Single buffer or PcmData
-        if isinstance(resp, (bytes, bytearray, PcmData)):
-            yield self._normalize_to_pcm(resp)
-            return
-        # Async iterable
-        if hasattr(resp, "__aiter__"):
-            async for item in resp:
-                yield self._normalize_to_pcm(item)
-            return
-        # Sync iterable (avoid treating bytes-like as iterable of ints)
-        if hasattr(resp, "__iter__") and not isinstance(resp, (str, bytes, bytearray)):
-            for item in resp:
-                yield self._normalize_to_pcm(item)
-            return
-        raise TypeError(f"Unsupported return type from stream_audio: {type(resp)}")
+    async def _iter_pcm(self, resp: Any) -> AsyncGenerator[PcmData, None]:
+        """Yield PcmData chunks from arbitrary provider responses via PcmData.from_response."""
+        fmt = self._native_format.value if hasattr(self._native_format, "value") else "s16"
+        norm = PcmData.from_response(
+            resp,
+            sample_rate=self._native_sample_rate,
+            channels=self._native_channels,
+            format=fmt,
+        )
+        if isinstance(norm, PcmData):
+            yield norm
+            return
+        if hasattr(norm, "__aiter__"):
+            async for pcm in norm:
+                yield pcm
+            return
+        if hasattr(norm, "__iter__"):
+            for pcm in norm:
+                yield pcm
+            return
+        raise TypeError(f"Unsupported return type from stream_audio: {type(resp)}")
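The shape dispatch being centralized here can be illustrated standalone. `Pcm` and `iter_pcm` below are hypothetical stand-ins for `PcmData` and `_iter_pcm`; the real implementations carry more metadata and alignment logic.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Pcm:
    # hypothetical stand-in for PcmData: raw s16 bytes plus format metadata
    data: bytes
    sample_rate: int
    channels: int

async def iter_pcm(resp, sample_rate: int = 16000, channels: int = 1):
    """Yield Pcm chunks from bytes, Pcm, or (a)sync iterables of bytes."""
    if isinstance(resp, Pcm):
        yield resp
        return
    if isinstance(resp, (bytes, bytearray, memoryview)):
        yield Pcm(bytes(resp), sample_rate, channels)
        return
    if hasattr(resp, "__aiter__"):
        async for item in resp:
            yield Pcm(bytes(item), sample_rate, channels)
        return
    # avoid treating str as an iterable of characters
    if hasattr(resp, "__iter__") and not isinstance(resp, str):
        for item in resp:
            yield Pcm(bytes(item), sample_rate, channels)
        return
    raise TypeError(f"Unsupported response type: {type(resp)!r}")

async def collect():
    async def provider_stream():  # fake provider chunks
        yield b"\x00\x01"
        yield b"\x02\x03"
    return [pcm async for pcm in iter_pcm(provider_stream())]

chunks = asyncio.run(collect())
```

Keeping this dispatch in one place (as the review suggests, inside `PcmData.from_response`) means every plugin gets the same bytes/iterator/async-iterator handling for free.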
179-186: Update stream_audio docstring to mention PcmData variants.

The return annotation includes PcmData types, but the docstring doesn't. Clarify for implementers.

Apply this doc tweak:

-        Returns:
-            Audio data as bytes, an iterator of audio chunks, or an async iterator of audio chunks
+        Returns:
+            Audio as:
+            - bytes or (async) iterator[bytes], or
+            - PcmData or (async) iterator[PcmData].

As per coding guidelines.
Also applies to: 197-199
277-281: Compute real-time factor using total send duration, not pre-stream "setup" time.

synthesis_time measures only until stream_audio returns, not the full emission. Use the total elapsed time before emitting the complete event.

Apply this adjustment:

-        real_time_factor = (
-            (synthesis_time * 1000) / estimated_audio_duration_ms
-            if estimated_audio_duration_ms > 0
-            else None
-        )
+        total_elapsed_ms = (time.time() - start_time) * 1000.0
+        real_time_factor = (
+            total_elapsed_ms / estimated_audio_duration_ms
+            if estimated_audio_duration_ms > 0
+            else None
+        )
@@
-            synthesis_time_ms=synthesis_time * 1000,
+            synthesis_time_ms=total_elapsed_ms,

If "synthesis_time_ms" is intended to reflect only provider latency, consider adding a second field (e.g., end_to_end_ms) instead of overloading it.
Also applies to: 283-296, 313-317
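The metric in question reduces to a small pure function. A sketch (the function name and parameters are illustrative, not the actual fields):

```python
from typing import Optional

def real_time_factor(start_s: float, end_s: float, audio_duration_ms: float) -> Optional[float]:
    """Wall-clock synthesis time divided by produced audio duration (< 1.0 beats real time)."""
    if audio_duration_ms <= 0:
        return None
    total_elapsed_ms = (end_s - start_s) * 1000.0
    return total_elapsed_ms / audio_duration_ms

# 0.5 s of wall-clock work that produced 2 s of audio
rtf = real_time_factor(0.0, 0.5, 2000.0)
```

The point of the review comment is simply which two timestamps bound `total_elapsed_ms`: measuring only until `stream_audio` returns undercounts streaming providers, where most of the wall-clock time is spent emitting chunks.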
agents-core/vision_agents/core/edge/types.py (1)
320-337: to_bytes: ensure the interleaved view is contiguous before tobytes().

Transpose often creates non-contiguous views. Make it explicit.

Apply:

-        if arr.ndim == 2:
-            # (channels, samples) -> interleaved (samples, channels)
-            interleaved = arr.T.reshape(-1)
-            return interleaved.tobytes()
+        if arr.ndim == 2:
+            # (channels, samples) -> interleaved (samples, channels)
+            interleaved = np.ascontiguousarray(arr.T).reshape(-1)
+            return interleaved.tobytes()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (19)
- agents-core/vision_agents/core/agents/agents.py (3 hunks)
- agents-core/vision_agents/core/edge/types.py (6 hunks)
- agents-core/vision_agents/core/observability/__init__.py (2 hunks)
- agents-core/vision_agents/core/observability/metrics.py (1 hunks)
- agents-core/vision_agents/core/tts/manual_test.py (1 hunks)
- agents-core/vision_agents/core/tts/testing.py (1 hunks)
- agents-core/vision_agents/core/tts/tts.py (5 hunks)
- docs/ai/instructions/ai-tts.md (1 hunks)
- examples/01_simple_agent_example/simple_agent_example.py (1 hunks)
- plugins/aws/tests/test_aws.py (1 hunks)
- plugins/cartesia/tests/test_tts.py (1 hunks)
- plugins/cartesia/vision_agents/plugins/cartesia/tts.py (5 hunks)
- plugins/elevenlabs/tests/test_tts.py (1 hunks)
- plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (4 hunks)
- plugins/fish/tests/test_tts.py (1 hunks)
- plugins/fish/vision_agents/plugins/fish/tts.py (5 hunks)
- plugins/kokoro/tests/test_tts.py (1 hunks)
- plugins/kokoro/vision_agents/plugins/kokoro/tts.py (3 hunks)
- tests/test_tts_base.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- agents-core/vision_agents/core/observability/__init__.py
- tests/test_tts_base.py
- plugins/kokoro/tests/test_tts.py
- plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py
- examples/01_simple_agent_example/simple_agent_example.py
- plugins/elevenlabs/tests/test_tts.py
- agents-core/vision_agents/core/agents/agents.py
- plugins/kokoro/vision_agents/plugins/kokoro/tts.py
- plugins/cartesia/vision_agents/plugins/cartesia/tts.py
- agents-core/vision_agents/core/observability/metrics.py
- plugins/aws/tests/test_aws.py
- agents-core/vision_agents/core/edge/types.py
- plugins/fish/vision_agents/plugins/fish/tts.py
- plugins/cartesia/tests/test_tts.py
- agents-core/vision_agents/core/tts/manual_test.py
- agents-core/vision_agents/core/tts/tts.py
- plugins/fish/tests/test_tts.py
- agents-core/vision_agents/core/tts/testing.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
- tests/test_tts_base.py
🧬 Code graph analysis (15)
tests/test_tts_base.py (4)
agents-core/vision_agents/core/tts/tts.py (4)
TTS(32-329)
stream_audio(177-200)
set_output_format(81-99)
send(216-317)
agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSErrorEvent(51-64)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
agents-core/vision_agents/core/edge/types.py (3)
PcmData(37-505)
_agen(416-448)
from_bytes(118-186)
agents-core/vision_agents/core/events/manager.py (1)
wait(470-484)
plugins/kokoro/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
TTS(18-77)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (6)
agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-505)
from_response(382-505)
agents-core/vision_agents/core/tts/tts.py (1)
stream_audio(177-200)
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)
stream_audio(54-82)
plugins/fish/vision_agents/plugins/fish/tts.py (1)
stream_audio(56-90)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
stream_audio(47-61)
tests/test_tts_base.py (6)
stream_audio(17-21)
stream_audio(28-37)
stream_audio(44-47)
stream_audio(54-58)
stream_audio(65-69)
stream_audio(76-77)
examples/01_simple_agent_example/simple_agent_example.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
open_demo(329-406)
plugins/elevenlabs/tests/test_tts.py (3)
agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
TTS(10-72)
agents-core/vision_agents/core/agents/agents.py (4)
agents-core/vision_agents/core/tts/events.py (1)
TTSAudioEvent(10-21)
agents-core/vision_agents/core/events/manager.py (1)
subscribe(299-368)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
create_audio_track(291-294)
agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(81-99)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (2)
agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-505)
from_bytes(118-186)
agents-core/vision_agents/core/tts/tts.py (1)
stream_audio(177-200)
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (3)
agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-505)
from_response(382-505)
agents-core/vision_agents/core/tts/tts.py (1)
stream_audio(177-200)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
stream_audio(39-62)
agents-core/vision_agents/core/edge/types.py (1)
tests/test_tts_base.py (1)
_agen(32-35)
plugins/fish/vision_agents/plugins/fish/tts.py (4)
agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-505)
from_response(382-505)
agents-core/vision_agents/core/tts/tts.py (3)
TTS(32-329)
stream_audio(177-200)
stop_audio(203-214)
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (3)
TTS(18-92)
stream_audio(54-82)
stop_audio(84-92)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (3)
TTS(10-72)
stream_audio(39-62)
stop_audio(64-72)
plugins/cartesia/tests/test_tts.py (3)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)
TTS(18-92)
agents-core/vision_agents/core/tts/manual_test.py (2)
agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/edge/types.py (3)
PcmData(37-505)
from_bytes(118-186)
to_wav_bytes(338-379)
agents-core/vision_agents/core/tts/tts.py (8)
agents-core/vision_agents/core/events/base.py (3)
PluginInitializedEvent(56-63)
PluginClosedEvent(67-74)
AudioFormat(23-30)
agents-core/vision_agents/core/edge/types.py (6)
PcmData(37-505)
from_bytes(118-186)
resample(251-318)
to_bytes(320-336)
duration_ms(101-103)
close(33-34)
agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
TTSErrorEvent(51-64)
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)
stream_audio(54-82)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
stream_audio(39-62)
plugins/fish/vision_agents/plugins/fish/tts.py (1)
stream_audio(56-90)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
stream_audio(47-61)
tests/test_tts_base.py (6)
stream_audio(17-21)
stream_audio(28-37)
stream_audio(44-47)
stream_audio(54-58)
stream_audio(65-69)
stream_audio(76-77)
plugins/fish/tests/test_tts.py (3)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
plugins/fish/vision_agents/plugins/fish/tts.py (1)
TTS(12-102)
agents-core/vision_agents/core/tts/testing.py (3)
agents-core/vision_agents/core/tts/tts.py (1)
TTS(32-329)
agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSErrorEvent(51-64)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
tests/test_tts_base.py (8)
_on_start(92-93)
_on_audio(96-99)
_on_audio(126-128)
_on_audio(148-150)
_on_audio(167-169)
_on_audio(188-190)
_on_error(207-209)
_on_complete(102-103)
🪛 LanguageTool
docs/ai/instructions/ai-tts.md
[style] ~27-~27: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...=1, format="s16")   ```  - stop_audio can be a no-op (the Agent controls playback...
(MISSING_IT_THERE)
🔇 Additional comments (14)
plugins/aws/tests/test_aws.py (2)
40-41: LGTM – Proper use of pytest.skip for missing credentials.

The fixture-based skip logic ensures that all tests depending on the `llm` fixture will be skipped when credentials are unavailable, which is the correct approach for integration tests.
1-161: AI summary inconsistency with file content.

The AI-generated summary describes TTS audio handling and TTS plugins, but this file tests the BedrockLLM language model. The summary appears to describe other files in the PR rather than this one.
examples/01_simple_agent_example/simple_agent_example.py (1)
55-56: Original review comment is not supported by codebase evidence.

The search reveals that the predominant pattern in the codebase (16 of 22 instances) is to call `open_demo` before the `with await agent.join(call):` context, not inside it. The review claims moving `open_demo` inside the join context is an improvement for "reducing race conditions," but this contradicts the established practice. While 6 examples do use the inside pattern, they are the minority. Additionally, no explicit error handling is observed around `open_demo` calls in any of these examples, so the error-handling concern in the original review is not validated by precedent.

If this change intentionally diverges from the common pattern, that architectural decision should be justified explicitly rather than framed as a general improvement.
Likely an incorrect or invalid review comment.
agents-core/vision_agents/core/observability/__init__.py (1)
18-19: Export looks good.
`tts_events_emitted` is properly imported and added to `__all__`.

Also applies to: 34-35
plugins/cartesia/tests/test_tts.py (1)
21-31: Integration test flow looks solid.

Env-guard, output format set, session wait, and assertions are appropriate. If flakes occur, consider increasing the timeout to match real API latency.

plugins/elevenlabs/tests/test_tts.py (1)
If flakes occur, consider increasing timeout to match real API latency.plugins/elevenlabs/tests/test_tts.py (1)
19-27: No changes needed: asyncio support is already properly configured.

The repository has `asyncio_mode = auto` configured in pytest.ini, which enables automatic async test execution. The test at lines 19-27 will run correctly with only the `@pytest.mark.integration` marker; adding `@pytest.mark.asyncio` is unnecessary and contradicts the established pattern of relying on auto mode.

agents-core/vision_agents/core/tts/tts.py (3)
63-79: Initialization/event plumbing looks solid. Sessioning, provider naming, and `PluginInitializedEvent` emission are consistent and minimal. No issues.
321-329: Graceful close event emission LGTM. `PluginClosedEvent` with `plugin_type="TTS"` is consistent.
143-175: Verification confirms the field is properly defined. The `user_metadata` field is defined in the `BaseEvent` class (agents-core/vision_agents/core/events/base.py, line 41) as `user_metadata: Optional[Participant] = None`. Since `TTSAudioEvent` inherits from `PluginBaseEvent`, which extends `BaseEvent`, the field is available and the code at lines 143-175 is correct. No dataclass initialization errors will occur.
agents-core/vision_agents/core/edge/types.py (5)
56-76: Multi-channel duration and duration_ms: 👍 Handles (channels, samples) correctly and exposes an ms helper. Looks good.
Also applies to: 100-104
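The duration bookkeeping referenced above reduces to samples-per-channel divided by sample rate. A hypothetical sketch of that arithmetic (PcmData's real field names may differ):

```python
import numpy as np

def duration_seconds(samples: np.ndarray, sample_rate: int) -> float:
    # Convention from the review: (channels, samples) for multi-channel,
    # (samples,) for mono -- duration depends only on samples per channel.
    return samples.shape[-1] / sample_rate

stereo = np.zeros((2, 48_000), dtype=np.int16)  # 1 second of stereo at 48 kHz
mono = np.zeros(16_000, dtype=np.int16)         # 1 second of mono at 16 kHz

print(duration_seconds(stereo, 48_000))        # -> 1.0
print(duration_seconds(mono, 16_000) * 1000)   # duration_ms -> 1000.0
```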
118-187: from_bytes: alignment + interleaving logic LGTM. Good trimming to sample width and channel multiple; returns (channels, samples) for multichannel.
188-250: from_data: pragmatic normalization. Covers bytes and ndarray shapes/dtypes well. Minor note: when 2D input is ambiguous, assuming the first dim is channels is reasonable.
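The "first dim as channels" convention for ambiguous 2D input can be sketched with a hypothetical helper (not the actual `from_data` implementation; names are illustrative):

```python
import numpy as np

def normalize_2d(arr: np.ndarray):
    """Return (samples, channels), treating axis 0 of a 2D array as channels.

    Following the review's convention, a (2, 480) array is interpreted as
    two channels of 480 samples each, not 480 frames of 2 channels.
    """
    if arr.ndim == 1:
        return arr, 1
    channels = arr.shape[0]
    return arr, channels

samples, channels = normalize_2d(np.zeros((2, 480), dtype=np.int16))
print(channels)           # -> 2
print(samples.shape[-1])  # -> 480
```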
338-380: to_wav_bytes: sensible s16 conversion path. Converts non-s16 to s16 and writes standard WAV headers. Looks good.
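The conversion path described above can be approximated with the stdlib `wave` module. This is a simplified mono stand-in, not the project's code (multi-channel data would need interleaving before writing):

```python
import io
import wave
import numpy as np

def to_wav_bytes(samples: np.ndarray, sample_rate: int, channels: int) -> bytes:
    # Mirror the review's note: non-s16 input is converted to int16 first.
    if samples.dtype != np.int16:
        samples = np.clip(samples, -1.0, 1.0)
        samples = (samples * 32767).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)           # s16 -> 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(samples.tobytes())
    return buf.getvalue()

wav = to_wav_bytes(np.zeros(16_000, dtype=np.int16), 16_000, 1)
print(wav[:4])  # -> b'RIFF'
```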
381-505: from_response: versatile and aligns chunksCovers bytes/PcmData/(a)synchronous iterables and pads trailing partial frames. Good reuse across plugins.
Confirm target providers always return PCM (not compressed) when using this path. If not, gate by format and raise early.
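The early gate suggested here could look like the following hypothetical guard (the format names and error message are illustrative, not the project's API):

```python
# Hypothetical guard: reject compressed payloads before PCM parsing.
PCM_FORMATS = {"s16", "s16le", "f32"}

def ensure_pcm(fmt: str) -> None:
    """Raise early when a provider hands back a compressed codec."""
    if fmt.lower() not in PCM_FORMATS:
        raise ValueError(
            f"from_response expects raw PCM, got {fmt!r}; "
            "decode mp3/ogg/opus before constructing PcmData"
        )

ensure_pcm("s16")  # passes silently
try:
    ensure_pcm("mp3")
except ValueError as exc:
    print(exc)
```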
Actionable comments posted: 4
🧹 Nitpick comments (3)
plugins/fish/tests/test_fish_tts.py (1)
34-37: Consider capturing the result from `wait_for_result` for consistency. While accessing `session.errors` and `session.speeches` works because the session maintains internal state, the idiomatic pattern shown in the `TTSSession` docstring suggests using the returned `TTSResult` object. Apply this diff:

```diff
     await tts.send(text)
-    await session.wait_for_result(timeout=15.0)
+    result = await session.wait_for_result(timeout=15.0)
-    assert not session.errors
-    assert len(session.speeches) > 0
+    assert not result.errors
+    assert len(result.speeches) > 0
```
318-319: Type-safety bypass requires justification or stronger validation. Casting to `Any` silences type checking entirely. If the track genuinely lacks proper type hints for `write()`, consider adding runtime validation or a comment explaining why the cast is necessary.

```diff
-        track_any = cast(Any, self._audio_track)
-        await track_any.write(event.audio_data)
+        # AudioStreamTrack.write() not in type stubs but exists at runtime
+        if not hasattr(self._audio_track, 'write'):
+            self.logger.error("Audio track does not support write method")
+            return
+        track_any = cast(Any, self._audio_track)
+        await track_any.write(event.audio_data)
```
1037-1042: Hardcoded audio format lacks a configuration mechanism. The comment mentions "unless configured differently," but `framerate` and `stereo` are hardcoded literals with no constructor parameter, config file, or environment-variable override. This reduces flexibility for deployments requiring different sample rates. Consider adding constructor parameters:

```python
def __init__(
    self,
    # ... existing params ...
    audio_output_sample_rate: int = 48000,
    audio_output_stereo: bool = True,
    # ... rest of params ...
):
```

Then reference `self.audio_output_sample_rate` and `self.audio_output_stereo` in `_prepare_rtc()`.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- agents-core/vision_agents/core/agents/agents.py(3 hunks)
- examples/01_simple_agent_example/simple_agent_example.py(1 hunks)
- plugins/fish/tests/test_fish_tts.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- agents-core/vision_agents/core/agents/agents.py
- examples/01_simple_agent_example/simple_agent_example.py
- plugins/fish/tests/test_fish_tts.py
🧬 Code graph analysis (3)
agents-core/vision_agents/core/agents/agents.py (4)
agents-core/vision_agents/core/tts/events.py (1)
TTSAudioEvent(10-21)agents-core/vision_agents/core/events/manager.py (1)
subscribe(299-368)plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
create_audio_track(291-294)agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(81-99)
examples/01_simple_agent_example/simple_agent_example.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
open_demo(329-406)
plugins/fish/tests/test_fish_tts.py (3)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)agents-core/vision_agents/core/tts/testing.py (3)
TTSSession(23-81)
errors(67-68)
speeches(63-64)agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(81-99)
🔇 Additional comments (5)
plugins/fish/tests/test_fish_tts.py (3)
1-8: LGTM! Clean imports that align with the new TTSSession-based testing pattern.
14-17: LGTM! Straightforward fixture setup following pytest conventions.
19-25: LGTM! Proper environment variable guard ensures graceful skip when API keys are unavailable.
agents-core/vision_agents/core/agents/agents.py (2)
35-35: LGTM: Import aligns with event-driven TTS architecture. The TTSAudioEvent import supports the new PCM-based TTS output flow.
311-321: Metadata serves its purpose; handler design is correct. The audio format metadata is not unused—it documents the converted audio format in the TTSAudioEvent. At tts.py:153, audio is resampled to `self._desired_sample_rate` and `self._desired_channels`, and the event is created with matching metadata (audio_format, sample_rate, channels set to the desired values). Since format conversion happens upstream in the TTS pipeline, the handler in agents.py correctly ignores the metadata and writes pre-converted bytes directly to the track.
Actionable comments posted: 3
🧹 Nitpick comments (7)
plugins/elevenlabs/tests/test_tts.py (3)
10-18: Consider adding docstrings for test documentation. The test class and fixture method lack docstrings. Adding brief documentation following the Google style guide would improve maintainability and help other developers understand the test setup.
Based on coding guidelines.
20-29: Solid integration test implementation. The test correctly follows the TTSSession pattern, configures output format, and validates both error absence and audio generation. Consider adding a docstring to document the test's purpose per coding guidelines.
Based on coding guidelines.
31-34: Add assertions to validate the WAV output. While `manual_tts_to_wav` handles internal error checking, this test lacks explicit assertions. Consider validating that the returned path exists and the file is non-empty to ensure the test fails appropriately in CI environments.

```diff
 async def test_elevenlabs_tts_convert_text_to_audio_manual_test(self, tts):
     path = await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
     print("ElevenLabs TTS audio written to:", path)
+    import os
+    assert os.path.exists(path), f"WAV file not created at {path}"
+    assert os.path.getsize(path) > 0, "WAV file is empty"
```

Also consider adding a docstring per coding guidelines.
Based on coding guidelines.
plugins/cartesia/tests/test_tts.py (1)
16-21: Consider using `@pytest.fixture` for synchronous fixtures. The fixture returns a synchronous result but uses `@pytest_asyncio.fixture`. While this may work, the standard convention is to use `@pytest.fixture` for non-async fixtures and reserve `@pytest_asyncio.fixture` for async ones. Apply this diff if you prefer strict convention adherence:

```diff
-    @pytest_asyncio.fixture
+    @pytest.fixture
     def tts(self) -> cartesia.TTS:  # type: ignore[name-defined]
```

plugins/openai/tests/test_tts_openai.py (1)
1-8: Consider adding dotenv for test environment consistency. While environment variables can be set externally, the Cartesia and Fish test modules both use `python-dotenv` to load `.env` files, which improves developer experience. To align with other TTS plugin tests, consider adding:

```diff
+from dotenv import load_dotenv
 import os
 import pytest
 import pytest_asyncio
 from vision_agents.plugins import openai as openai_plugin
 from vision_agents.core.tts.testing import TTSSession
 from vision_agents.core.tts.manual_test import manual_tts_to_wav
+
+# Load environment variables
+load_dotenv()
```

docs/ai/instructions/ai-tts.md (1)
27-28: Consider clarifying the sentence structure. Static analysis suggests the sentence could be more complete, though the meaning is clear in context.
If you prefer a complete sentence:

```diff
-- `stop_audio` can be a no-op
+- `stop_audio` can be implemented as a no-op
```

plugins/fish/tests/test_fish_tts.py (1)
14-16: Add API key validation to skip gracefully when credentials are absent. The fixture instantiates `fish.TTS()` without checking for required environment variables. If `FISH_API_KEY` or `FISH_AUDIO_API_KEY` is missing, tests will fail rather than skip gracefully. Apply this diff:

```diff
 @pytest_asyncio.fixture
 def tts(self) -> fish.TTS:
+    if not (os.environ.get("FISH_API_KEY") or os.environ.get("FISH_AUDIO_API_KEY")):
+        pytest.skip("FISH_API_KEY/FISH_AUDIO_API_KEY not set")
     return fish.TTS()
```

Note: This addresses the same concern raised in previous review comments about the integration test.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- docs/ai/instructions/ai-tts.md(1 hunks)
- plugins/cartesia/tests/test_tts.py(1 hunks)
- plugins/elevenlabs/tests/test_tts.py(1 hunks)
- plugins/fish/tests/test_fish_tts.py(1 hunks)
- plugins/openai/tests/test_tts_openai.py(1 hunks)
- plugins/openai/vision_agents/plugins/openai/__init__.py(1 hunks)
- plugins/openai/vision_agents/plugins/openai/tts.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/openai/tests/test_tts_openai.py
- plugins/openai/vision_agents/plugins/openai/__init__.py
- plugins/openai/vision_agents/plugins/openai/tts.py
- plugins/fish/tests/test_fish_tts.py
- plugins/cartesia/tests/test_tts.py
- plugins/elevenlabs/tests/test_tts.py
🧬 Code graph analysis (6)
plugins/openai/tests/test_tts_openai.py (4)
agents-core/vision_agents/core/tts/testing.py (3)
TTSSession(23-81)
errors(67-68)
speeches(63-64)agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)plugins/openai/vision_agents/plugins/openai/tts.py (1)
TTS(10-51)agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(81-99)
plugins/openai/vision_agents/plugins/openai/__init__.py (2)
plugins/openai/tests/test_tts_openai.py (1)
tts(12-16)plugins/openai/vision_agents/plugins/openai/tts.py (1)
TTS(10-51)
plugins/openai/vision_agents/plugins/openai/tts.py (2)
plugins/openai/tests/test_tts_openai.py (1)
tts(12-16)agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-505)
from_bytes(118-186)
plugins/fish/tests/test_fish_tts.py (4)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)agents-core/vision_agents/core/tts/testing.py (3)
TTSSession(23-81)
errors(67-68)
speeches(63-64)agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(81-99)conftest.py (1)
wait_for_result(54-67)
plugins/cartesia/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
plugins/elevenlabs/tests/test_tts.py (4)
plugins/cartesia/tests/test_tts.py (1)
tts(17-21)plugins/openai/tests/test_tts_openai.py (1)
tts(12-16)agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
🪛 LanguageTool
docs/ai/instructions/ai-tts.md
[style] ~27-~27: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...=1, format="s16")   ```  - stop_audio can be a no-op  ## init  The plugin con...
(MISSING_IT_THERE)
[style] ~43-~43: It’s considered informal to use ‘a couple’ without the preposition ‘of’ before a noun.
Context: ... not necessary - Make to write at least a couple integration tests, use TTSSession to ...
(A_COUPLE_OF)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (13)
plugins/elevenlabs/tests/test_tts.py (1)
1-7: LGTM! The imports are well-organized and align with the integration test pattern used in other plugins.
plugins/openai/vision_agents/plugins/openai/__init__.py (1)
3-5: LGTM! The TTS export cleanly extends the public API and aligns with the plugin's module structure.
plugins/cartesia/tests/test_tts.py (3)
1-12: LGTM! The imports and dotenv configuration follow the established pattern seen across TTS plugin tests.
23-31: LGTM! The integration test properly exercises the real API with appropriate guards and assertions.
33-35: LGTM! The manual WAV conversion test correctly delegates to the shared utility.
plugins/openai/tests/test_tts_openai.py (2)
10-16: LGTM! The async fixture is properly decorated and handles missing credentials gracefully.
18-31: LGTM! Both integration tests follow the established pattern with proper setup, execution, and assertions.
docs/ai/instructions/ai-tts.md (1)
1-11: LGTM! The layout conventions align with the actual plugin structure and correctly reference PEP 420 namespace packages.
plugins/fish/tests/test_fish_tts.py (2)
18-20: LGTM! The manual WAV test correctly delegates to the shared utility function.
22-32: LGTM! The integration test follows the established pattern and will benefit from improved API key handling in the fixture.
plugins/openai/vision_agents/plugins/openai/tts.py (3)
1-8: LGTM! The imports are well-organized and follow the project's conventions.
33-47: LGTM! The `stream_audio` implementation correctly synthesizes speech to PCM format and returns a properly constructed `PcmData` buffer. The 24 kHz sample rate aligns with OpenAI's TTS output specifications.
49-51: LGTM! The `stop_audio` no-op implementation is appropriate given that playback management is handled by the agent.
Actionable comments posted: 2
🧹 Nitpick comments (5)
plugins/aws/example/aws_polly_tts_example.py (1)
9-12: Add a Google-style docstring and allow basic env overrides for text/output. Keeps the example self-documenting and convenient without adding deps.

```diff
 async def main():
-    load_dotenv()
-    tts = TTS(voice_id=os.environ.get("AWS_POLLY_VOICE", "Joanna"))
-    await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
+    """Run AWS Polly TTS example.
+
+    Returns:
+        None
+    """
+    load_dotenv()
+    tts = TTS(voice_id=os.environ.get("AWS_POLLY_VOICE", "Joanna"))
+    text = os.environ.get("TTS_TEXT", "This is a manual TTS playback test.")
+    outfile = os.environ.get("TTS_OUTFILE")
+    await manual_tts_to_wav(
+        tts, sample_rate=16000, channels=1, text=text, outfile_path=outfile
+    )
```

As per coding guidelines.
plugins/aws/tests/test_tts.py (2)
35-45: Strengthen assertions to catch silent failures. Also assert synthesis started; keeps failures crisp.

```diff
 async def test_aws_polly_tts_speech(self, tts: aws_plugin.TTS):
     tts.set_output_format(sample_rate=16000, channels=1)
     session = TTSSession(tts)
     await tts.send("Hello from AWS Polly TTS")
     result = await session.wait_for_result(timeout=30.0)
     assert not result.errors
+    assert result.started
     assert len(result.speeches) > 0
```
46-48: Avoid temp file leakage; validate the WAV artifact. Use pytest's tmp_path and check the file size.

```diff
-    async def test_aws_polly_tts_manual_wav(self, tts: aws_plugin.TTS):
-        await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
+    async def test_aws_polly_tts_manual_wav(self, tts: aws_plugin.TTS, tmp_path):
+        outfile = tmp_path / "polly.wav"
+        path = await manual_tts_to_wav(
+            tts, sample_rate=16000, channels=1, outfile_path=str(outfile)
+        )
+        assert os.path.exists(path)
+        # WAV header is 44 bytes; ensure non-empty audio payload.
+        assert os.path.getsize(path) > 44
```

plugins/aws/vision_agents/plugins/aws/tts.py (2)
53-57: Configure client timeouts and retries. Prevents indefinite hangs under network issues.

```diff
+from botocore.config import Config
 @@
 def client(self):
     if self._client is None:
-        self._client = boto3.client("polly", region_name=self.region_name)
+        cfg = Config(
+            read_timeout=20,
+            connect_timeout=5,
+            retries={"max_attempts": 3, "mode": "standard"},
+        )
+        self._client = boto3.client("polly", region_name=self.region_name, config=cfg)
     return self._client
```
62-66: Adopt a Google-style docstring for `stream_audio`.

```diff
-    """Synthesize the entire speech to a single PCM buffer.
-
-    Returns PcmData with s16 format and the configured sample rate.
-    """
+    """Synthesize text with Polly and return PCM audio.
+
+    Args:
+        text: Input text or SSML to synthesize.
+        *_, **__: Unused, reserved for BaseTTS compatibility.
+
+    Returns:
+        PcmData with s16 format and the selected sample rate.
+    """
```

As per coding guidelines.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- plugins/aws/README.md(4 hunks)
- plugins/aws/example/aws_polly_tts_example.py(1 hunks)
- plugins/aws/tests/test_tts.py(1 hunks)
- plugins/aws/vision_agents/plugins/aws/__init__.py(1 hunks)
- plugins/aws/vision_agents/plugins/aws/tts.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/aws/README.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/aws/tests/test_tts.py
- plugins/aws/example/aws_polly_tts_example.py
- plugins/aws/vision_agents/plugins/aws/__init__.py
- plugins/aws/vision_agents/plugins/aws/tts.py
🧬 Code graph analysis (4)
plugins/aws/tests/test_tts.py (4)
agents-core/vision_agents/core/tts/testing.py (3)
TTSSession(23-81)
errors(67-68)
speeches(63-64)agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)plugins/aws/vision_agents/plugins/aws/tts.py (1)
TTS(10-92)agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(81-99)
plugins/aws/example/aws_polly_tts_example.py (2)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
TTS(10-92)agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
plugins/aws/vision_agents/plugins/aws/__init__.py (2)
plugins/aws/tests/test_tts.py (1)
tts(29-33)plugins/aws/vision_agents/plugins/aws/tts.py (1)
TTS(10-92)
plugins/aws/vision_agents/plugins/aws/tts.py (2)
plugins/aws/tests/test_tts.py (1)
tts(29-33)agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-505)
from_bytes(118-186)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (1)
plugins/aws/vision_agents/plugins/aws/__init__.py (1)
3-5: LGTM: export surface updated to include TTS.
Actionable comments posted: 1
♻️ Duplicate comments (2)
agents-core/vision_agents/core/tts/tts.py (1)
243-258: Critical: Streaming path never marks the final chunk. All streamed chunks are emitted with `is_final_chunk=False` (line 254). Downstream consumers cannot detect stream completion, potentially causing audio playback to hang or buffer indefinitely. Apply a one-element lookahead to mark the final chunk:

```diff
 else:
-    async for pcm in self._iter_pcm(response):
-        bytes_len, dur_ms = self._emit_chunk(
-            pcm, chunk_index, False, synthesis_id, text, user
-        )
-        total_audio_bytes += bytes_len
-        total_audio_ms += dur_ms
-        chunk_index += 1
+    ait = self._iter_pcm(response).__aiter__()
+    try:
+        prev = await ait.__anext__()
+    except StopAsyncIteration:
+        prev = None
+    while prev is not None:
+        try:
+            nxt = await ait.__anext__()
+            is_final = False
+        except StopAsyncIteration:
+            nxt = None
+            is_final = True
+        bytes_len, dur_ms = self._emit_chunk(
+            prev, chunk_index, is_final, synthesis_id, text, user
+        )
+        total_audio_bytes += bytes_len
+        total_audio_ms += dur_ms
+        chunk_index += 1
+        prev = nxt
```

agents-core/vision_agents/core/edge/types.py (1)
272-339: Critical: dtype mismatch and incorrect output format in resample(). Two interrelated bugs:
Input format mismatch (line 303): The AV format is hardcoded based on channel count only, ignoring `self.samples.dtype`. Passing float32 input will cause `AudioFrame.from_ndarray()` to fail because it expects int16 data when format is "s16".
Output format inconsistency (line 331): The resampler always outputs s16 (line 311), but the returned PcmData preserves `self.format`, creating a mismatch between the format field and the actual data.
Apply this fix to detect dtype and correct the output format:

```diff
-        # Prepare ndarray shape for AV.
-        # Our convention: (channels, samples) for multi-channel, (samples,) for mono.
-        samples = self.samples
-        if samples.ndim == 1:
-            # Mono: reshape to (1, samples) for AV
-            samples = samples.reshape(1, -1)
-        elif samples.ndim == 2:
-            # Already (channels, samples)
-            pass
-
-        # Create AV audio frame from the samples
-        in_layout = "mono" if self.channels == 1 else "stereo"
-        # For multi-channel, use planar format to avoid packed shape errors
-        in_format = "s16" if self.channels == 1 else "s16p"
-        samples = np.ascontiguousarray(samples)
-        frame = av.AudioFrame.from_ndarray(samples, format=in_format, layout=in_layout)
+        # Prepare ndarray shape for AV: (channels, samples)
+        samples = self.samples
+        if samples.ndim == 1:
+            samples = samples.reshape(1, -1)
+        elif samples.ndim != 2:
+            samples = samples.reshape(1, -1)
+        samples = np.ascontiguousarray(samples)
+
+        # Only mono/stereo currently supported
+        if self.channels not in (1, 2):
+            raise NotImplementedError("resample() supports mono or stereo input only")
+        if target_channels not in (1, 2):
+            raise NotImplementedError("resample() supports mono or stereo output only")
+
+        in_layout = "mono" if self.channels == 1 else "stereo"
+        # Pick AV input format based on dtype and planarity
+        if samples.dtype == np.int16:
+            in_format = "s16" if self.channels == 1 else "s16p"
+        elif samples.dtype == np.float32:
+            in_format = "flt" if self.channels == 1 else "fltp"
+        else:
+            samples = samples.astype(np.int16)
+            in_format = "s16" if self.channels == 1 else "s16p"
+
+        frame = av.AudioFrame.from_ndarray(samples, format=in_format, layout=in_layout)
         frame.sample_rate = self.sample_rate

         # Create resampler
         out_layout = "mono" if target_channels == 1 else "stereo"
         resampler = av.AudioResampler(
             format="s16", layout=out_layout, rate=target_sample_rate
         )

         # Resample the frame
         resampled_frames = resampler.resample(frame)
         if resampled_frames:
             resampled_frame = resampled_frames[0]
             resampled_samples = resampled_frame.to_ndarray()
             # AV returns (channels, samples), so for mono we want the first (and only) channel
             if len(resampled_samples.shape) > 1:
                 if target_channels == 1:
                     resampled_samples = resampled_samples[0]
             # Convert to int16
             resampled_samples = resampled_samples.astype(np.int16)

         return PcmData(
             samples=resampled_samples,
             sample_rate=target_sample_rate,
-            format=self.format,
+            format="s16",
             pts=self.pts,
             dts=self.dts,
             time_base=self.time_base,
             channels=target_channels,
         )
```
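The dtype-to-format mapping at the heart of this fix can be shown without PyAV. A hypothetical helper (the format-name strings follow libav conventions; the function itself is illustrative, not part of the codebase):

```python
import numpy as np

def pick_av_format(samples: np.ndarray, channels: int):
    """Map ndarray dtype to a plausible libav sample-format name.

    Mono uses packed formats ("s16", "flt"); multi-channel uses planar
    ("s16p", "fltp") so a (channels, samples) array maps one plane per
    channel. Unknown dtypes fall back to int16 conversion.
    """
    if samples.dtype == np.int16:
        fmt = "s16" if channels == 1 else "s16p"
    elif samples.dtype == np.float32:
        fmt = "flt" if channels == 1 else "fltp"
    else:
        samples = samples.astype(np.int16)
        fmt = "s16" if channels == 1 else "s16p"
    return samples, fmt

_, fmt = pick_av_format(np.zeros((2, 480), dtype=np.float32), channels=2)
print(fmt)  # -> fltp
```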
🧹 Nitpick comments (3)
plugins/cartesia/tests/test_tts.py (2)
16-21: Consider adding fixture teardown for resource cleanup. The fixture creates a TTS instance but doesn't explicitly clean it up. If the Cartesia TTS maintains connections or other resources, consider using a `yield` pattern with teardown logic to ensure proper cleanup after each test. Example:

```diff
 @pytest_asyncio.fixture
-async def tts(self) -> cartesia.TTS:  # type: ignore[name-defined]
+async def tts(self):
     api_key = os.environ.get("CARTESIA_API_KEY")
     if not api_key:
         pytest.skip("CARTESIA_API_KEY env var not set – skipping live API test.")
-    return cartesia.TTS(api_key=api_key)
+    tts_instance = cartesia.TTS(api_key=api_key)
+    yield tts_instance
+    # Add cleanup if needed, e.g.:
+    # await tts_instance.close()
```

Additionally, the `# type: ignore[name-defined]` comment suggests potential typing issues. If `cartesia.TTS` isn't properly exported or typed in the plugin module, consider addressing that or simplifying the type hint as shown above.
33-35: Add assertions to verify WAV file generation. The `manual_tts_to_wav` helper returns the path to the generated WAV file, but this test doesn't verify the output. Even for a "manual test," automated validation would strengthen coverage.

```diff
 @pytest.mark.integration
 async def test_cartesia_tts_convert_text_to_audio_manual_test(self, tts):
-    await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
+    wav_path = await manual_tts_to_wav(tts, sample_rate=16000, channels=1)
+    assert os.path.exists(wav_path), f"WAV file not created at {wav_path}"
+    assert os.path.getsize(wav_path) > 100, "WAV file appears empty or corrupted"
```

plugins/fish/tests/test_fish_tts.py (1)
14-16: Consider adding an API key check to skip gracefully when credentials are missing. The fixture instantiates `fish.TTS()` unconditionally. If `FISH_API_KEY` or `FISH_AUDIO_API_KEY` are not set, tests will fail rather than skip. The ElevenLabs tests (lines 13-17 in `test_tts.py`) demonstrate this pattern. Apply this diff to add a skip check:

```diff
 @pytest_asyncio.fixture
 async def tts(self) -> fish.TTS:
+    import os
+    if not (os.environ.get("FISH_API_KEY") or os.environ.get("FISH_AUDIO_API_KEY")):
+        pytest.skip("FISH_API_KEY/FISH_AUDIO_API_KEY not set; skipping Fish TTS tests.")
     return fish.TTS()
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (16)
- agents-core/vision_agents/core/agents/agents.py(5 hunks)
- agents-core/vision_agents/core/edge/edge_transport.py(3 hunks)
- agents-core/vision_agents/core/edge/types.py(6 hunks)
- agents-core/vision_agents/core/tts/testing.py(1 hunks)
- agents-core/vision_agents/core/tts/tts.py(5 hunks)
- docs/ai/instructions/ai-tests.md(1 hunks)
- docs/ai/instructions/ai-tts.md(1 hunks)
- examples/01_simple_agent_example/simple_agent_example.py(1 hunks)
- plugins/aws/tests/test_tts.py(1 hunks)
- plugins/aws/vision_agents/plugins/aws/tts.py(1 hunks)
- plugins/cartesia/tests/test_tts.py(1 hunks)
- plugins/elevenlabs/tests/test_tts.py(1 hunks)
- plugins/fish/tests/test_fish_tts.py(1 hunks)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py(3 hunks)
- plugins/openai/tests/test_tts_openai.py(1 hunks)
- tests/test_tts_base.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- plugins/aws/tests/test_tts.py
- plugins/aws/vision_agents/plugins/aws/tts.py
- examples/01_simple_agent_example/simple_agent_example.py
- plugins/openai/tests/test_tts_openai.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
- agents-core/vision_agents/core/agents/agents.py
- tests/test_tts_base.py
- plugins/fish/tests/test_fish_tts.py
- agents-core/vision_agents/core/tts/tts.py
- agents-core/vision_agents/core/edge/edge_transport.py
- agents-core/vision_agents/core/tts/testing.py
- plugins/elevenlabs/tests/test_tts.py
- plugins/cartesia/tests/test_tts.py
- agents-core/vision_agents/core/edge/types.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
- tests/test_tts_base.py
🧠 Learnings (1)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Do not use pytest.mark.asyncio; async support is automatic
Applied to files:
- tests/test_tts_base.py
🧬 Code graph analysis (10)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
agents-core/vision_agents/core/edge/types.py (1)
OutputAudioTrack(47-55)agents-core/vision_agents/core/edge/edge_transport.py (1)
create_audio_track(34-35)
agents-core/vision_agents/core/agents/agents.py (5)
agents-core/vision_agents/core/edge/types.py (2)
OutputAudioTrack(47-55)
write(53-53)agents-core/vision_agents/core/tts/events.py (1)
TTSAudioEvent(10-21)agents-core/vision_agents/core/edge/edge_transport.py (1)
create_audio_track(34-35)plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
create_audio_track(291-296)agents-core/vision_agents/core/tts/tts.py (1)
set_output_format(72-90)
tests/test_tts_base.py (4)
agents-core/vision_agents/core/tts/tts.py (4)
TTS(31-315)
stream_audio(163-186)
set_output_format(72-90)
send(202-303)agents-core/vision_agents/core/edge/types.py (2)
PcmData(58-526)
from_bytes(139-207)agents-core/vision_agents/core/tts/testing.py (3)
TTSSession(23-81)
speeches(63-64)
errors(67-68)agents-core/vision_agents/core/events/manager.py (1)
wait(470-484)
plugins/fish/tests/test_fish_tts.py (3)
plugins/aws/tests/test_tts.py (1)
tts(29-33)agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)agents-core/vision_agents/core/tts/testing.py (5)
TTSSession(23-81)
assert_tts_send_non_blocking(130-160)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/tts/tts.py (3)
agents-core/vision_agents/core/events/base.py (2)
PluginClosedEvent(67-74)
AudioFormat(23-30)agents-core/vision_agents/core/edge/types.py (6)
PcmData(58-526)
from_bytes(139-207)
resample(272-339)
to_bytes(341-357)
duration_ms(122-124)
close(42-43)agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
TTSErrorEvent(51-64)
agents-core/vision_agents/core/edge/edge_transport.py (2)
agents-core/vision_agents/core/edge/types.py (2)
User(23-26)
OutputAudioTrack(47-55)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
create_audio_track(291-296)
add_track_subscriber(301-304)
agents-core/vision_agents/core/tts/testing.py (2)
agents-core/vision_agents/core/tts/tts.py (3)
TTS(31-315)
set_output_format(72-90)
send(202-303)
agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSErrorEvent(51-64)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
plugins/elevenlabs/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/testing.py (5)
TTSSession(23-81)
assert_tts_send_non_blocking(130-160)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
plugins/cartesia/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(13-82)
agents-core/vision_agents/core/tts/testing.py (5)
TTSSession(23-81)
assert_tts_send_non_blocking(130-160)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/edge/types.py (3)
agents-core/vision_agents/core/agents/agents.py (1)
close(438-509)
agents-core/vision_agents/core/edge/edge_transport.py (1)
close(38-39)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
close(40-41)
close(327-329)
🪛 LanguageTool
docs/ai/instructions/ai-tts.md
[style] ~27-~27: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...=1, format="s16")   ```  - stop_audio can be a no-op  ## init  The plugin con...
(MISSING_IT_THERE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (28)
plugins/cartesia/tests/test_tts.py (3)
1-12: LGTM—Integration test setup is clean. The dotenv loading and imports align well with the shift to integration-style testing. Module-level `load_dotenv()` is appropriate for test files.
23-31: LGTM—Integration test correctly uses TTSSession pattern. The test properly configures the output format, collects events via `TTSSession`, and validates both error conditions and audio generation. The 30-second timeout is appropriate for a real API call.
37-39: LGTM—Non-blocking test uses appropriate helper. The test correctly delegates to `assert_tts_send_non_blocking`, which includes built-in assertions to verify that `tts.send()` doesn't block the event loop.
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
25-25: LGTM—Import and formatting align with protocol changes. The OutputAudioTrack import and boolean check formatting support the PR's PCM-first refactor without altering behavior.
Also applies to: 107-107
291-296: LGTM—Signature matches abstract method and returns protocol-compliant track. The multi-line signature and OutputAudioTrack return type align with the updated EdgeTransport interface.
docs/ai/instructions/ai-tests.md (1)
9-21: LGTM—Non-blocking test documentation is clear and follows guidelines. The example correctly omits `@pytest.mark.asyncio` while keeping `@pytest.mark.integration`, per coding guidelines.
tests/test_tts_base.py (3)
8-48: LGTM—Stereo-to-mono test validates channel reduction correctly. The dummy TTS creates interleaved stereo PCM; the test confirms mono output is approximately half the size, which aligns with expected behavior.
19-61: LGTM—Resample test validates downsampling correctly. Downsampling from 16kHz to 8kHz should approximately halve the byte count; the assertion range accounts for resampling artifacts.
30-71: LGTM—Error handling test validates exception propagation and event emission. The test confirms that errors raised in `stream_audio` both propagate to the caller and emit TTSErrorEvent.
docs/ai/instructions/ai-tts.md (1)
1-53: LGTM—TTS plugin guide is clear and accurate. The documentation provides a comprehensive, well-structured guide for building TTS plugins. Past typos have been corrected.
agents-core/vision_agents/core/edge/edge_transport.py (1)
12-12: LGTM—Abstract interface updated to use OutputAudioTrack protocol. Import and method signature changes align with the PCM-first refactor and are consistently implemented across concrete transport classes.
Also applies to: 34-35, 58-60
plugins/fish/tests/test_fish_tts.py (1)
18-36: LGTM—Tests follow integration patterns and use proper helpers. The three tests correctly use `manual_tts_to_wav`, `TTSSession`, and `assert_tts_send_non_blocking`, aligning with the updated testing guidelines.
plugins/elevenlabs/tests/test_tts.py (1)
10-38: LGTM—ElevenLabs tests are well-structured and include proper credential checks. The fixture gracefully skips when `ELEVENLABS_API_KEY` is absent, and all three integration tests use the recommended patterns (`TTSSession`, `manual_tts_to_wav`, `assert_tts_send_non_blocking`).
agents-core/vision_agents/core/tts/testing.py (3)
15-81: LGTM—TTSSession provides clean event-driven test helpers. The session subscribes to key TTS events and exposes accumulated speeches/errors through properties. The `wait_for_result` timeout pattern ensures tests don't hang indefinitely.
84-127: LGTM—Event loop probe measures responsiveness effectively. The ticker task counts intervals while the target coroutine runs, detecting blocking behavior. The finally block ensures cleanup even if the coroutine raises.
130-160: LGTM—Non-blocking assertion provides robust detection of event loop blocking. The helper asserts sufficient tick count only when the call duration justifies it, avoiding false positives for fast completions. The returned probe result allows tests to inspect metrics further.
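The ticker-based probe described above can be sketched as a minimal standalone version; the name `probe_event_loop` is illustrative and not the repo's actual helper:

```python
import asyncio
import time

async def probe_event_loop(coro, tick_interval=0.01):
    """Run coro while a background ticker counts loop iterations.

    If coro blocks the event loop, the ticker cannot run, so the tick
    count stays low relative to the elapsed wall-clock time.
    """
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(tick_interval)
            ticks += 1

    task = asyncio.create_task(ticker())
    start = time.perf_counter()
    try:
        await coro
    finally:
        # Cancel the ticker even if the probed coroutine raises
        task.cancel()
    return ticks, time.perf_counter() - start

# A well-behaved coroutine yields to the loop, so ticks accumulate
ticks, elapsed = asyncio.run(probe_event_loop(asyncio.sleep(0.2)))
assert ticks > 3, "event loop appears blocked"
```

A time-blocking call such as `time.sleep(0.2)` wrapped in a coroutine would yield a tick count near zero, which is what the assertion helper keys off.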
agents-core/vision_agents/core/agents/agents.py (2)
163-163: LGTM: Protocol-based typing improves decoupling. The type change from a concrete `aiortc.AudioStreamTrack` to the `OutputAudioTrack` Protocol aligns well with the PR's decoupling objectives.
1029-1041: No action required—TTS resampling architecture is sound. Verification confirms all six TTS provider implementations return compatible types (PcmData or iterators thereof). The base class properly normalizes these via `_normalize_to_pcm()` and resamples during emission using `pcm.resample(self._desired_sample_rate, self._desired_channels)`. The hardcoded 48kHz stereo is a WebRTC standard, and any resampling failure will throw an exception rather than silently degrade. All TTS providers can handle the requested output format without compatibility issues.
agents-core/vision_agents/core/tts/tts.py (3)
111-127: LGTM: Comprehensive response normalization. The `_iter_pcm` generator correctly handles multiple provider response shapes (single buffer, async/sync iterables) and avoids the pitfall of treating bytes as an iterable of integers.
129-160: LGTM: Clean resampling and event emission. The `_emit_chunk` method correctly resamples to the desired format, emits metrics, and returns both byte length and duration for accurate tracking.
283-303: LGTM: Comprehensive error handling and observability. The error path correctly emits events, records metrics, and ensures latency is always tracked via the finally block, even on failure.
agents-core/vision_agents/core/edge/types.py (7)
46-55: LGTM: Clean Protocol definition for audio output. The `OutputAudioTrack` Protocol with `write` and `stop` methods provides a clear, runtime-checkable interface for decoupling.
77-77: LGTM: Multi-channel support with correct duration calculation. The `channels` field and updated `duration` property correctly handle 2D arrays with shape `(channels, samples)`.
Also applies to: 92-96, 121-124
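As a quick illustration of the duration math for the planar layout (a hypothetical helper, not the actual PcmData implementation):

```python
import numpy as np

def pcm_duration_seconds(samples: np.ndarray, sample_rate: int) -> float:
    """Duration of PCM audio stored as 1-D mono or planar (channels, samples)."""
    # For 2-D planar arrays the sample count is the last axis, not len()
    n = samples.shape[-1] if samples.ndim > 1 else len(samples)
    return n / float(sample_rate)

stereo = np.zeros((2, 48000), dtype=np.int16)  # 1 second of stereo at 48 kHz
mono = np.zeros(8000, dtype=np.int16)          # 1 second of mono at 8 kHz
```

Using `len()` on the planar array would return the channel count (2) instead of the sample count, which is exactly the bug the `shape[-1]` convention avoids.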
139-207: LGTM: Robust multi-channel PCM parsing. The `from_bytes` method correctly aligns buffers, determines dtype from format, and converts interleaved multi-channel data to planar `(channels, samples)` representation with proper error handling.
209-270: LGTM: Flexible PcmData construction from multiple input types. The `from_data` method handles bytes-like and numpy arrays with various shapes, normalizing to the canonical `(channels, samples)` representation with proper dtype alignment and fallback logic.
341-357: LGTM: Correct interleaving for multi-channel output. The `to_bytes` method correctly transposes `(channels, samples)` to `(samples, channels)` and flattens to produce interleaved PCM bytes.
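The transpose-and-flatten trick is easy to verify in isolation (the sample values below are arbitrary illustrations):

```python
import numpy as np

# Planar stereo buffer: shape (channels, samples)
planar = np.array([[1, 2, 3],
                   [10, 20, 30]], dtype=np.int16)

# Transpose to (samples, channels), then flatten row-major to interleave:
# L0, R0, L1, R1, L2, R2
interleaved = planar.T.reshape(-1)

pcm_bytes = interleaved.tobytes()  # 6 samples * 2 bytes each = 12 bytes
```

Because the transpose is row-major flattened, each frame's channel samples end up adjacent, which is exactly the layout WAV files and most audio APIs expect.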
359-400: LGTM: Complete WAV export with proper format conversion. The `to_wav_bytes` method handles format conversion (f32 → s16 with clipping), constructs proper WAV headers, and supports multi-channel output.
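A minimal sketch of that conversion path using the stdlib `wave` module; `f32_to_wav_bytes` is an illustrative name and assumes float samples nominally in [-1, 1], not the actual method:

```python
import io
import wave
import numpy as np

def f32_to_wav_bytes(samples: np.ndarray, sample_rate: int, channels: int = 1) -> bytes:
    """Clip float32 samples, scale to int16, and wrap in a WAV container."""
    # Clip (not wrap) out-of-range values, then scale to the s16 range
    clipped = np.clip(samples, -1.0, 1.0)
    s16 = (clipped * 32767.0).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(s16.tobytes())
    return buf.getvalue()

# Values beyond ±1.0 are clipped to ±32767 rather than overflowing
wav = f32_to_wav_bytes(np.array([0.0, 0.5, 2.0, -2.0], dtype=np.float32), 16000)
```

Clipping matters: a naive `astype(np.int16)` after scaling would wrap out-of-range values around, producing loud artifacts instead of saturation.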
402-526: LGTM: Comprehensive provider response normalization. The `from_response` factory method handles diverse provider response shapes (bytes, iterables, async iterables, PcmData, objects with `.data`) and includes proper frame alignment buffering with zero-padding for partial frames.
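The frame-alignment idea — zero-pad a trailing partial frame so the buffer always ends on a whole frame — can be sketched as follows (an assumed standalone helper, not the actual implementation):

```python
def align_to_frames(buf: bytes, channels: int = 1, sample_width: int = 2) -> bytes:
    """Zero-pad a PCM byte buffer so it ends on a whole frame.

    A frame is one sample per channel; for s16 stereo a frame is 4 bytes.
    Providers that stream arbitrary byte chunks can split a frame across
    chunk boundaries, so the final chunk may need padding.
    """
    frame_size = channels * sample_width
    remainder = len(buf) % frame_size
    if remainder:
        buf += b"\x00" * (frame_size - remainder)
    return buf

# 7 bytes of s16 stereo is one full frame plus 3 stray bytes; pad to 8
padded = align_to_frames(b"\x01" * 7, channels=2, sample_width=2)
```

In a streaming pipeline the stray bytes would normally be buffered and prepended to the next chunk; padding with silence is the fallback for the very last, partial frame.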
Actionable comments posted: 16
♻️ Duplicate comments (4)
agents-core/vision_agents/core/tts/manual_test.py (1)
42-53: Audio may be truncated: waits only for first event. `TTSSession.wait_for_result()` returns after the first audio/error event arrives (see its `_first_event.wait()` implementation). Writing the WAV immediately can produce partial audio because additional speech chunks may still be streaming in. The function should drain events until synthesis completes or no new chunks arrive for a brief window. Consider implementing a drain loop as suggested in the previous review:
async def manual_tts_to_wav( tts: TTS, *, sample_rate: int = 16000, channels: int = 1, text: str = "This is a manual TTS playback test.", outfile_path: Optional[str] = None, timeout_s: float = 20.0, + drain_s: float = 1.0, ) -> str: @@ tts.set_output_format(sample_rate=sample_rate, channels=channels) session = TTSSession(tts) await tts.send(text) result = await session.wait_for_result(timeout=timeout_s) if result.errors: raise RuntimeError(f"TTS errors: {result.errors}") + # Drain until quiet to collect full utterance + import asyncio + last_len = len(session.speeches) + idle_deadline = time.time() + drain_s + while time.time() < idle_deadline: + await asyncio.sleep(0.05) + if len(session.speeches) != last_len: + last_len = len(session.speeches) + idle_deadline = time.time() + drain_s + # Convert captured audio to PcmData - pcm_bytes = b"".join(result.speeches) + pcm_bytes = b"".join(session.speeches) pcm = PcmData.from_bytes( pcm_bytes, sample_rate=sample_rate, channels=channels, format="s16" )plugins/fish/tests/test_fish_tts.py (1)
1-7: Skip integration tests gracefully when API keys are absent (put check in the fixture). Without FISH_API_KEY/FISH_AUDIO_API_KEY these tests will fail. Skip early in the fixture and import os.
Apply this diff:
@@ -import pytest +import os +import pytest import pytest_asyncio @@ class TestFishTTS: @pytest_asyncio.fixture async def tts(self) -> fish.TTS: - return fish.TTS() + if not (os.environ.get("FISH_API_KEY") or os.environ.get("FISH_AUDIO_API_KEY")): + pytest.skip("FISH_API_KEY/FISH_AUDIO_API_KEY not set; skipping integration tests.") + return fish.TTS()Also applies to: 13-17
agents-core/vision_agents/core/tts/tts.py (1)
289-296: Mark the final streamed chunk with `is_final_chunk=True`. Downstream can't know when to close; add one-element lookahead.
Apply this diff:
- else: - async for pcm in self._iter_pcm(response): - bytes_len, dur_ms = self._emit_chunk( - pcm, chunk_index, False, synthesis_id, text, user - ) - total_audio_bytes += bytes_len - total_audio_ms += dur_ms - chunk_index += 1 + else: + ait = self._iter_pcm(response) + try: + prev = await ait.__anext__() + except StopAsyncIteration: + prev = None + while prev is not None: + try: + nxt = await ait.__anext__() + is_final = False + except StopAsyncIteration: + nxt = None + is_final = True + bytes_len, dur_ms = self._emit_chunk( + prev, chunk_index, is_final, synthesis_id, text, user + ) + total_audio_bytes += bytes_len + total_audio_ms += dur_ms + chunk_index += 1 + prev = nxtagents-core/vision_agents/core/edge/types.py (1)
322-352: resample: choose AV input format based on dtype; current code breaks on float32. `AudioFrame.from_ndarray(..., format="s16p")` assumes int16; f32 inputs will misparse or fail. Detect dtype (s16 vs f32) and pick s16/s16p or flt/fltp accordingly.
Apply this diff:
- # Prepare ndarray shape for AV input frame. - # Use planar input (s16p) with shape (channels, samples). - in_layout = "mono" if self.channels == 1 else "stereo" + # Prepare ndarray shape for AV input frame. + # Use planar input shape (channels, samples); pick format by dtype. + in_layout = "mono" if self.channels == 1 else "stereo" cmaj = self.samples if isinstance(cmaj, np.ndarray): @@ - cmaj = np.ascontiguousarray(cmaj) - frame = av.AudioFrame.from_ndarray(cmaj, format="s16p", layout=in_layout) + cmaj = np.ascontiguousarray(cmaj) + # Select AV input format matching dtype + if isinstance(cmaj, np.ndarray): + if cmaj.dtype == np.int16: + in_format = "s16" if self.channels == 1 else "s16p" + elif cmaj.dtype == np.float32: + in_format = "flt" if self.channels == 1 else "fltp" + else: + cmaj = cmaj.astype(np.int16) + in_format = "s16" if self.channels == 1 else "s16p" + else: + # bytes or other: assume s16 mono/stereo by channels + in_format = "s16" if self.channels == 1 else "s16p" + frame = av.AudioFrame.from_ndarray(cmaj, format=in_format, layout=in_layout)
🧹 Nitpick comments (8)
plugins/kokoro/tests/test_tts.py (1)
8-11: Consider using pytest.importorskip for cleaner imports. The current try/except pattern works but `pytest.importorskip` provides a more idiomatic approach for conditional test skipping based on import availability, and avoids the broad `Exception` catch. Apply this diff:
def tts(self): # returns kokoro TTS if available - try: - import kokoro # noqa: F401 - except Exception: - pytest.skip("kokoro package not installed; skipping manual playback test.") + pytest.importorskip("kokoro", reason="kokoro package not installed") from vision_agents.plugins import kokoro as kokoro_plugintests/test_resample_quality.py (1)
144-146: Remove unnecessary main block. Pytest automatically discovers and runs test functions. The `if __name__ == "__main__"` block is unnecessary and bypasses pytest's fixture system (like `tmp_path`), potentially causing the tests to fail when run directly. Apply this diff:
- -if __name__ == "__main__": - test_compare_resampling_methods() - test_pyav_resampler_settings()Run tests using:
pytest tests/test_resample_quality.pyplugins/cartesia/tests/test_tts.py (1)
33-35: Consider adding assertions for the manual WAV test. The test calls `manual_tts_to_wav` but doesn't verify the result. Consider asserting that the returned path exists and the file has non-zero size.
@pytest.mark.integration async def test_cartesia_tts_convert_text_to_audio_manual_test(self, tts): - await manual_tts_to_wav(tts, sample_rate=48000, channels=2) + wav_path = await manual_tts_to_wav(tts, sample_rate=48000, channels=2) + assert os.path.exists(wav_path) + assert os.path.getsize(wav_path) > 0
agents-core/vision_agents/core/tts/manual_test.py (1)
55-64: Consider ensuring parent directory exists for custom paths. If a user provides a custom
outfile_pathwith non-existent parent directories, the write operation will fail. Adding directory creation would make the function more robust.# Generate a descriptive filename if not provided if outfile_path is None: tmpdir = tempfile.gettempdir() timestamp = int(time.time()) outfile_path = os.path.join( tmpdir, f"tts_manual_test_{tts.__class__.__name__}_{timestamp}.wav" ) + else: + # Ensure parent directory exists if custom path provided + parent_dir = os.path.dirname(outfile_path) + if parent_dir: + os.makedirs(parent_dir, exist_ok=True) # Use utility function to write WAV and optionally play return await play_pcm_with_ffplay(pcm, outfile_path=outfile_path, timeout_s=30.0)plugins/elevenlabs/tests/test_tts.py (1)
31-34: Add assertions and prefer pytest output mechanisms over print. The test lacks assertions to verify the WAV file was created successfully, and uses `print()` which may not appear in pytest output as expected. Consider this refinement:
@pytest.mark.integration async def test_elevenlabs_tts_convert_text_to_audio_manual_test(self, tts): path = await manual_tts_to_wav(tts, sample_rate=48000, channels=2) - print("ElevenLabs TTS audio written to:", path) + assert os.path.exists(path), f"WAV file not created at {path}" + assert os.path.getsize(path) > 0, f"WAV file is empty at {path}"DEVELOPMENT.md (1)
171-178: Clarify optional playback behavior. Mention that playback requires ffplay on PATH (already true) and is optional. Consider adding an env gate (e.g., FFPLAY=1) to avoid accidental audio during CI.
Would you like a small patch to gate playback behind an env var?
tests/test_pcm_data.py (1)
92-101: Minor: prefer pytest.approx and linspace endpoint handling. Use pytest.approx for tolerances and np.linspace(..., endpoint=False) to avoid off-by-one artifacts in 1s signals.
Example:
- t = np.linspace(0, duration_sec, num_samples, dtype=np.float32) + t = np.linspace(0, duration_sec, num_samples, endpoint=False, dtype=np.float32) @@ - assert abs(mono_duration - duration_sec) < 0.01 + import pytest + assert mono_duration == pytest.approx(duration_sec, abs=0.01)Also applies to: 118-122
agents-core/vision_agents/core/edge/types.py (1)
650-704: Optional: gate ffplay playback behind an env var to avoid accidental audio in CI. Play only if FFPLAY=1 (or another opt-in) in addition to ffplay presence.
Apply this diff:
- # Optional playback with ffplay - if shutil.which("ffplay"): + # Optional playback with ffplay (enable by setting FFPLAY=1) + if os.environ.get("FFPLAY") == "1" and shutil.which("ffplay"): logger.info("Playing audio with ffplay...")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (15)
- DEVELOPMENT.md(1 hunks)
- agents-core/vision_agents/core/edge/types.py(6 hunks)
- agents-core/vision_agents/core/tts/manual_test.py(1 hunks)
- agents-core/vision_agents/core/tts/tts.py(5 hunks)
- conftest.py(9 hunks)
- plugins/aws/README.md(4 hunks)
- plugins/aws/example/aws_polly_tts_example.py(1 hunks)
- plugins/aws/tests/test_tts.py(1 hunks)
- plugins/cartesia/tests/test_tts.py(1 hunks)
- plugins/elevenlabs/tests/test_tts.py(1 hunks)
- plugins/fish/tests/test_fish_tts.py(1 hunks)
- plugins/kokoro/tests/test_tts.py(1 hunks)
- plugins/openai/tests/test_tts_openai.py(1 hunks)
- tests/test_pcm_data.py(1 hunks)
- tests/test_resample_quality.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- conftest.py
🚧 Files skipped from review as they are similar to previous changes (4)
- plugins/aws/tests/test_tts.py
- plugins/openai/tests/test_tts_openai.py
- plugins/aws/example/aws_polly_tts_example.py
- plugins/aws/README.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/cartesia/tests/test_tts.py
- tests/test_pcm_data.py
- agents-core/vision_agents/core/tts/manual_test.py
- tests/test_resample_quality.py
- plugins/fish/tests/test_fish_tts.py
- plugins/kokoro/tests/test_tts.py
- plugins/elevenlabs/tests/test_tts.py
- agents-core/vision_agents/core/tts/tts.py
- agents-core/vision_agents/core/edge/types.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
- tests/test_pcm_data.py
- tests/test_resample_quality.py
🧬 Code graph analysis (9)
plugins/cartesia/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(12-64)
agents-core/vision_agents/core/tts/testing.py (5)
TTSSession(23-81)
assert_tts_send_non_blocking(130-160)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
tests/test_pcm_data.py (1)
agents-core/vision_agents/core/edge/types.py (4)
PcmData(63-647)
to_bytes(441-478)
resample(298-439)
duration(89-145)
agents-core/vision_agents/core/tts/manual_test.py (2)
agents-core/vision_agents/core/tts/testing.py (4)
TTSSession(23-81)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/edge/types.py (3)
PcmData(63-647)
play_pcm_with_ffplay(650-704)
from_bytes(165-233)
tests/test_resample_quality.py (1)
agents-core/vision_agents/core/edge/types.py (3)
PcmData(63-647)
duration(89-145)
resample(298-439)
plugins/fish/tests/test_fish_tts.py (2)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(12-64)
agents-core/vision_agents/core/tts/testing.py (5)
TTSSession(23-81)
assert_tts_send_non_blocking(130-160)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
plugins/kokoro/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(12-64)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
TTS(18-77)
plugins/elevenlabs/tests/test_tts.py (2)
agents-core/vision_agents/core/tts/testing.py (5)
TTSSession(23-81)
assert_tts_send_non_blocking(130-160)
wait_for_result(70-81)
errors(67-68)
speeches(63-64)
agents-core/vision_agents/core/tts/manual_test.py (1)
manual_tts_to_wav(12-64)
agents-core/vision_agents/core/tts/tts.py (3)
agents-core/vision_agents/core/events/base.py (2)
PluginClosedEvent(67-74)
AudioFormat(23-30)
agents-core/vision_agents/core/edge/types.py (5)
PcmData(63-647)
resample(298-439)
to_bytes(441-478)
duration_ms(148-150)
close(47-48)
agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
TTSErrorEvent(51-64)
agents-core/vision_agents/core/edge/types.py (3)
agents-core/vision_agents/core/edge/edge_transport.py (2)
close(38-39)
join(46-47)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
close(40-41)
close(327-329)
join(237-289)agents-core/vision_agents/core/agents/agents.py (2)
close(438-509)
join(342-414)
🪛 LanguageTool
DEVELOPMENT.md
[grammar] ~117-~117: Ensure spelling is correct
Context: ...ork with different PCM formats, usually 16khz mono 3. PCM data is always passed aroun...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~117-~117: Ensure spelling is correct
Context: ...ifferent PCM formats, usually 16khz mono 3. PCM data is always passed around using t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (8)
plugins/kokoro/tests/test_tts.py (1)
1-2: LGTM: Clean imports. Import organization follows conventions and complies with coding guidelines.
plugins/cartesia/tests/test_tts.py (3)
1-12: LGTM! Clean integration test setup. The imports and environment loading are well-structured for the new integration testing approach.
15-21: LGTM! Proper environment-gated fixture. The fixture correctly skips tests when the API key is unavailable, making the integration tests safe to run in CI without credentials.
23-31: LGTM! Well-structured integration tests. These tests properly validate real API interaction and non-blocking behavior using the established testing utilities.
Also applies to: 37-39
agents-core/vision_agents/core/tts/manual_test.py (1)
1-10: LGTM! Imports are clean and necessary.
plugins/elevenlabs/tests/test_tts.py (3)
1-8: LGTM! The imports are clean and appropriate for integration-style testing with pytest-asyncio fixtures.
20-29: LGTM! The test correctly uses `TTSSession` to capture events and validate TTS behavior with appropriate assertions.
36-38: LGTM! The non-blocking assertion properly verifies that the TTS send operation doesn't block the event loop.
Actionable comments posted: 0
♻️ Duplicate comments (2)
agents-core/vision_agents/core/tts/tts.py (2)
74-93: Format mismatch: events claim arbitrary format but pipeline only emits s16 bytes. The pipeline hardcodes `format="s16"` in the resampler (line 123), yet `set_output_format` accepts any `AudioFormat` and propagates it to events (line 197). When `audio_format=AudioFormat.PCM_F32` is passed, `TTSAudioEvent` metadata will claim f32 while `audio_data` contains s16 bytes, breaking downstream parsers. Clamp to PCM_S16 until f32 support is implemented:
def set_output_format( self, sample_rate: int, channels: int = 1, audio_format: AudioFormat = AudioFormat.PCM_S16, ) -> None: """Set the desired output audio format for emitted events. The agent should call this with its output track properties so this TTS instance can resample and rechannel audio appropriately. Args: sample_rate: Desired sample rate in Hz (e.g., 48000) channels: Desired channel count (1 for mono, 2 for stereo) audio_format: Desired audio format (defaults to PCM S16) """ + if audio_format != AudioFormat.PCM_S16: + logger.warning( + "Only PCM_S16 is currently supported; %s will be coerced to PCM_S16", + audio_format.value, + ) + audio_format = AudioFormat.PCM_S16 self._desired_sample_rate = int(sample_rate) self._desired_channels = int(channels) self._desired_format = audio_format
290-306: Streaming consumers never see the final chunk. All chunks are emitted with `is_final_chunk=False` (line 301). Downstream consumers waiting for finalization will hang or require timeouts. Use one-element lookahead to mark the last chunk:
else: - async for pcm in self._iter_pcm(response): - bytes_len, dur_ms = self._emit_chunk( - pcm, chunk_index, False, synthesis_id, text, user - ) - total_audio_bytes += bytes_len - total_audio_ms += dur_ms - chunk_index += 1 + ait = self._iter_pcm(response) + prev = None + try: + prev = await ait.__anext__() + except StopAsyncIteration: + pass + while prev is not None: + try: + nxt = await ait.__anext__() + is_final = False + except StopAsyncIteration: + nxt = None + is_final = True + bytes_len, dur_ms = self._emit_chunk( + prev, chunk_index, is_final, synthesis_id, text, user + ) + total_audio_bytes += bytes_len + total_audio_ms += dur_ms + chunk_index += 1 + prev = nxt
🧹 Nitpick comments (3)
tests/test_utils.py (1)
367-373: Consider extracting duplicate array dimension handling. The same array dimension logic appears in both test methods. While not critical, extracting this into a small helper function would reduce duplication and improve maintainability. Example helper:
Example helper:
def get_sample_count(pcm_data: PcmData) -> int: """Extract sample count from PcmData, handling both 1D and 2D arrays.""" return ( pcm_data.samples.shape[-1] if pcm_data.samples.ndim > 1 else len(pcm_data.samples) )Then use it in both tests:
- num_samples = ( - resampled.samples.shape[-1] - if resampled.samples.ndim > 1 - else len(resampled.samples) - ) + num_samples = get_sample_count(resampled)Also applies to: 388-394
agents-core/vision_agents/core/edge/types.py (2)
89-150: Duration calculation handles ambiguous array shapes defensively. The logic at lines 100-117 infers which dimension represents samples vs. channels by comparing shapes to `self.channels`. For ambiguous cases (e.g., 2×2 arrays), it picks the max dimension (line 115), which is a reasonable heuristic. Consider documenting the shape assumption in the class docstring to clarify the internal convention is `(channels, samples)` and that `(samples, channels)` is auto-detected.
658-712: Debug utility for audio playback is helpful but narrow in scope. The `play_pcm_with_ffplay` function writes WAV files and spawns ffplay for testing. The timeout handling (lines 704-708) prevents hangs. Consider noting in the docstring that this is intended for local development/debugging only, as it relies on ffplay being in PATH and spawns uncontrolled subprocesses.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- agents-core/vision_agents/core/edge/types.py(6 hunks)
- agents-core/vision_agents/core/tts/manual_test.py(1 hunks)
- agents-core/vision_agents/core/tts/tts.py(5 hunks)
- tests/test_utils.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- agents-core/vision_agents/core/tts/manual_test.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
- tests/test_utils.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- tests/test_utils.py
- agents-core/vision_agents/core/tts/tts.py
- agents-core/vision_agents/core/edge/types.py
🧬 Code graph analysis (3)
tests/test_utils.py (2)
agents-core/vision_agents/core/utils/utils.py (2)
parse_instructions(41-90)
Instructions(17-21)
agents-core/vision_agents/core/edge/types.py (6)
PcmData(63-655)
from_bytes(165-233)
duration(89-145)
resample(298-447)
pts_seconds(153-156)
dts_seconds(159-162)
agents-core/vision_agents/core/tts/tts.py (7)
agents-core/vision_agents/core/events/base.py (2)
PluginClosedEvent(67-74)
AudioFormat(23-30)
agents-core/vision_agents/core/edge/types.py (5)
PcmData(63-655)
resample(298-447)
to_bytes(449-486)
duration_ms(148-150)
close(47-48)
agents-core/vision_agents/core/tts/events.py (4)
TTSAudioEvent(10-21)
TTSSynthesisStartEvent(25-33)
TTSSynthesisCompleteEvent(37-47)
TTSErrorEvent(51-64)
tests/test_tts_base.py (3)
stream_audio(9-13)
stream_audio(20-24)
stream_audio(31-32)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
stream_audio(61-94)
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)
stream_audio(54-82)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
stream_audio(39-62)
agents-core/vision_agents/core/edge/types.py (3)
agents-core/vision_agents/core/edge/edge_transport.py (2)
close(38-39)
join(46-47)
agents-core/vision_agents/core/agents/agents.py (2)
close(438-509)
join(342-414)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
close(40-41)
close(327-329)
join(237-289)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (17)
tests/test_utils.py (2)
28-39: LGTM! Test expectations correctly updated. The test text and assertions have been updated to include `@guide.md`, which properly validates the enhanced parse_instructions behavior for collecting multiple markdown mentions.
192-326: File handling improvements look good. The explicit use of `encoding='utf-8'` when opening text files is a best practice that ensures consistent behavior across platforms.
agents-core/vision_agents/core/tts/tts.py (5)
98-137: LGTM: Persistent resampler avoids audio artifactsThe persistent resampler pattern prevents clicking/discontinuities between chunks. The input layout detection and debug logging are helpful for troubleshooting.
138-164: LGTM: Type safety checks prevent raw bytes from breaking downstream.

The defensive `isinstance(item, PcmData)` checks at lines 147-150 and 158-161 ensure plugins return properly wrapped data, addressing the type-safety concern from the previous review.
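The defensive normalization described above can be sketched roughly as follows. This is a simplified stand-in, not the actual `tts.py` code: `iter_pcm` and the `wrap` callback are hypothetical names used for illustration only.

```python
def iter_pcm(items, wrap):
    """Yield only wrapped PCM objects; defensively coerce raw bytes.

    `wrap` is a hypothetical factory (e.g. something like PcmData.from_bytes)
    applied when a plugin yields raw bytes instead of a wrapped object.
    """
    for item in items:
        if isinstance(item, (bytes, bytearray)):
            # Plugins should return wrapped PCM, but tolerate raw bytes.
            item = wrap(bytes(item))
        yield item
```

The point of the check is that downstream code (resampling, event emission) can assume a single type instead of branching on bytes vs. objects.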
166-202: Approve resampling and emission logic.

The chunk emission correctly resamples using the persistent resampler, serializes to bytes, records metrics, and emits events. The tuple return for accounting is clean.
257-351: Synthesis lifecycle and observability implementation looks solid.

The `send` method correctly:

- Resets resampler state per synthesis (lines 261-263)
- Emits start/complete/error events with rich context
- Tracks latency and error metrics using OpenTelemetry counters
- Computes the real-time factor from accumulated PCM durations
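The real-time-factor computation mentioned in the last bullet reduces to a simple ratio. A minimal sketch, assuming synthesis wall-clock time in seconds and accumulated PCM duration in milliseconds (the function name is illustrative, not from the source):

```python
def realtime_factor(synthesis_seconds: float, audio_ms: float) -> float:
    """RTF = time spent synthesizing / duration of audio produced.

    RTF < 1.0 means the TTS provider generates audio faster than playback.
    """
    audio_seconds = audio_ms / 1000.0
    if audio_seconds <= 0:
        # No audio produced; avoid division by zero.
        return float("inf")
    return synthesis_seconds / audio_seconds
```

For example, spending 0.5 s to synthesize 2 s of audio gives an RTF of 0.25.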
352-362: Clean plugin lifecycle with PluginClosedEvent.

Emitting `PluginClosedEvent` on close provides observability for plugin shutdown and aligns with the broader event-driven architecture.

agents-core/vision_agents/core/edge/types.py (10)
2-22: LGTM: Import additions support new PCM utilities.

The added imports (asyncio, os, shutil, tempfile, time, typing extensions) align with the new utilities for PCM handling, WAV conversion, and ffplay integration.
47-48: Abstract close method is appropriate for the base class.

The `pass` body is standard for an abstract/protocol method that subclasses will override with actual cleanup logic.
51-61: OutputAudioTrack protocol enables polymorphic audio output.

The `@runtime_checkable` decorator allows `isinstance()` checks, and the minimalist protocol (write/stop) provides a clean abstraction for audio tracks across different transport implementations.
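The shape of such a protocol can be sketched as below. This is a hedged reconstruction from the review's description (a two-method write/stop protocol), not the exact definition in `types.py`; `InMemoryTrack` is a hypothetical test double.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OutputAudioTrack(Protocol):
    """Minimal audio sink: anything with write() and stop() qualifies."""

    def write(self, data: bytes) -> None: ...
    def stop(self) -> None: ...


class InMemoryTrack:
    """Satisfies the protocol structurally -- no inheritance required."""

    def __init__(self) -> None:
        self.buffer = b""

    def write(self, data: bytes) -> None:
        self.buffer += data

    def stop(self) -> None:
        pass
```

Because the protocol is `@runtime_checkable`, `isinstance(InMemoryTrack(), OutputAudioTrack)` returns True based purely on method presence, which is what lets transports swap track implementations freely.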
82-86: Multi-channel support additions are straightforward.

Adding the `channels: int = 1` field and the `stereo` property extends PcmData for stereo use cases without breaking existing mono callers.
164-233: from_bytes interleaving logic is robust.

The method:

- Aligns the buffer to sample boundaries (lines 197-211)
- Converts interleaved `[L, R, L, R, ...]` data to `(channels, samples)` via reshape and transpose (lines 224-226)
- Logs warnings on reshape failures (lines 228-230)
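The interleaved-to-planar conversion described above boils down to splitting one packed sample stream into per-channel streams. A minimal stdlib sketch (the real code uses numpy reshape/transpose; this list-based version only illustrates the ordering):

```python
def deinterleave(samples, channels):
    """Split packed [L0, R0, L1, R1, ...] into one sequence per channel.

    Equivalent in effect to reshape(-1, channels).T on a numpy array.
    """
    return [samples[c::channels] for c in range(channels)]
```

For stereo input `[L0, R0, L1, R1, L2, R2]` this yields `[[L0, L1, L2], [R0, R1, R2]]`, i.e. the `(channels, samples)` layout the review refers to.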
235-296: from_data factory provides flexible PcmData construction.

Supporting both bytes-like and numpy arrays with automatic shape normalization (lines 261-286) reduces boilerplate for callers. The dtype alignment (lines 256-259) ensures consistency with the declared format.
298-447: Resample implementation handles PyAV quirks comprehensively.

The method:

- Normalizes input to `(channels, samples)` for PyAV (lines 322-350)
- Uses the provided resampler or creates a new one (lines 354-361)
- Deinterleaves PyAV's packed stereo output (lines 375-389)
- Handles various ndim cases defensively (lines 390-419)
- Flattens mono to 1D for consistency (lines 422-427)
- Returns `format="s16"`, as the resampler always outputs s16 (line 439)

This addresses the dtype/format issue from the past review.
449-487: to_bytes interleaving produces the correct packed format.

The explicit interleaving loop (lines 473-477) ensures `[L0, R0, L1, R1, ...]` order for multi-channel data, avoiding stride-related issues. The shape normalization (lines 458-471) handles both `(channels, samples)` and `(samples, channels)` layouts.
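The inverse operation — packing per-channel data back into the interleaved order a WAV/WebRTC consumer expects — can be sketched as below. This is an illustrative stdlib version of the ordering the review describes, not the actual `to_bytes` implementation:

```python
def interleave(per_channel):
    """Pack [[L0, L1, ...], [R0, R1, ...]] into [L0, R0, L1, R1, ...].

    zip(*per_channel) walks all channels frame by frame, which sidesteps
    the stride issues a raw memory copy of a transposed array can hit.
    """
    out = []
    for frame in zip(*per_channel):
        out.extend(frame)
    return out
```

Round-tripping through deinterleave/interleave leaves the sample order unchanged, which is the property the explicit loop guarantees.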
488-530: WAV serialization converts non-s16 formats correctly.

Lines 499-518 convert float or non-int16 arrays to s16 by clipping to `[-1.0, 1.0]` and scaling to the int16 range. The `wave` module writes a standard WAV header with proper channel/rate metadata.
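The clip-and-scale step plus the `wave`-module serialization can be sketched with the stdlib only. Function names and the 32767 scale factor are illustrative assumptions; the real code operates on numpy arrays:

```python
import io
import wave


def float_to_s16(samples):
    """Clip float samples to [-1.0, 1.0], then scale to the int16 range."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]


def s16_to_wav(samples, rate=16000, channels=1):
    """Serialize s16 samples to an in-memory WAV file with a proper header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        frames = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
        w.writeframes(frames)
    return buf.getvalue()
```

Clipping before scaling matters: a stray float sample of 2.0 would otherwise overflow int16 instead of saturating at full scale.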
531-656: from_response handles diverse provider APIs comprehensively.

The method:

- Returns a single `PcmData` for bytes-like or already-PcmData inputs
- Wraps async iterators (lines 563-600) and sync iterators (lines 602-640) with buffering and frame alignment
- Pads incomplete frames with zeros (lines 589-598, 629-638)
- Extracts the `.data` attribute from response objects (lines 643-651)

This enables plugins to return various response shapes without callers needing custom unwrapping logic.
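The buffering, frame alignment, and zero-padding behavior can be sketched as a small generator. This is a simplified sync-only stand-in for the pattern the review describes (the real code also has an async variant); the function name is hypothetical:

```python
def frame_aligned(chunks, sample_width=2, channels=1):
    """Yield byte chunks cut at frame boundaries; zero-pad the final tail.

    A frame is sample_width * channels bytes; providers may split their
    stream at arbitrary byte offsets, so partial frames are buffered.
    """
    frame = sample_width * channels
    buf = b""
    for chunk in chunks:
        buf += chunk
        cut = len(buf) - (len(buf) % frame)
        if cut:
            yield buf[:cut]
            buf = buf[cut:]
    if buf:
        # Incomplete trailing frame: pad with silence rather than drop it.
        yield buf + b"\x00" * (frame - len(buf))
```

For 16-bit mono, an input chunk of 3 bytes yields one aligned 2-byte frame plus a final frame padded with a zero byte, so no provider bytes are silently discarded.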
Summary by CodeRabbit
Release Notes
New Features
Improvements