Fix Gemini audio #165

tbarbugli · 2025-11-10T22:32:02Z

Move output track management outside plugin, move to agent class
Edge transport returns the output track
Stream output track buffers bytes and handles queue in millis instead of pcmdata entries

Summary by CodeRabbit

New Features
- Enhanced realtime audio handling with dedicated event routing and improved audio track management.
Bug Fixes
- Improved error handling in agent processing with better exception logging.
- Fixed audio track state management during conversation turns.
Chores
- Updated getstream dependency to version >=2.5.14.

coderabbitai · 2025-11-10T22:32:11Z

Walkthrough

This PR refactors audio output handling by removing the output_audio_track method from LLM implementations in favor of event-based forwarding via RealtimeAudioOutputEvent. Additionally, it converts synchronous close() methods to asynchronous across edge transport abstractions and implementations, updates dependencies, and adds flush capability to audio track protocols.

Changes

Cohort / File(s)	Summary
Dependency Updates `agents-core/pyproject.toml`	Bumped getstream[webrtc,telemetry] from >=2.5.11 to >=2.5.14 and switched local development source to path-based editable install.
Audio Output Refactoring (Core Abstractions) `agents-core/vision_agents/core/llm/llm.py`	Removed `AudioStreamTrack` import and deleted the abstract `output_audio_track()` method from `AudioLLM` class.
Audio Output Protocol Enhancement `agents-core/vision_agents/core/edge/types.py`	Added asynchronous `flush()` method to `OutputAudioTrack` protocol.
Audio Output Refactoring (Plugin Implementations) `plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py`	Removed `_output_audio_track` initialization, dropped `output_audio_track` property accessor, and eliminated raw PCM writes to the track; audio output now relies on event emission mechanisms.
Close Method Async Conversion (Abstractions) `agents-core/vision_agents/core/edge/edge_transport.py`	Changed `close()` method from synchronous to asynchronous abstract method in `EdgeTransport` class.
Close Method Async Conversion (Implementations) `plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py`	Converted `close()` method from synchronous to asynchronous in `StreamEdge` class (body remains a no-op).
Audio Event Forwarding & Turn Management `agents-core/vision_agents/core/agents/agents.py`	Integrated `RealtimeAudioOutputEvent` as new exported event type; added STT end event subscription; hardened error handling in `_apply` with try/except; flushes audio track on turn start; reworked `_prepare_rtc` to create and manage local outbound audio track and forward realtime audio events directly to it via new `forward_audio` handler; refined multiline formatting and condition checks.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant LLM as LLM<br/>(Gemini Realtime)
    participant EventBus as Event Bus
    participant Transport as Edge Transport<br/>(Audio Track)

    Agent->>LLM: Initialize realtime connection
    Note over LLM: No output_audio_track created
    
    LLM->>EventBus: Emit RealtimeAudioOutputEvent<br/>(PCM data)
    
    EventBus->>Agent: Broadcast RealtimeAudioOutputEvent
    
    Agent->>Transport: forward_audio handler<br/>writes to outbound track
    
    Transport->>Transport: Flush audio track on turn start
    
    Note over Agent,Transport: New event-based flow<br/>(replaces direct track writes)
    
    Agent->>Transport: await close()
    Note over Transport: Async close replaces sync close

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Areas requiring extra attention:

Breaking interface changes to AudioLLM and EdgeTransport abstractions—verify all implementations comply with new signatures
New event forwarding logic in agents.py _prepare_rtc and forward_audio handler—ensure realtime audio flows correctly to outbound track
Async close() conversion implications—confirm all call sites properly await the method and handle async teardown
Integration between RealtimeAudioOutputEvent emission and track writes—verify no audio loss or duplication in the new pipeline
gemini_realtime.py changes—validate that removing direct track writes doesn't break audio streaming behavior

Possibly related PRs

Bugfix gemini #150: Adjusts PCM sample rate handling and logging in gemini_realtime.py audio output path
WIP - Vogent + New Smart TURN + Audio utils usage #128: Modifies transport close() async conversion and realtime audio/track handling in edge layer
fix: Agent Example and TURN detection #70: Implements per-user transcript accumulation and turn-event handling in agents.py with overlapping STT event subscriptions

Suggested labels

plugin-getstream, core-agents

Suggested reviewers

dangusev
maxkahan
tschellenbach

Poem

The track dissolves—no hollow chamber waits—
Events ascend like moths through amber air,
Async jaws close. The plumbing bleeds and gates
Interface ghosts. We flush what wasn't there.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Fix Gemini audio' is vague and generic, using non-descriptive language that doesn't convey the substantial architectural changes in this changeset.	Consider a more descriptive title that captures the main change, such as 'Move audio track management from plugin to agent layer' or 'Refactor Gemini audio output handling to edge transport'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch gemini-audio

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
350-352: Consider whether StreamEdge.close() needs cleanup logic.

The async conversion is correct, but the no-op body may be incomplete. StreamConnection.close() at line 47 calls await self._connection.leave(), suggesting there might be connection-level cleanup that should happen here as well.

If StreamEdge holds resources (event subscriptions, conversation state, track maps) that should be cleaned up, consider adding cleanup logic:
 async def close(self):
-    # Note: Not calling super().close() as it's an abstract method with trivial body
-    pass
+    # Clean up event subscriptions
+    if hasattr(self, 'events'):
+        # Unsubscribe event handlers if needed
+        pass
+    
+    # Clear track maps
+    if hasattr(self, '_track_map'):
+        self._track_map.clear()
+    if hasattr(self, '_pending_tracks'):
+        self._pending_tracks.clear()

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f5d37ea and 52bb898.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

agents-core/pyproject.toml (2 hunks)
agents-core/vision_agents/core/agents/agents.py (11 hunks)
agents-core/vision_agents/core/edge/edge_transport.py (1 hunks)
agents-core/vision_agents/core/edge/types.py (1 hunks)
agents-core/vision_agents/core/llm/llm.py (1 hunks)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1 hunks)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)

agents-core/vision_agents/core/agents/agents.py (1)

close (661-729)

agents-core/vision_agents/core/edge/edge_transport.py (1)

close (38-39)

agents-core/vision_agents/core/edge/types.py (1)

close (34-35)

agents-core/vision_agents/core/agents/agents.py (6)

agents-core/vision_agents/core/llm/events.py (1)

RealtimeAudioOutputEvent (37-42)

agents-core/vision_agents/core/events/manager.py (2)

send (435-479)

subscribe (301-373)

agents-core/vision_agents/core/turn_detection/events.py (1)

TurnEndedEvent (29-46)

agents-core/vision_agents/core/edge/types.py (2)

flush (49-49)

write (45-45)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

create_audio_track (314-319)

agents-core/vision_agents/core/edge/edge_transport.py (1)

create_audio_track (34-35)

agents-core/vision_agents/core/edge/edge_transport.py (4)

agents-core/vision_agents/core/agents/agents.py (1)

close (661-729)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)

close (47-48)

close (350-352)

agents-core/vision_agents/core/edge/types.py (1)

close (34-35)

agents-core/vision_agents/core/tts/tts.py (1)

close (304-314)

agents-core/vision_agents/core/edge/types.py (1)

agents-core/vision_agents/core/vad/vad.py (1)

flush (375-382)

🔇 Additional comments (16)

agents-core/vision_agents/core/agents/agents.py (13)

34-34: LGTM!

Adding RealtimeAudioOutputEvent import to support the new event-based audio forwarding architecture.

230-232: LGTM!

Reformatting the error message across multiple lines improves readability without changing functionality.

257-260: LGTM!

Adding a dedicated STT event subscription for TurnEndedEvent improves organization by separating STT-specific event handling.

328-332: LGTM!

Multiline formatting of the TurnEndedEvent improves readability. Logic remains unchanged.

641-646: LGTM! Improved error handling.

Wrapping _apply method calls in try/except prevents a single plugin failure from cascading and allows the agent to continue gracefully. The exception logging includes the subclass name for better diagnostics.

891-894: LGTM!

Multiline formatting of the participant check improves readability.

1059-1060: LGTM! Flush on turn start prevents audio overlap.

When a user interrupts (turn starts), flushing the audio track clears any buffered agent speech, enabling proper barge-in behavior.

1095-1103: LGTM!

Reformatted conditional and logging for readability. Logic unchanged.

1115-1115: LGTM!

Using event.eager_end_of_turn to determine turn completion is clearer and aligns with the event structure.

1118-1120: LGTM!

Multiline formatting of task creation improves readability.

1127-1130: LGTM!

Multiline formatting of logging improves readability.

1140-1142: LGTM!

Multiline formatting of the turn detection check improves readability. Logic unchanged.

1275-1284: LGTM! Core architectural improvement.

This refactor unifies audio output through event-based forwarding:

Always creates a local audio track from the edge transport (no conditional logic)

Subscribes to RealtimeAudioOutputEvent to forward realtime audio data

Removes LLM-specific output track management, moving responsibility to the agent

This aligns with the PR objective of moving output track management outside plugins and into the agent class. The event-based approach provides better separation of concerns.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)

105-109: LGTM! Architectural alignment.

Removing the AudioStreamTrack output initialization aligns with the PR's core objective: moving output track management from the plugin to the agent class. Audio output now flows through RealtimeAudioOutputEvent (emitted at line 307) instead of direct track writes.

agents-core/vision_agents/core/llm/llm.py (1)

27-27: No orphaned references detected. Changes are safe to merge.

The removal of AudioStreamTrack import and output_audio_track() method from the base AudioLLM class is clean. Plugin-specific implementations (AWS Realtime, OpenAI Realtime) maintain their own output_audio_track properties independently, which is appropriate architecture. No code in agents-core or elsewhere attempts to call the base class method.

agents-core/vision_agents/core/edge/edge_transport.py (1)

38-39: Verification confirmed—breaking change is correctly implemented.

All EdgeTransport subclasses already have async close() implementations, and all call sites properly await. The single concrete subclass (StreamEdge) complies, Connection.close() is async, and the change aligns with the broader async teardown pattern (Agent, TTS, VAD, STT, Realtime all follow the same pattern). No synchronous close() calls detected on edge/transport objects. Safe to merge.

agents-core/pyproject.toml

agents-core/vision_agents/core/edge/types.py

tbarbugli added 2 commits November 10, 2025 23:27

use new track

bd43f94

revert example changes

64ce230

github-actions bot added dependencies agents-core plugins config labels Nov 10, 2025

tbarbugli added 3 commits November 11, 2025 17:04

fix edge close err

11270c3

bump stream-py

cb604be

Merge branch 'main' into gemini-audio

52bb898

tbarbugli marked this pull request as ready for review November 11, 2025 21:05

coderabbitai bot reviewed Nov 11, 2025

View reviewed changes

agents-core/pyproject.toml Show resolved Hide resolved

agents-core/pyproject.toml Show resolved Hide resolved

agents-core/vision_agents/core/edge/types.py Show resolved Hide resolved

tbarbugli merged commit ddc1433 into main Nov 11, 2025
6 checks passed

tbarbugli deleted the gemini-audio branch November 11, 2025 21:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix Gemini audio #165

Fix Gemini audio #165

Uh oh!

tbarbugli commented Nov 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 10, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix Gemini audio #165

Fix Gemini audio #165

Uh oh!

Conversation

tbarbugli commented Nov 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tbarbugli commented Nov 10, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 10, 2025 •

edited

Loading