Skip to content

Conversation

@tbarbugli
Copy link
Member

@tbarbugli tbarbugli commented Nov 10, 2025

  • Move output track management outside plugin, move to agent class
  • Edge transport returns the output track
  • Stream output track buffers bytes and handles queue in millis instead of pcmdata entries

Summary by CodeRabbit

  • New Features

    • Enhanced realtime audio handling with dedicated event routing and improved audio track management.
  • Bug Fixes

    • Improved error handling in agent processing with better exception logging.
    • Fixed audio track state management during conversation turns.
  • Chores

    • Updated getstream dependency to version >=2.5.14.

@coderabbitai
Copy link

coderabbitai bot commented Nov 10, 2025

Walkthrough

This PR refactors audio output handling by removing the output_audio_track method from LLM implementations in favor of event-based forwarding via RealtimeAudioOutputEvent. Additionally, it converts synchronous close() methods to asynchronous across edge transport abstractions and implementations, updates dependencies, and adds flush capability to audio track protocols.

Changes

Cohort / File(s) Summary
Dependency Updates
agents-core/pyproject.toml
Bumped getstream[webrtc,telemetry] from >=2.5.11 to >=2.5.14 and switched local development source to path-based editable install.
Audio Output Refactoring (Core Abstractions)
agents-core/vision_agents/core/llm/llm.py
Removed AudioStreamTrack import and deleted the abstract output_audio_track() method from AudioLLM class.
Audio Output Protocol Enhancement
agents-core/vision_agents/core/edge/types.py
Added asynchronous flush() method to OutputAudioTrack protocol.
Audio Output Refactoring (Plugin Implementations)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
Removed _output_audio_track initialization, dropped output_audio_track property accessor, and eliminated raw PCM writes to the track; audio output now relies on event emission mechanisms.
Close Method Async Conversion (Abstractions)
agents-core/vision_agents/core/edge/edge_transport.py
Changed close() method from synchronous to asynchronous abstract method in EdgeTransport class.
Close Method Async Conversion (Implementations)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
Converted close() method from synchronous to asynchronous in StreamEdge class (body remains a no-op).
Audio Event Forwarding & Turn Management
agents-core/vision_agents/core/agents/agents.py
Integrated RealtimeAudioOutputEvent as new exported event type; added STT end event subscription; hardened error handling in _apply with try/except; flushes audio track on turn start; reworked _prepare_rtc to create and manage local outbound audio track and forward realtime audio events directly to it via new forward_audio handler; refined multiline formatting and condition checks.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant LLM as LLM<br/>(Gemini Realtime)
    participant EventBus as Event Bus
    participant Transport as Edge Transport<br/>(Audio Track)

    Agent->>LLM: Initialize realtime connection
    Note over LLM: No output_audio_track created
    
    LLM->>EventBus: Emit RealtimeAudioOutputEvent<br/>(PCM data)
    
    EventBus->>Agent: Broadcast RealtimeAudioOutputEvent
    
    Agent->>Transport: forward_audio handler<br/>writes to outbound track
    
    Transport->>Transport: Flush audio track on turn start
    
    Note over Agent,Transport: New event-based flow<br/>(replaces direct track writes)
    
    Agent->>Transport: await close()
    Note over Transport: Async close replaces sync close
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Areas requiring extra attention:

  • Breaking interface changes to AudioLLM and EdgeTransport abstractions—verify all implementations comply with new signatures
  • New event forwarding logic in agents.py _prepare_rtc and forward_audio handler—ensure realtime audio flows correctly to outbound track
  • Async close() conversion implications—confirm all call sites properly await the method and handle async teardown
  • Integration between RealtimeAudioOutputEvent emission and track writes—verify no audio loss or duplication in the new pipeline
  • gemini_realtime.py changes—validate that removing direct track writes doesn't break audio streaming behavior

Possibly related PRs

Suggested labels

plugin-getstream, core-agents

Suggested reviewers

  • dangusev
  • maxkahan
  • tschellenbach

Poem

The track dissolves—no hollow chamber waits—
Events ascend like moths through amber air,
Async jaws close. The plumbing bleeds and gates
Interface ghosts. We flush what wasn't there.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Fix Gemini audio' is vague and generic, using non-descriptive language that doesn't convey the substantial architectural changes in this changeset. Consider a more descriptive title that captures the main change, such as 'Move audio track management from plugin to agent layer' or 'Refactor Gemini audio output handling to edge transport'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch gemini-audio

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tbarbugli tbarbugli marked this pull request as ready for review November 11, 2025 21:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

350-352: Consider whether StreamEdge.close() needs cleanup logic.

The async conversion is correct, but the no-op body may be incomplete. StreamConnection.close() at line 47 calls await self._connection.leave(), suggesting there might be connection-level cleanup that should happen here as well.

If StreamEdge holds resources (event subscriptions, conversation state, track maps) that should be cleaned up, consider adding cleanup logic:

 async def close(self):
-    # Note: Not calling super().close() as it's an abstract method with trivial body
-    pass
+    # Clean up event subscriptions
+    if hasattr(self, 'events'):
+        # Unsubscribe event handlers if needed
+        pass
+    
+    # Clear track maps
+    if hasattr(self, '_track_map'):
+        self._track_map.clear()
+    if hasattr(self, '_pending_tracks'):
+        self._pending_tracks.clear()
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f5d37ea and 52bb898.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • agents-core/pyproject.toml (2 hunks)
  • agents-core/vision_agents/core/agents/agents.py (11 hunks)
  • agents-core/vision_agents/core/edge/edge_transport.py (1 hunks)
  • agents-core/vision_agents/core/edge/types.py (1 hunks)
  • agents-core/vision_agents/core/llm/llm.py (1 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
agents-core/vision_agents/core/agents/agents.py (1)
  • close (661-729)
agents-core/vision_agents/core/edge/edge_transport.py (1)
  • close (38-39)
agents-core/vision_agents/core/edge/types.py (1)
  • close (34-35)
agents-core/vision_agents/core/agents/agents.py (6)
agents-core/vision_agents/core/llm/events.py (1)
  • RealtimeAudioOutputEvent (37-42)
agents-core/vision_agents/core/events/manager.py (2)
  • send (435-479)
  • subscribe (301-373)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
agents-core/vision_agents/core/edge/types.py (2)
  • flush (49-49)
  • write (45-45)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
  • create_audio_track (314-319)
agents-core/vision_agents/core/edge/edge_transport.py (1)
  • create_audio_track (34-35)
agents-core/vision_agents/core/edge/edge_transport.py (4)
agents-core/vision_agents/core/agents/agents.py (1)
  • close (661-729)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • close (47-48)
  • close (350-352)
agents-core/vision_agents/core/edge/types.py (1)
  • close (34-35)
agents-core/vision_agents/core/tts/tts.py (1)
  • close (304-314)
agents-core/vision_agents/core/edge/types.py (1)
agents-core/vision_agents/core/vad/vad.py (1)
  • flush (375-382)
🔇 Additional comments (16)
agents-core/vision_agents/core/agents/agents.py (13)

34-34: LGTM!

Adding RealtimeAudioOutputEvent import to support the new event-based audio forwarding architecture.


230-232: LGTM!

Reformatting the error message across multiple lines improves readability without changing functionality.


257-260: LGTM!

Adding a dedicated STT event subscription for TurnEndedEvent improves organization by separating STT-specific event handling.


328-332: LGTM!

Multiline formatting of the TurnEndedEvent improves readability. Logic remains unchanged.


641-646: LGTM! Improved error handling.

Wrapping _apply method calls in try/except prevents a single plugin failure from cascading and allows the agent to continue gracefully. The exception logging includes the subclass name for better diagnostics.


891-894: LGTM!

Multiline formatting of the participant check improves readability.


1059-1060: LGTM! Flush on turn start prevents audio overlap.

When a user interrupts (turn starts), flushing the audio track clears any buffered agent speech, enabling proper barge-in behavior.


1095-1103: LGTM!

Reformatted conditional and logging for readability. Logic unchanged.


1115-1115: LGTM!

Using event.eager_end_of_turn to determine turn completion is clearer and aligns with the event structure.


1118-1120: LGTM!

Multiline formatting of task creation improves readability.


1127-1130: LGTM!

Multiline formatting of logging improves readability.


1140-1142: LGTM!

Multiline formatting of the turn detection check improves readability. Logic unchanged.


1275-1284: LGTM! Core architectural improvement.

This refactor unifies audio output through event-based forwarding:

  1. Always creates a local audio track from the edge transport (no conditional logic)
  2. Subscribes to RealtimeAudioOutputEvent to forward realtime audio data
  3. Removes LLM-specific output track management, moving responsibility to the agent

This aligns with the PR objective of moving output track management outside plugins and into the agent class. The event-based approach provides better separation of concerns.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)

105-109: LGTM! Architectural alignment.

Removing the AudioStreamTrack output initialization aligns with the PR's core objective: moving output track management from the plugin to the agent class. Audio output now flows through RealtimeAudioOutputEvent (emitted at line 307) instead of direct track writes.

agents-core/vision_agents/core/llm/llm.py (1)

27-27: No orphaned references detected. Changes are safe to merge.

The removal of AudioStreamTrack import and output_audio_track() method from the base AudioLLM class is clean. Plugin-specific implementations (AWS Realtime, OpenAI Realtime) maintain their own output_audio_track properties independently, which is appropriate architecture. No code in agents-core or elsewhere attempts to call the base class method.

agents-core/vision_agents/core/edge/edge_transport.py (1)

38-39: Verification confirmed—breaking change is correctly implemented.

All EdgeTransport subclasses already have async close() implementations, and all call sites properly await. The single concrete subclass (StreamEdge) complies, Connection.close() is async, and the change aligns with the broader async teardown pattern (Agent, TTS, VAD, STT, Realtime all follow the same pattern). No synchronous close() calls detected on edge/transport objects. Safe to merge.

@tbarbugli tbarbugli merged commit ddc1433 into main Nov 11, 2025
6 checks passed
@tbarbugli tbarbugli deleted the gemini-audio branch November 11, 2025 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants