-
Notifications
You must be signed in to change notification settings - Fork 114
Fix Gemini audio #165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Gemini audio #165
Conversation
WalkthroughThis PR refactors audio output handling by removing the Changes
Sequence DiagramsequenceDiagram
participant Agent
participant LLM as LLM<br/>(Gemini Realtime)
participant EventBus as Event Bus
participant Transport as Edge Transport<br/>(Audio Track)
Agent->>LLM: Initialize realtime connection
Note over LLM: No output_audio_track created
LLM->>EventBus: Emit RealtimeAudioOutputEvent<br/>(PCM data)
EventBus->>Agent: Broadcast RealtimeAudioOutputEvent
Agent->>Transport: forward_audio handler<br/>writes to outbound track
Transport->>Transport: Flush audio track on turn start
Note over Agent,Transport: New event-based flow<br/>(replaces direct track writes)
Agent->>Transport: await close()
Note over Transport: Async close replaces sync close
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
350-352: Consider whether StreamEdge.close() needs cleanup logic.The async conversion is correct, but the no-op body may be incomplete.
StreamConnection.close()at line 47 callsawait self._connection.leave(), suggesting there might be connection-level cleanup that should happen here as well.If
StreamEdgeholds resources (event subscriptions, conversation state, track maps) that should be cleaned up, consider adding cleanup logic:async def close(self): - # Note: Not calling super().close() as it's an abstract method with trivial body - pass + # Clean up event subscriptions + if hasattr(self, 'events'): + # Unsubscribe event handlers if needed + pass + + # Clear track maps + if hasattr(self, '_track_map'): + self._track_map.clear() + if hasattr(self, '_pending_tracks'): + self._pending_tracks.clear()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
agents-core/pyproject.toml(2 hunks)agents-core/vision_agents/core/agents/agents.py(11 hunks)agents-core/vision_agents/core/edge/edge_transport.py(1 hunks)agents-core/vision_agents/core/edge/types.py(1 hunks)agents-core/vision_agents/core/llm/llm.py(1 hunks)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py(1 hunks)plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
agents-core/vision_agents/core/agents/agents.py (1)
close(661-729)agents-core/vision_agents/core/edge/edge_transport.py (1)
close(38-39)agents-core/vision_agents/core/edge/types.py (1)
close(34-35)
agents-core/vision_agents/core/agents/agents.py (6)
agents-core/vision_agents/core/llm/events.py (1)
RealtimeAudioOutputEvent(37-42)agents-core/vision_agents/core/events/manager.py (2)
send(435-479)subscribe(301-373)agents-core/vision_agents/core/turn_detection/events.py (1)
TurnEndedEvent(29-46)agents-core/vision_agents/core/edge/types.py (2)
flush(49-49)write(45-45)plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
create_audio_track(314-319)agents-core/vision_agents/core/edge/edge_transport.py (1)
create_audio_track(34-35)
agents-core/vision_agents/core/edge/edge_transport.py (4)
agents-core/vision_agents/core/agents/agents.py (1)
close(661-729)plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
close(47-48)close(350-352)agents-core/vision_agents/core/edge/types.py (1)
close(34-35)agents-core/vision_agents/core/tts/tts.py (1)
close(304-314)
agents-core/vision_agents/core/edge/types.py (1)
agents-core/vision_agents/core/vad/vad.py (1)
flush(375-382)
🔇 Additional comments (16)
agents-core/vision_agents/core/agents/agents.py (13)
34-34: LGTM!Adding
RealtimeAudioOutputEventimport to support the new event-based audio forwarding architecture.
230-232: LGTM!Reformatting the error message across multiple lines improves readability without changing functionality.
257-260: LGTM!Adding a dedicated STT event subscription for
TurnEndedEventimproves organization by separating STT-specific event handling.
328-332: LGTM!Multiline formatting of the
TurnEndedEventimproves readability. Logic remains unchanged.
641-646: LGTM! Improved error handling.Wrapping
_applymethod calls in try/except prevents a single plugin failure from cascading and allows the agent to continue gracefully. The exception logging includes the subclass name for better diagnostics.
891-894: LGTM!Multiline formatting of the participant check improves readability.
1059-1060: LGTM! Flush on turn start prevents audio overlap.When a user interrupts (turn starts), flushing the audio track clears any buffered agent speech, enabling proper barge-in behavior.
1095-1103: LGTM!Reformatted conditional and logging for readability. Logic unchanged.
1115-1115: LGTM!Using
event.eager_end_of_turnto determine turn completion is clearer and aligns with the event structure.
1118-1120: LGTM!Multiline formatting of task creation improves readability.
1127-1130: LGTM!Multiline formatting of logging improves readability.
1140-1142: LGTM!Multiline formatting of the turn detection check improves readability. Logic unchanged.
1275-1284: LGTM! Core architectural improvement.This refactor unifies audio output through event-based forwarding:
- Always creates a local audio track from the edge transport (no conditional logic)
- Subscribes to
RealtimeAudioOutputEventto forward realtime audio data- Removes LLM-specific output track management, moving responsibility to the agent
This aligns with the PR objective of moving output track management outside plugins and into the agent class. The event-based approach provides better separation of concerns.
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
105-109: LGTM! Architectural alignment.Removing the
AudioStreamTrackoutput initialization aligns with the PR's core objective: moving output track management from the plugin to the agent class. Audio output now flows throughRealtimeAudioOutputEvent(emitted at line 307) instead of direct track writes.agents-core/vision_agents/core/llm/llm.py (1)
27-27: No orphaned references detected. Changes are safe to merge.The removal of
AudioStreamTrackimport andoutput_audio_track()method from the baseAudioLLMclass is clean. Plugin-specific implementations (AWS Realtime, OpenAI Realtime) maintain their ownoutput_audio_trackproperties independently, which is appropriate architecture. No code in agents-core or elsewhere attempts to call the base class method.agents-core/vision_agents/core/edge/edge_transport.py (1)
38-39: Verification confirmed—breaking change is correctly implemented.All EdgeTransport subclasses already have async close() implementations, and all call sites properly await. The single concrete subclass (StreamEdge) complies, Connection.close() is async, and the change aligns with the broader async teardown pattern (Agent, TTS, VAD, STT, Realtime all follow the same pattern). No synchronous close() calls detected on edge/transport objects. Safe to merge.
Summary by CodeRabbit
New Features
Bug Fixes
Chores