-
Notifications
You must be signed in to change notification settings - Fork 105
Gemini cleanup #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gemini cleanup #168
Conversation
WalkthroughRefactors Gemini realtime into a new GeminiRealtime class with session lifecycle, reconnect and processing-task logic; moves frame-to-PNG conversion from utils.py to video_utils.py; adds module-level version loading and logging in utils; updates tests and public API imports/fixtures to match these changes. Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant GeminiRealtime
participant LiveAPI
participant VideoForwarder
participant ThreadPool
Test->>GeminiRealtime: get_config()
GeminiRealtime-->>Test: LiveConnectConfigDict
Test->>GeminiRealtime: connect()
GeminiRealtime->>LiveAPI: open session (with resumption id?)
LiveAPI-->>GeminiRealtime: session established
GeminiRealtime->>GeminiRealtime: _start_processing_task()
Test->>GeminiRealtime: watch_video_track(track)
GeminiRealtime->>VideoForwarder: attach track / forward frames
VideoForwarder->>GeminiRealtime: on_frame(frame)
GeminiRealtime->>ThreadPool: convert frame -> PNG bytes
ThreadPool-->>GeminiRealtime: png bytes
GeminiRealtime->>LiveAPI: send image/png blob
LiveAPI-->>GeminiRealtime: function_call event
GeminiRealtime->>GeminiRealtime: _handle_function_call()
GeminiRealtime->>LiveAPI: send FunctionResponse
Test->>GeminiRealtime: close()
GeminiRealtime->>GeminiRealtime: _stop_processing_task()
GeminiRealtime->>LiveAPI: close session
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45–60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
4e76438 to
06b708d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3)
142-146: Consider documenting the ValueError in methods using_session.The property raises
ValueErrorwhen the session isn't established, but this isn't documented in methods likesimple_response()or_send_video_frame()that accessself._session. Consider either adding docstring notes or ensuring all call paths verifyself.connectedfirst.
171-198: Consider adding reconnection logic for consistency.Unlike
simple_response(), this method doesn't handle transient errors with reconnection. Consider adding similar try/except logic with_should_reconnect()to ensure consistent behavior across input methods.
421-432: Consider adding backoff to prevent rapid reconnection loops.Lines 424-428 will immediately reconnect on transient errors without delay. If the error persists, this could create a tight reconnection loop. Consider adding exponential backoff or a delay before reconnecting.
Example pattern:
except websockets.ConnectionClosedError as e: if not _should_reconnect(e): raise e logger.warning("Connection closed, reconnecting after delay...") await asyncio.sleep(2) # Add backoff await self.connect()agents-core/vision_agents/core/utils/utils.py (1)
23-34: Reconsider caching working directory at module import time—best practice recommends against it.CWD is process-global and can change via
os.chdir(), so cached values become stale. While thebase_dirparameter inparse_instructionsandparse_instructions_asyncprovides a mitigation, the default behavior still uses a stale cached value unless users explicitly passbase_dir. Best practice is to either useos.getcwd()orPath.cwd()at the point of use, or keep caching very short-lived with explicit invalidation. Consider storing_INITIAL_CWDas lazy initialization within the functions or documenting the limitation more prominently so users are aware the default relies on CWD captured at import time.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
agents-core/vision_agents/core/utils/utils.py(3 hunks)plugins/gemini/tests/test_realtime_function_calling.py(6 hunks)plugins/gemini/vision_agents/plugins/gemini/__init__.py(1 hunks)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
plugins/gemini/vision_agents/plugins/gemini/__init__.py (1)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
GeminiRealtime(82-517)
plugins/gemini/tests/test_realtime_function_calling.py (2)
plugins/gemini/tests/test_gemini_realtime.py (1)
realtime(19-27)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
get_config(300-325)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6)
agents-core/vision_agents/core/llm/llm.py (3)
LLMResponseEvent(38-42)_convert_tools_to_provider_format(122-136)_run_one_tool(272-344)agents-core/vision_agents/core/llm/llm_types.py (1)
ToolSchema(64-67)agents-core/vision_agents/core/utils/utils.py (1)
frame_to_png_bytes(135-157)agents-core/vision_agents/core/utils/video_forwarder.py (3)
VideoForwarder(24-147)remove_frame_handler(76-92)add_frame_handler(48-74)agents-core/vision_agents/core/llm/realtime.py (7)
Realtime(21-188)connect(56-56)_stop_watching_video_track(63-65)close(166-167)_emit_user_speech_transcription(169-178)_emit_agent_speech_transcription(180-188)_emit_audio_output_event(105-116)plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
_convert_tools_to_provider_format(266-287)
agents-core/vision_agents/core/utils/utils.py (2)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
client(55-60)plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
client(84-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (24)
plugins/gemini/vision_agents/plugins/gemini/__init__.py (1)
2-2: LGTM: Clean aliasing maintains backward compatibility.The import change successfully exposes
GeminiRealtimeasRealtime, preserving the public API surface while adopting the refactored implementation.plugins/gemini/tests/test_realtime_function_calling.py (3)
117-117: LGTM: Event field rename properly applied.The change from
event.audio_datatoevent.datacorrectly aligns with the updatedRealtimeAudioOutputEventinterface.Also applies to: 164-164, 215-215
244-261: LGTM: Test properly uses publicget_config()API.The test correctly validates tool configuration through the public interface.
263-271: LGTM: Test validates absence of tools in config.Correctly verifies that the config excludes tools when none are registered.
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (15)
1-46: LGTM: Import additions support the refactored implementation.The new imports (contextlib, ThreadPoolExecutor, websockets, etc.) are all utilized in the subsequent code.
48-68: LGTM: Well-structured default configuration.Centralizing the default config improves maintainability and provides sensible defaults for the Live API.
71-79: LGTM: Reconnection logic handles session timeouts.The function correctly identifies transient errors (code 1011) that warrant reconnection.
108-141: LGTM: Constructor properly initializes client and state.The conditional client creation and deep-copied config approach prevents unintended mutations. The ThreadPoolExecutor is appropriately scoped for async frame conversion.
161-169: LGTM: Reconnection logic handles transient failures.The method appropriately reconnects on transient errors. Note that callers should check the
exceptionfield in the returnedLLMResponseEventto detect failures.
200-229: LGTM: Video forwarding properly manages handler lifecycle.The method correctly removes old handlers before adding new ones and supports both dedicated and shared forwarders.
231-249: LGTM: Async frame conversion avoids blocking the event loop.Converting frames to PNG in the executor prevents blocking. The exception handling could optionally include reconnection logic similar to
simple_response(), but logging and continuing is reasonable for video frames.
251-253: LGTM: Cleanup method safely handles None forwarder.
255-276: LGTM: Connection method properly manages reconnection lifecycle.Stopping the processing task before reconnecting prevents task conflicts, and the AsyncExitStack ensures proper session cleanup.
300-325: LGTM: Config assembly is non-mutating and comprehensive.The method correctly builds the config with resumption, instructions, and tools without mutating the base config.
327-333: LGTM: Task lifecycle management is correct.Awaiting the task after cancellation ensures proper cleanup.
335-411: LGTM: Comprehensive event processing with proper delegation.The method handles all message types appropriately. Note that the
return Falseon line 411 appears unused by the caller in_processing_loop().
437-459: LGTM: Tool conversion matches Gemini's expected format.The implementation correctly transforms
ToolSchemaobjects into the function declarations format required by Gemini Live API.
461-467: LGTM: Tool call delegation is straightforward.
469-517: LGTM: Function call handling is robust with proper error handling.The method correctly executes tools via the existing infrastructure, formats responses appropriately, and handles errors gracefully with logging.
agents-core/vision_agents/core/utils/utils.py (5)
13-13: Module-level logger is a best practice.Excellent change to use a module-level logger instead of instantiating loggers within functions. This improves performance and follows Python logging conventions.
58-91: LGTM!The reformatting improves readability, and the use of cached
_INITIAL_CWDwithasyncio.to_threadfor blocking I/O correctly avoids blocking the event loop. The fallback to thebase_dirparameter provides flexibility when needed.
156-156: Consistent with module-level logger refactoring.Good change to use the module-level logger, improving consistency and performance across the module.
170-213: LGTM!The formatting adjustments enhance readability without altering behavior. The multi-line
AsyncClientconstruction and improved spacing make the code easier to follow.
20-20: Python version requirement is satisfied.The project's
requires-python = ">=3.10"in agents-core/pyproject.toml confirms that thestr | Noneunion syntax is valid and compatible with the project's minimum supported Python version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)
286-286: Guard against AttributeError if _video_forwarder is None.Line 286 calls
remove_frame_handler()without checking if_video_forwarderis None, which could raise an AttributeError.Apply this diff:
- await self._video_forwarder.remove_frame_handler(self._send_video_frame) + if self._video_forwarder is not None: + await self._video_forwarder.remove_frame_handler(self._send_video_frame)
288-288: Avoid blocking the event loop with synchronous executor shutdown.
self._executor.shutdown()is synchronous and will block the event loop.Apply this diff:
- self._executor.shutdown() + self._executor.shutdown(wait=False)
🧹 Nitpick comments (1)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
214-218: Simplify redundant handler removal.Lines 216-218 contain duplicate logic:
_stop_watching_video_track()is called, then the same handler removal is performed again. The conditional check on line 217-218 is redundant.Apply this diff to remove the redundancy:
# This method can be called multiple times with different forwarders # Remove handler from old forwarder if it exists await self._stop_watching_video_track() - if self._video_forwarder is not None: - await self._video_forwarder.remove_frame_handler(self._send_video_frame)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
agents-core/vision_agents/core/utils/utils.py(3 hunks)agents-core/vision_agents/core/utils/video_utils.py(2 hunks)plugins/gemini/tests/test_gemini_realtime.py(1 hunks)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py(6 hunks)tests/test_utils.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
agents-core/vision_agents/core/utils/utils.py (2)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
client(55-60)plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
client(84-91)
plugins/gemini/tests/test_gemini_realtime.py (4)
plugins/openai/tests/test_openai_realtime.py (1)
realtime(20-29)plugins/aws/tests/test_aws_realtime.py (1)
realtime(20-33)agents-core/vision_agents/core/llm/realtime.py (2)
Realtime(21-188)close(166-167)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
close(278-298)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (5)
agents-core/vision_agents/core/edge/types.py (1)
Participant(22-24)agents-core/vision_agents/core/llm/llm.py (1)
LLMResponseEvent(38-42)agents-core/vision_agents/core/llm/llm_types.py (1)
ToolSchema(64-67)agents-core/vision_agents/core/utils/video_utils.py (1)
frame_to_png_bytes(67-85)agents-core/vision_agents/core/utils/video_forwarder.py (3)
VideoForwarder(24-147)remove_frame_handler(76-92)add_frame_handler(48-74)
tests/test_utils.py (3)
agents-core/vision_agents/core/utils/utils.py (2)
Instructions(36-41)parse_instructions(92-130)agents-core/vision_agents/core/utils/video_utils.py (2)
ensure_even_dimensions(10-30)frame_to_png_bytes(67-85)conftest.py (3)
bunny_video_track(307-351)recv(322-345)recv(368-373)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (20)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (14)
50-68: LGTM!The DEFAULT_CONFIG is well-structured with appropriate audio transcription settings, context window compression, and Leda voice configuration for Gemini Live API.
71-80: LGTM!The reconnection logic correctly identifies transient WebSocket errors (code 1011 session timeout) that should trigger reconnection.
108-146: LGTM!The initialization properly sets up the client, configuration deep-copying, and resource management. The
_sessionproperty provides safe access with clear error handling.
148-169: LGTM!The simple_response method correctly implements reconnection logic for transient errors and properly handles exceptions with logging.
171-199: LGTM!Audio handling properly resamples PCM data to the required 16kHz format and creates the correct Blob structure for Gemini Live API.
231-249: LGTM!Video frame handling correctly offloads PNG conversion to a thread pool executor to avoid blocking the event loop, and properly handles exceptions.
251-253: LGTM!The method correctly checks for None before removing the frame handler.
255-276: LGTM!The connect method properly manages the session lifecycle, using an exit stack for context management and correctly sequencing the processing task startup.
300-325: LGTM!The get_config method correctly builds the configuration with session resumption, instructions, and tool declarations.
327-333: LGTM!Task lifecycle management is straightforward and correct with proper cancellation and awaiting.
335-411: LGTM!Event processing correctly handles input/output transcriptions, model responses, and tool calls. The logic properly extracts and emits different event types.
413-435: LGTM!The processing loop correctly implements reconnection on transient WebSocket errors and handles cancellation gracefully for clean shutdown.
437-459: LGTM!Tool conversion correctly transforms ToolSchema objects into Gemini Live API format with proper function declarations.
461-517: LGTM!Function call handling correctly extracts tool calls, executes them with timeout, and sends responses back to the Gemini Live API with appropriate error handling.
agents-core/vision_agents/core/utils/video_utils.py (1)
67-85: LGTM!The frame_to_png_bytes function correctly handles frame conversion with proper fallback logic for frames without the to_image method.
tests/test_utils.py (1)
612-623: LGTM!The test properly validates frame_to_png_bytes functionality with a real video frame, including verification of the PNG signature.
agents-core/vision_agents/core/utils/utils.py (3)
1-32: LGTM!Module-level version loading and caching is well-implemented with proper exception handling for cases where the package isn't installed.
56-89: LGTM!The async version correctly uses asyncio.to_thread to offload blocking file I/O to a thread pool, following best practices for async code.
143-186: LGTM!The ensure_model function correctly implements async file download with streaming, proper thread pool usage for file writing, and appropriate error handling with cleanup.
plugins/gemini/tests/test_gemini_realtime.py (1)
26-95: LGTM!The test class correctly uses the module-level fixture and covers key integration scenarios. Note that line 94 tests the private method
_stop_watching_video_track, which is acceptable for integration testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
181-209: Audio resampling is correct, but lacks error handling.The method properly resamples audio to Gemini's expected format. However, unlike
simple_response, this method doesn't handle exceptions fromsend_realtime_inputon line 208, which could result in unhandled errors propagating to callers.Consider adding consistent error handling:
blob = Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000") - await self._session.send_realtime_input(audio=blob) + try: + await self._session.send_realtime_input(audio=blob) + except Exception as e: + if _should_reconnect(e): + await self.connect() + logger.exception("Failed to send audio to Gemini")
🧹 Nitpick comments (2)
plugins/gemini/tests/test_gemini_realtime.py (1)
53-95: Good test coverage for audio and video flows.Both tests appropriately verify media handling. The integration test structure is sound, though the explicit call to the private
_stop_watching_video_track()method on line 94 exposes internal implementation details.Consider adding a public method for stopping video if this is a common operation, or document why direct access to the private method is necessary here.
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
226-229: Potentially redundant handler removal.Line 226 calls
_stop_watching_video_track(), which already removes the frame handler (see lines 262-263). Lines 227-228 then perform the same removal again. This appears redundant.Consider removing lines 227-228 or clarifying why the double removal is necessary:
await self._stop_watching_video_track() - if self._video_forwarder is not None: - await self._video_forwarder.remove_frame_handler(self._send_video_frame) self._video_forwarder = shared_forwarder or VideoForwarder(
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
plugins/gemini/tests/test_gemini_realtime.py(1 hunks)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/gemini/tests/test_gemini_realtime.py (3)
plugins/openai/tests/test_openai_realtime.py (1)
realtime(20-29)plugins/aws/tests/test_aws_realtime.py (1)
realtime(20-33)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
close(288-309)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6)
agents-core/vision_agents/core/edge/types.py (2)
Participant(22-24)close(34-35)agents-core/vision_agents/core/llm/events.py (1)
LLMResponseChunkEvent(87-102)agents-core/vision_agents/core/llm/llm.py (2)
LLMResponseEvent(38-42)_run_one_tool(272-344)agents-core/vision_agents/core/llm/llm_types.py (1)
ToolSchema(64-67)agents-core/vision_agents/core/utils/video_forwarder.py (3)
VideoForwarder(24-147)remove_frame_handler(76-92)add_frame_handler(48-74)agents-core/vision_agents/core/utils/video_utils.py (1)
frame_to_png_bytes(67-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (15)
plugins/gemini/tests/test_gemini_realtime.py (3)
14-23: LGTM! Fixture correctly defined.The fixture is now properly defined without the
selfparameter. The lifecycle management (yield and cleanup in finally) is appropriate for an async pytest fixture.
26-27: Clear test class documentation.The docstring accurately describes the test suite's purpose.
30-51: Test flow is well-structured.The test correctly uses the fixture, establishes connection, sends a message, and verifies audio events are received.
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (12)
1-44: Imports align with refactoring needs.The additions support the new session lifecycle, async processing, and frame conversion logic.
50-68: Well-defined default configuration.The default config covers essential realtime communication settings with reasonable values.
71-90: Reconnection logic is appropriately scoped.The helper correctly identifies transient websocket failures that warrant reconnection attempts.
118-157: Initialization and session management are sound.The client setup handles multiple scenarios appropriately, and the deep copy of
DEFAULT_CONFIGprevents unintended mutations. The_sessionproperty provides clear feedback if accessed before connection.
241-260: Frame conversion properly offloaded to executor.The use of
run_in_executorfor PNG conversion prevents blocking the event loop. The exception handling logs errors without propagating them, which is reasonable for video frames where occasional drops are acceptable.
261-264: Proper None-check in place.The method correctly guards against
AttributeErrorby checking if_video_forwarderis not None before callingremove_frame_handler.
265-287: Connection lifecycle is well-managed.The method properly stops any existing processing task before reconnecting and uses an async context stack for clean resource management. Starting the processing task after establishing the connection ensures event handling begins at the right time.
288-310: Resource cleanup properly sequenced.The
closemethod follows an appropriate teardown order. The use ofshutdown(wait=False)on line 299 prevents blocking the event loop, addressing the previous review concern.
311-336: Configuration assembly is clear and defensive.The method correctly copies the base config to prevent mutations and conditionally adds runtime parameters.
338-345: Task lifecycle management is straightforward.Both methods handle processing task creation and cancellation appropriately.
402-412: Verify thought filtering is intentional.Line 402 filters out text when
part.thoughtis True, presumably to exclude internal reasoning from being emitted as agent speech. Confirm this aligns with the desired behavior for Gemini's thought/reasoning capabilities.
448-528: Tool integration is well-implemented.The tool conversion, function call extraction, execution via
_run_one_tool, and response sending back to Gemini all follow appropriate patterns. Error handling ensures failures are reported back to the session.
Summary by CodeRabbit
New Features
Refactor
Tests