Skip to content

Conversation

@dangusev
Copy link
Contributor

@dangusev dangusev commented Nov 11, 2025

Summary by CodeRabbit

  • New Features

    • More robust real-time sessions with automatic reconnection and improved streaming resilience.
    • Asynchronous PNG conversion for video frames to enable smoother video sending.
  • Refactor

    • Public real-time API simplified and consolidated (unified realtime class, clearer connect/close/config lifecycle).
    • Background processing model introduced for incremental event handling and session lifecycle.
  • Tests

    • Tests updated to use the unified public API, simplified async usage, and expanded audio/video flow coverage.

@dangusev dangusev marked this pull request as draft November 11, 2025 16:12
@coderabbitai
Copy link

coderabbitai bot commented Nov 11, 2025

Walkthrough

Refactors Gemini realtime into a new GeminiRealtime class with session lifecycle, reconnect and processing-task logic; moves frame-to-PNG conversion from utils.py to video_utils.py; adds module-level version loading and logging in utils; updates tests and public API imports/fixtures to match these changes.

Changes

Cohort / File(s) Summary
Core utils (version & logging)
agents-core/vision_agents/core/utils/utils.py
Add module-level logger and importlib.metadata-based version loading (_VISION_AGENTS_VERSION, _load_version()); switch to module logger in ensure_model; use asyncio.to_thread for markdown I/O in parse_instructions_async; remove frame_to_png_bytes.
Video utils (frame → PNG)
agents-core/vision_agents/core/utils/video_utils.py
Add frame_to_png_bytes(frame: av.VideoFrame) -> bytes using frame.to_image() or frame.to_ndarray() + PIL to produce PNG bytes.
Gemini plugin export
plugins/gemini/vision_agents/plugins/gemini/__init__.py
Re-export GeminiRealtime as Realtime (aliasing) while keeping __all__.
Gemini realtime implementation
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
Introduce GeminiRealtime with DEFAULT_CONFIG, session accessor and _exit_stack lifecycle, reconnect logic (_should_reconnect), processing-task model (_start_processing_task, _process_events, etc.), tool/function-call scaffolding, _send_video_frame using thread pool, get_config(), connect(), close(), and related refactors.
Tests — gemini realtime & function calling
plugins/gemini/tests/test_gemini_realtime.py, plugins/gemini/tests/test_realtime_function_calling.py
Tests updated to new module-level realtime fixture, call get_config() instead of private config helpers, adjust event field audio_datadata, remove old frame_to_png_bytes import/test, and drop some explicit asyncio markers.
Tests — utils
tests/test_utils.py
Update imports to use parse_instructions/Instructions from core.utils.utils and ensure_even_dimensions/frame_to_png_bytes from core.utils.video_utils; add async test verifying PNG header from frame_to_png_bytes.

Sequence Diagram(s)

sequenceDiagram
    participant Test
    participant GeminiRealtime
    participant LiveAPI
    participant VideoForwarder
    participant ThreadPool

    Test->>GeminiRealtime: get_config()
    GeminiRealtime-->>Test: LiveConnectConfigDict

    Test->>GeminiRealtime: connect()
    GeminiRealtime->>LiveAPI: open session (with resumption id?)
    LiveAPI-->>GeminiRealtime: session established
    GeminiRealtime->>GeminiRealtime: _start_processing_task()

    Test->>GeminiRealtime: watch_video_track(track)
    GeminiRealtime->>VideoForwarder: attach track / forward frames
    VideoForwarder->>GeminiRealtime: on_frame(frame)

    GeminiRealtime->>ThreadPool: convert frame -> PNG bytes
    ThreadPool-->>GeminiRealtime: png bytes
    GeminiRealtime->>LiveAPI: send image/png blob

    LiveAPI-->>GeminiRealtime: function_call event
    GeminiRealtime->>GeminiRealtime: _handle_function_call()
    GeminiRealtime->>LiveAPI: send FunctionResponse

    Test->>GeminiRealtime: close()
    GeminiRealtime->>GeminiRealtime: _stop_processing_task()
    GeminiRealtime->>LiveAPI: close session
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

  • Areas needing extra attention:
    • plugins/gemini/.../gemini_realtime.py — session lifecycle, reconnection policy, concurrency and processing-task interactions, tool/function-call handling.
    • agents-core/.../video_utils.py and _send_video_frame — thread-pool usage, PIL conversion edge cases, binary streaming.
    • agents-core/.../utils/utils.py — module-level version loading, logger changes, and async-to-thread I/O correctness.
    • Tests that changed fixtures/imports to ensure they match the new public API.

Possibly related PRs

Suggested labels

tests

Suggested reviewers

  • tschellenbach
  • d3xvn

Poem

The lens exhales a small black mouth,
A shuttered heart that learns to keep;
Frames congeal like fossils of the light,
I bury signatures in glass and weep —
The session hums, resumes, and does not sleep.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is vague and overly broad. 'Gemini cleanup' does not convey specific information about what was actually changed in the codebase. Consider a more specific title that highlights the primary changes, such as 'Refactor GeminiRealtime with session management and tool integration' or 'Move frame_to_png_bytes to video_utils and rename Realtime to GeminiRealtime'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 80.56% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/gemini-cleanup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dangusev dangusev marked this pull request as ready for review November 12, 2025 22:30
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3)

142-146: Consider documenting the ValueError in methods using _session.

The property raises ValueError when the session isn't established, but this isn't documented in methods like simple_response() or _send_video_frame() that access self._session. Consider either adding docstring notes or ensuring all call paths verify self.connected first.


171-198: Consider adding reconnection logic for consistency.

Unlike simple_response(), this method doesn't handle transient errors with reconnection. Consider adding similar try/except logic with _should_reconnect() to ensure consistent behavior across input methods.


421-432: Consider adding backoff to prevent rapid reconnection loops.

Lines 424-428 will immediately reconnect on transient errors without delay. If the error persists, this could create a tight reconnection loop. Consider adding exponential backoff or a delay before reconnecting.

Example pattern:

except websockets.ConnectionClosedError as e:
    if not _should_reconnect(e):
        raise e
    logger.warning("Connection closed, reconnecting after delay...")
    await asyncio.sleep(2)  # Add backoff
    await self.connect()
agents-core/vision_agents/core/utils/utils.py (1)

23-34: Reconsider caching working directory at module import time—best practice recommends against it.

CWD is process-global and can change via os.chdir(), so cached values become stale. While the base_dir parameter in parse_instructions and parse_instructions_async provides a mitigation, the default behavior still uses a stale cached value unless users explicitly pass base_dir. Best practice is to either use os.getcwd() or Path.cwd() at the point of use, or keep caching very short-lived with explicit invalidation. Consider storing _INITIAL_CWD as lazy initialization within the functions or documenting the limitation more prominently so users are aware the default relies on CWD captured at import time.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8eb5f3d and d2006e4.

📒 Files selected for processing (4)
  • agents-core/vision_agents/core/utils/utils.py (3 hunks)
  • plugins/gemini/tests/test_realtime_function_calling.py (6 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/__init__.py (1 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
plugins/gemini/vision_agents/plugins/gemini/__init__.py (1)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • GeminiRealtime (82-517)
plugins/gemini/tests/test_realtime_function_calling.py (2)
plugins/gemini/tests/test_gemini_realtime.py (1)
  • realtime (19-27)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • get_config (300-325)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6)
agents-core/vision_agents/core/llm/llm.py (3)
  • LLMResponseEvent (38-42)
  • _convert_tools_to_provider_format (122-136)
  • _run_one_tool (272-344)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
agents-core/vision_agents/core/utils/utils.py (1)
  • frame_to_png_bytes (135-157)
agents-core/vision_agents/core/utils/video_forwarder.py (3)
  • VideoForwarder (24-147)
  • remove_frame_handler (76-92)
  • add_frame_handler (48-74)
agents-core/vision_agents/core/llm/realtime.py (7)
  • Realtime (21-188)
  • connect (56-56)
  • _stop_watching_video_track (63-65)
  • close (166-167)
  • _emit_user_speech_transcription (169-178)
  • _emit_agent_speech_transcription (180-188)
  • _emit_audio_output_event (105-116)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
  • _convert_tools_to_provider_format (266-287)
agents-core/vision_agents/core/utils/utils.py (2)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
  • client (55-60)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
  • client (84-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (24)
plugins/gemini/vision_agents/plugins/gemini/__init__.py (1)

2-2: LGTM: Clean aliasing maintains backward compatibility.

The import change successfully exposes GeminiRealtime as Realtime, preserving the public API surface while adopting the refactored implementation.

plugins/gemini/tests/test_realtime_function_calling.py (3)

117-117: LGTM: Event field rename properly applied.

The change from event.audio_data to event.data correctly aligns with the updated RealtimeAudioOutputEvent interface.

Also applies to: 164-164, 215-215


244-261: LGTM: Test properly uses public get_config() API.

The test correctly validates tool configuration through the public interface.


263-271: LGTM: Test validates absence of tools in config.

Correctly verifies that the config excludes tools when none are registered.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (15)

1-46: LGTM: Import additions support the refactored implementation.

The new imports (contextlib, ThreadPoolExecutor, websockets, etc.) are all utilized in the subsequent code.


48-68: LGTM: Well-structured default configuration.

Centralizing the default config improves maintainability and provides sensible defaults for the Live API.


71-79: LGTM: Reconnection logic handles session timeouts.

The function correctly identifies transient errors (code 1011) that warrant reconnection.


108-141: LGTM: Constructor properly initializes client and state.

The conditional client creation and deep-copied config approach prevents unintended mutations. The ThreadPoolExecutor is appropriately scoped for async frame conversion.


161-169: LGTM: Reconnection logic handles transient failures.

The method appropriately reconnects on transient errors. Note that callers should check the exception field in the returned LLMResponseEvent to detect failures.


200-229: LGTM: Video forwarding properly manages handler lifecycle.

The method correctly removes old handlers before adding new ones and supports both dedicated and shared forwarders.


231-249: LGTM: Async frame conversion avoids blocking the event loop.

Converting frames to PNG in the executor prevents blocking. The exception handling could optionally include reconnection logic similar to simple_response(), but logging and continuing is reasonable for video frames.


251-253: LGTM: Cleanup method safely handles None forwarder.


255-276: LGTM: Connection method properly manages reconnection lifecycle.

Stopping the processing task before reconnecting prevents task conflicts, and the AsyncExitStack ensures proper session cleanup.


300-325: LGTM: Config assembly is non-mutating and comprehensive.

The method correctly builds the config with resumption, instructions, and tools without mutating the base config.


327-333: LGTM: Task lifecycle management is correct.

Awaiting the task after cancellation ensures proper cleanup.


335-411: LGTM: Comprehensive event processing with proper delegation.

The method handles all message types appropriately. Note that the return False on line 411 appears unused by the caller in _processing_loop().


437-459: LGTM: Tool conversion matches Gemini's expected format.

The implementation correctly transforms ToolSchema objects into the function declarations format required by Gemini Live API.


461-467: LGTM: Tool call delegation is straightforward.


469-517: LGTM: Function call handling is robust with proper error handling.

The method correctly executes tools via the existing infrastructure, formats responses appropriately, and handles errors gracefully with logging.

agents-core/vision_agents/core/utils/utils.py (5)

13-13: Module-level logger is a best practice.

Excellent change to use a module-level logger instead of instantiating loggers within functions. This improves performance and follows Python logging conventions.


58-91: LGTM!

The reformatting improves readability, and the use of cached _INITIAL_CWD with asyncio.to_thread for blocking I/O correctly avoids blocking the event loop. The fallback to the base_dir parameter provides flexibility when needed.


156-156: Consistent with module-level logger refactoring.

Good change to use the module-level logger, improving consistency and performance across the module.


170-213: LGTM!

The formatting adjustments enhance readability without altering behavior. The multi-line AsyncClient construction and improved spacing make the code easier to follow.


20-20: Python version requirement is satisfied.

The project's requires-python = ">=3.10" in agents-core/pyproject.toml confirms that the str | None union syntax is valid and compatible with the project's minimum supported Python version.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)

286-286: Guard against AttributeError if _video_forwarder is None.

Line 286 calls remove_frame_handler() without checking if _video_forwarder is None, which could raise an AttributeError.

Apply this diff:

-        await self._video_forwarder.remove_frame_handler(self._send_video_frame)
+        if self._video_forwarder is not None:
+            await self._video_forwarder.remove_frame_handler(self._send_video_frame)

288-288: Avoid blocking the event loop with synchronous executor shutdown.

self._executor.shutdown() is synchronous and will block the event loop.

Apply this diff:

-        self._executor.shutdown()
+        self._executor.shutdown(wait=False)
🧹 Nitpick comments (1)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)

214-218: Simplify redundant handler removal.

Lines 216-218 contain duplicate logic: _stop_watching_video_track() is called, then the same handler removal is performed again. The conditional check on line 217-218 is redundant.

Apply this diff to remove the redundancy:

         # This method can be called multiple times with different forwarders
         # Remove handler from old forwarder if it exists
         await self._stop_watching_video_track()
-        if self._video_forwarder is not None:
-            await self._video_forwarder.remove_frame_handler(self._send_video_frame)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d2006e4 and aa16c49.

📒 Files selected for processing (5)
  • agents-core/vision_agents/core/utils/utils.py (3 hunks)
  • agents-core/vision_agents/core/utils/video_utils.py (2 hunks)
  • plugins/gemini/tests/test_gemini_realtime.py (1 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6 hunks)
  • tests/test_utils.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
agents-core/vision_agents/core/utils/utils.py (2)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
  • client (55-60)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
  • client (84-91)
plugins/gemini/tests/test_gemini_realtime.py (4)
plugins/openai/tests/test_openai_realtime.py (1)
  • realtime (20-29)
plugins/aws/tests/test_aws_realtime.py (1)
  • realtime (20-33)
agents-core/vision_agents/core/llm/realtime.py (2)
  • Realtime (21-188)
  • close (166-167)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • close (278-298)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (5)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (22-24)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (38-42)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
agents-core/vision_agents/core/utils/video_utils.py (1)
  • frame_to_png_bytes (67-85)
agents-core/vision_agents/core/utils/video_forwarder.py (3)
  • VideoForwarder (24-147)
  • remove_frame_handler (76-92)
  • add_frame_handler (48-74)
tests/test_utils.py (3)
agents-core/vision_agents/core/utils/utils.py (2)
  • Instructions (36-41)
  • parse_instructions (92-130)
agents-core/vision_agents/core/utils/video_utils.py (2)
  • ensure_even_dimensions (10-30)
  • frame_to_png_bytes (67-85)
conftest.py (3)
  • bunny_video_track (307-351)
  • recv (322-345)
  • recv (368-373)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (20)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (14)

50-68: LGTM!

The DEFAULT_CONFIG is well-structured with appropriate audio transcription settings, context window compression, and Leda voice configuration for Gemini Live API.


71-80: LGTM!

The reconnection logic correctly identifies transient WebSocket errors (code 1011 session timeout) that should trigger reconnection.


108-146: LGTM!

The initialization properly sets up the client, configuration deep-copying, and resource management. The _session property provides safe access with clear error handling.


148-169: LGTM!

The simple_response method correctly implements reconnection logic for transient errors and properly handles exceptions with logging.


171-199: LGTM!

Audio handling properly resamples PCM data to the required 16kHz format and creates the correct Blob structure for Gemini Live API.


231-249: LGTM!

Video frame handling correctly offloads PNG conversion to a thread pool executor to avoid blocking the event loop, and properly handles exceptions.


251-253: LGTM!

The method correctly checks for None before removing the frame handler.


255-276: LGTM!

The connect method properly manages the session lifecycle, using an exit stack for context management and correctly sequencing the processing task startup.


300-325: LGTM!

The get_config method correctly builds the configuration with session resumption, instructions, and tool declarations.


327-333: LGTM!

Task lifecycle management is straightforward and correct with proper cancellation and awaiting.


335-411: LGTM!

Event processing correctly handles input/output transcriptions, model responses, and tool calls. The logic properly extracts and emits different event types.


413-435: LGTM!

The processing loop correctly implements reconnection on transient WebSocket errors and handles cancellation gracefully for clean shutdown.


437-459: LGTM!

Tool conversion correctly transforms ToolSchema objects into Gemini Live API format with proper function declarations.


461-517: LGTM!

Function call handling correctly extracts tool calls, executes them with timeout, and sends responses back to the Gemini Live API with appropriate error handling.

agents-core/vision_agents/core/utils/video_utils.py (1)

67-85: LGTM!

The frame_to_png_bytes function correctly handles frame conversion with proper fallback logic for frames without the to_image method.

tests/test_utils.py (1)

612-623: LGTM!

The test properly validates frame_to_png_bytes functionality with a real video frame, including verification of the PNG signature.

agents-core/vision_agents/core/utils/utils.py (3)

1-32: LGTM!

Module-level version loading and caching is well-implemented with proper exception handling for cases where the package isn't installed.


56-89: LGTM!

The async version correctly uses asyncio.to_thread to offload blocking file I/O to a thread pool, following best practices for async code.


143-186: LGTM!

The ensure_model function correctly implements async file download with streaming, proper thread pool usage for file writing, and appropriate error handling with cleanup.

plugins/gemini/tests/test_gemini_realtime.py (1)

26-95: LGTM!

The test class correctly uses the module-level fixture and covers key integration scenarios. Note that line 94 tests the private method _stop_watching_video_track, which is acceptable for integration testing.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)

181-209: Audio resampling is correct, but lacks error handling.

The method properly resamples audio to Gemini's expected format. However, unlike simple_response, this method doesn't handle exceptions from send_realtime_input on line 208, which could result in unhandled errors propagating to callers.

Consider adding consistent error handling:

         blob = Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
 
-        await self._session.send_realtime_input(audio=blob)
+        try:
+            await self._session.send_realtime_input(audio=blob)
+        except Exception as e:
+            if _should_reconnect(e):
+                await self.connect()
+            logger.exception("Failed to send audio to Gemini")
🧹 Nitpick comments (2)
plugins/gemini/tests/test_gemini_realtime.py (1)

53-95: Good test coverage for audio and video flows.

Both tests appropriately verify media handling. The integration test structure is sound, though the explicit call to the private _stop_watching_video_track() method on line 94 exposes internal implementation details.

Consider adding a public method for stopping video if this is a common operation, or document why direct access to the private method is necessary here.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)

226-229: Potentially redundant handler removal.

Line 226 calls _stop_watching_video_track(), which already removes the frame handler (see lines 262-263). Lines 227-228 then perform the same removal again. This appears redundant.

Consider removing lines 227-228 or clarifying why the double removal is necessary:

         await self._stop_watching_video_track()
-        if self._video_forwarder is not None:
-            await self._video_forwarder.remove_frame_handler(self._send_video_frame)
 
         self._video_forwarder = shared_forwarder or VideoForwarder(
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between aa16c49 and 5f6f6ae.

📒 Files selected for processing (2)
  • plugins/gemini/tests/test_gemini_realtime.py (1 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/gemini/tests/test_gemini_realtime.py (3)
plugins/openai/tests/test_openai_realtime.py (1)
  • realtime (20-29)
plugins/aws/tests/test_aws_realtime.py (1)
  • realtime (20-33)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • close (288-309)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (6)
agents-core/vision_agents/core/edge/types.py (2)
  • Participant (22-24)
  • close (34-35)
agents-core/vision_agents/core/llm/events.py (1)
  • LLMResponseChunkEvent (87-102)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • _run_one_tool (272-344)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
agents-core/vision_agents/core/utils/video_forwarder.py (3)
  • VideoForwarder (24-147)
  • remove_frame_handler (76-92)
  • add_frame_handler (48-74)
agents-core/vision_agents/core/utils/video_utils.py (1)
  • frame_to_png_bytes (67-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (15)
plugins/gemini/tests/test_gemini_realtime.py (3)

14-23: LGTM! Fixture correctly defined.

The fixture is now properly defined without the self parameter. The lifecycle management (yield and cleanup in finally) is appropriate for an async pytest fixture.


26-27: Clear test class documentation.

The docstring accurately describes the test suite's purpose.


30-51: Test flow is well-structured.

The test correctly uses the fixture, establishes connection, sends a message, and verifies audio events are received.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (12)

1-44: Imports align with refactoring needs.

The additions support the new session lifecycle, async processing, and frame conversion logic.


50-68: Well-defined default configuration.

The default config covers essential realtime communication settings with reasonable values.


71-90: Reconnection logic is appropriately scoped.

The helper correctly identifies transient websocket failures that warrant reconnection attempts.


118-157: Initialization and session management are sound.

The client setup handles multiple scenarios appropriately, and the deep copy of DEFAULT_CONFIG prevents unintended mutations. The _session property provides clear feedback if accessed before connection.


241-260: Frame conversion properly offloaded to executor.

The use of run_in_executor for PNG conversion prevents blocking the event loop. The exception handling logs errors without propagating them, which is reasonable for video frames where occasional drops are acceptable.


261-264: Proper None-check in place.

The method correctly guards against AttributeError by checking if _video_forwarder is not None before calling remove_frame_handler.


265-287: Connection lifecycle is well-managed.

The method properly stops any existing processing task before reconnecting and uses an async context stack for clean resource management. Starting the processing task after establishing the connection ensures event handling begins at the right time.


288-310: Resource cleanup properly sequenced.

The close method follows an appropriate teardown order. The use of shutdown(wait=False) on line 299 prevents blocking the event loop, addressing the previous review concern.


311-336: Configuration assembly is clear and defensive.

The method correctly copies the base config to prevent mutations and conditionally adds runtime parameters.


338-345: Task lifecycle management is straightforward.

Both methods handle processing task creation and cancellation appropriately.


402-412: Verify thought filtering is intentional.

Line 402 filters out text when part.thought is True, presumably to exclude internal reasoning from being emitted as agent speech. Confirm this aligns with the desired behavior for Gemini's thought/reasoning capabilities.


448-528: Tool integration is well-implemented.

The tool conversion, function call extraction, execution via _run_one_tool, and response sending back to Gemini all follow appropriate patterns. Error handling ensures failures are reported back to the session.

@tschellenbach tschellenbach merged commit 637fd95 into main Nov 12, 2025
6 checks passed
@tschellenbach tschellenbach deleted the chore/gemini-cleanup branch November 12, 2025 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants