Add openai.chat_completions package to support OSS models #156
base: main
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Moves conversation and instruction state to LLM instance fields, adds a public set_conversation(conversation) API, wires agent.join to call the setter, adds OpenAI ChatCompletions LLM/VLM plugins (streaming + vision), introduces a frame_to_jpeg_bytes utility, and updates tests/examples to use the new setter.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent
    participant LLM
    participant Conversation
    participant ExternalModel
    Agent->>Agent: create/join call
    Agent->>Conversation: create conversation
    Conversation-->>Agent: conversation
    Note over Agent,LLM: Provide conversation via public API
    Agent->>LLM: set_conversation(conversation)
    LLM-->>LLM: store conversation & parsed instructions
    Agent->>LLM: simple_response(text) / VLM request
    LLM->>Conversation: read history (if present)
    alt VLM includes frames
        LLM->>LLM: _get_frames_bytes -> JPEG/base64 frames
        LLM->>ExternalModel: stream/request (includes frames)
    else LLM only
        LLM->>ExternalModel: stream/request
    end
    ExternalModel-->>LLM: streaming chunks / final
    LLM->>Agent: emit LLMResponseChunkEvent / LLMResponseCompletedEvent
```
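To make the new conversation flow concrete, here is a minimal sketch of the wiring (module paths and `set_conversation` follow the walkthrough above; the model name and the no-argument `InMemoryConversation()` construction are illustrative assumptions, not the codebase's exact API):

```python
# Minimal sketch of the new wiring, assuming the classes named in the
# walkthrough; constructor arguments are illustrative and may differ.
from vision_agents.core.agents.conversation import InMemoryConversation
from vision_agents.plugins.openai import ChatCompletionsLLM

llm = ChatCompletionsLLM(model="gpt-4o-mini")

conversation = InMemoryConversation()
llm.set_conversation(conversation)  # public setter replaces direct _conversation access

# Later, the LLM reads the stored history when producing a response:
# await llm.simple_response("Describe what you currently see")
```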
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas to inspect closely:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
- `.env.example` (1 hunks)
- `agents-core/vision_agents/core/llm/llm.py` (2 hunks)
- `plugins/baseten/README.md` (1 hunks)
- `plugins/baseten/pyproject.toml` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/__init__.py` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/events.py` (1 hunks)
- `pyproject.toml` (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/baseten/vision_agents/plugins/baseten/__init__.py
- plugins/baseten/vision_agents/plugins/baseten/events.py
- agents-core/vision_agents/core/llm/llm.py
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (4)
plugins/baseten/vision_agents/plugins/baseten/__init__.py (1)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1): BasetenVLM (32-274)

plugins/baseten/vision_agents/plugins/baseten/events.py (1)
- agents-core/vision_agents/core/events/base.py (1): PluginBaseEvent (52-54)

agents-core/vision_agents/core/llm/llm.py (1)
- agents-core/vision_agents/core/agents/conversation.py (1): Conversation (67-227)

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
- agents-core/vision_agents/core/llm/llm.py (3): LLMResponseEvent (38-42), VideoLLM (443-464), _conversation (83-86)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-43)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1): LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
Actionable comments posted: 2
♻️ Duplicate comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
89-158: Critical: User text is never sent to the model.
The `text` parameter containing the new user prompt is never added to the messages payload. The model only receives conversation history and frames, so it cannot respond to the new input. This is a correctness bug that breaks the core functionality.

Apply this diff to include the user text:

```diff
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)

+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
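With the fix applied, the final user message follows the standard OpenAI-compatible multimodal content shape, roughly as below (values truncated for illustration):

```python
# Illustrative payload shape only; the prompt and base64 data are examples.
{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe what you currently see"},
        {
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."},
        },
    ],
}
```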
🧹 Nitpick comments (5)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
86-87: Consider making frame dimensions configurable.
The frame dimensions are hardcoded to 800x600. Different models or use cases might benefit from different resolutions. Consider adding `frame_width` and `frame_height` as constructor parameters.

Apply this diff to add configurable dimensions:

```diff
     def __init__(
         self,
         model: str,
         api_key: Optional[str] = None,
         base_url: Optional[str] = None,
         fps: int = 1,
         frame_buffer_seconds: int = 10,
+        frame_width: int = 800,
+        frame_height: int = 600,
         client: Optional[AsyncOpenAI] = None,
     ):
```

Then update the initialization:

```diff
-        self._frame_width = 800
-        self._frame_height = 600
+        self._frame_width = frame_width
+        self._frame_height = frame_height
```
92-93: Unused parameter: processors.
The `processors` parameter is declared but never used in the method. Either utilize it or remove it from the signature.
110-110: Address or remove TODO comment.
The TODO comment references `_build_enhanced_instructions`, but this method is not present or used. Clarify the intended implementation or remove the comment.
129-129: Consider limiting conversation history size.
The TODO comment raises a valid concern about message volume. Sending unbounded conversation history could lead to token limit errors or increased latency. Consider implementing a sliding window or token-based truncation strategy, as sketched below.
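A minimal sliding-window sketch over the message-dict shape used in this payload (the window size is an arbitrary illustrative choice, not a recommendation):

```python
def truncate_history(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep system messages plus the most recent max_turns messages."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    return system + rest[-max_turns:]
```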
276-308: Well-implemented frame conversion utility.
The function correctly handles aspect ratio preservation and uses appropriate resampling quality (LANCZOS). The TODO comment about moving to core utils is valid; this utility could benefit other plugins.
Would you like me to open an issue to track moving this utility to a shared location?
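For reference, the core of such a utility is small; a sketch of the aspect-ratio-preserving conversion described here (not the plugin's exact code, and the function name is hypothetical):

```python
import io

import av
from PIL import Image


def frame_to_jpeg_bytes_sketch(
    frame: av.VideoFrame, max_width: int, max_height: int, quality: int = 85
) -> bytes:
    img = frame.to_image()  # av.VideoFrame -> PIL.Image
    # Fit within the target box while preserving aspect ratio.
    scale = min(max_width / img.width, max_height / img.height)
    img = img.resize(
        (int(img.width * scale), int(img.height * scale)),
        Image.Resampling.LANCZOS,
    )
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```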
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-43)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1): LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (2)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (2)
1-28: LGTM! Imports are well-organized and all appear necessary for the implementation. No sys.path modifications present, adhering to coding guidelines.
263-273: LGTM! The method correctly iterates over buffered frames and converts them to JPEG bytes. Implementation is clean and well-documented.
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py
This PR is being reviewed by Cursor Bugbot
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
Actionable comments posted: 0
♻️ Duplicate comments (3)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (3)
31-39: Complete the TODO in the class docstring.
The docstring still contains a TODO placeholder. Please add a brief description of the class purpose (e.g., "A video language model backed by Baseten-hosted models that processes video frames alongside text prompts"), document key parameters, and provide usage guidance.
88-157: CRITICAL: User prompt is never sent to the model.
The `text` parameter (Line 90) is never added to the messages payload. Only video frames are included in the final user message (Lines 152-157). This is a correctness bug that breaks the core functionality; the model cannot respond to the user's actual question.

Apply this diff to fix:

```diff
         # Attach the latest bufferred frames to the request
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)

+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
247-257: Fix redundant condition and avoid starting an already-running forwarder.
The condition `if not shared_forwarder:` followed by `shared_forwarder or VideoForwarder(...)` contains dead code; the `shared_forwarder or` part can never be reached. Additionally, calling `await self._video_forwarder.start()` when `shared_forwarder` is provided may attempt to start an already-running forwarder.

Apply this diff:

```diff
         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
         else:
             self._video_forwarder = shared_forwarder
```
🧹 Nitpick comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
70-73: Enhance credential error messages.
The error messages for missing credentials could be more helpful by mentioning the environment variable names.

Apply this diff:

```diff
         elif not api_key:
-            raise ValueError("api_key must be provided")
+            raise ValueError("api_key must be provided or set via BASETEN_API_KEY environment variable")
         elif not base_url:
-            raise ValueError("base_url must be provided")
+            raise ValueError("base_url must be provided or set via BASETEN_BASE_URL environment variable")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- `plugins/baseten/example/README.md` (1 hunks)
- `plugins/baseten/example/pyproject.toml` (1 hunks)
- `plugins/baseten/example/qwen_vl_example.py` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/baseten/example/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/baseten/example/qwen_vl_example.py (4)
- agents-core/vision_agents/core/agents/agents.py (3): Agent (125-1357), create_call (698-703), finish (558-591)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-117)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1): simple_response (88-223)

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-43)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1): LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (10)
plugins/baseten/example/pyproject.toml (2)
1-13: LGTM! Project metadata and dependencies align correctly with the example script's imports.
15-20: LGTM! Workspace source configuration correctly references all plugin dependencies.

plugins/baseten/example/qwen_vl_example.py (4)
1-11: LGTM! Imports and environment loading are correct.
14-28: LGTM! Agent construction correctly wires the Baseten VLM with edge, STT, TTS, and appropriate instructions.
30-43: Verify that the critical bug in baseten_vlm.py was addressed.
The join flow is correctly structured. However, Line 38 calls `agent.simple_response("Describe what you currently see")`, which relies on BasetenVLM's `simple_response` method. Past review comments identified a critical bug where the `text` parameter is never added to the messages payload (lines 88-157 in baseten_vlm.py), meaning the prompt won't reach the model. Please ensure this bug was fixed before merging.
45-46: LGTM! CLI entry point correctly wires the AgentLauncher.

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (4)
1-28: LGTM! Imports and constants are correctly defined.
159-223: LGTM! Streaming response logic correctly handles API calls, error reporting via LLMErrorEvent, and emits appropriate chunk and completion events.
262-272: LGTM! The frame iterator correctly processes buffered frames.
275-307: LGTM! Frame-to-JPEG conversion correctly maintains aspect ratio and uses appropriate resampling. The TODO comment about moving to core utils is a valid future refactoring consideration.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/openai/README.md (1)
19-31: Align the class name in the quickstart snippet
The snippet now imports `Realtime`, but it still instantiates `OpenAIRealtime`, which no longer exists under that import path. Please update the example so the constructor matches the imported symbol; otherwise, readers will copy an import/class combination that raises `NameError`.

```diff
-from vision_agents.plugins.openai import Realtime
-
-# Initialize with API key
-sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
+from vision_agents.plugins.openai import Realtime
+
+# Initialize with API key
+sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)
324-335: Reset duplicate guard per LLM response
Hi! Once `_all_sent_texts` learns a sentence it never forgets, so every identical sentence in future responses is dropped (no HeyGen speech) and the set grows without bound in long sessions. A fresh response should get a clean slate.

```diff
             if item_id != self._current_response_id:
                 if self._text_buffer:
                     text_to_send = self._text_buffer.strip()
                     if text_to_send and text_to_send not in self._all_sent_texts:
                         await self._send_text_to_heygen(text_to_send)
                         self._all_sent_texts.add(text_to_send)
                     self._text_buffer = ""
                 self._current_response_id = item_id
+                self._all_sent_texts.clear()
```
🧹 Nitpick comments (6)
examples/other_examples/openai_realtime_webrtc/openai_realtime_example.py (1)
51-51: Minor: Log message timing could be more precise.
The message "is now joining" suggests an action in progress, but at this point the agent has already completed joining (the `await agent.join(call)` has resolved). Consider "Agent has joined the call" or "Agent joined the call successfully" for clarity.

Apply this diff to improve clarity:

```diff
-    logger.info("Agent is now joining the call")
+    logger.info("Agent has joined the call")
```

agents-core/vision_agents/core/utils/video_utils.py (2)
32-34: Validate JPEG quality parameter.
The `quality` parameter lacks bounds checking. JPEG quality should typically be in the range 1-100. Invalid values may cause unexpected behavior or errors during encoding.

Consider adding validation:

```diff
 def frame_to_jpeg_bytes(
     frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
 ) -> bytes:
     """
     Convert a video frame to JPEG bytes with resizing.

     Args:
         frame: an instance of `av.VideoFrame`.
         target_width: target width in pixels.
         target_height: target height in pixels.
-        quality: JPEG quality. Default is 85.
+        quality: JPEG quality (1-100). Default is 85.

     Returns:
         frame as JPEG bytes.
     """
+    if not 1 <= quality <= 100:
+        raise ValueError("JPEG quality must be between 1 and 100")
+
     # Convert frame to a PIL image
     img = frame.to_image()
```

Also applies to: 42-42, 62-62
50-58: Consider whether upscaling is intended behavior.
The current implementation will upscale images when the source dimensions are smaller than the target dimensions (scale > 1). Upscaling can degrade image quality and may not be the intended behavior for a video frame processing utility. Consider clamping the scale factor to prevent upscaling:

```diff
     # Calculate scale factor (fit within target dimensions)
     scale = min(target_width / src_width, target_height / src_height)

+    # Optional: prevent upscaling by clamping scale to 1.0
+    scale = min(scale, 1.0)
+
     new_width = int(src_width * scale)
     new_height = int(src_height * scale)
```

If upscaling is intentional, consider documenting this behavior in the docstring.
plugins/openai/vision_agents/plugins/openai/__init__.py (1)
4-7: Export ChatCompletionsVLM alongside the LLM variant
You import `ChatCompletionsVLM`, but it's missing from `__all__`, so `from vision_agents.plugins.openai import *` (used in docs/examples) won't pick it up. Please add it to the export list for consistency with the other public classes.

```diff
-__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM"]
+__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM", "ChatCompletionsVLM"]
```

plugins/openai/examples/qwen_vl_example/README.md (1)
4-4: Clarify video processing direction.
The phrase "accepts text and video and responds with text vocalised" could mislead readers into thinking users send video to the agent. Based on the example code, the agent processes video frames internally and sends them to the VLM; users interact via voice/text only.
Consider revising to: "The model processes video frames from the call and responds with text vocalized with the TTS service of your choice."
agents-core/vision_agents/core/cli/cli_runner.py (1)
181-184: Consider using a more robust pattern for capability detection.
The nested `hasattr` checks work but are somewhat fragile. If the edge interface is expected to have `open_demo_for_agent`, consider using a protocol or abstract base class to make this contract explicit.

For example, you could define a protocol:

```python
from typing import Protocol

class DemoCapableEdge(Protocol):
    async def open_demo_for_agent(self, agent: "Agent", call_type: str, call_id: str) -> str: ...
```

Then use `isinstance` checking or type narrowing instead of `hasattr`.
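Note that `isinstance` against a `Protocol` only works if the protocol is decorated with `@runtime_checkable`, and the runtime check verifies method presence, not signatures. A sketch of that variant:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DemoCapableEdge(Protocol):
    async def open_demo_for_agent(
        self, agent: "Agent", call_type: str, call_id: str
    ) -> str: ...


# At the call site, replacing the nested hasattr checks:
# if isinstance(agent.edge, DemoCapableEdge):
#     url = await agent.edge.open_demo_for_agent(agent, call_type, call_id)
```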
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (71)
- `.cursor/rules/python.mdc` (1 hunks)
- `README.md` (1 hunks)
- `agents-core/pyproject.toml` (3 hunks)
- `agents-core/vision_agents/core/agents/agent_launcher.py` (1 hunks)
- `agents-core/vision_agents/core/agents/agent_options.py` (1 hunks)
- `agents-core/vision_agents/core/agents/agents.py` (17 hunks)
- `agents-core/vision_agents/core/cli/cli_runner.py` (2 hunks)
- `agents-core/vision_agents/core/processors/base_processor.py` (1 hunks)
- `agents-core/vision_agents/core/utils/audio_queue.py` (1 hunks)
- `agents-core/vision_agents/core/utils/video_forwarder.py` (1 hunks)
- `agents-core/vision_agents/core/utils/video_queue.py` (1 hunks)
- `agents-core/vision_agents/core/utils/video_track.py` (2 hunks)
- `agents-core/vision_agents/core/utils/video_utils.py` (1 hunks)
- `examples/01_simple_agent_example/README.md` (1 hunks)
- `examples/01_simple_agent_example/simple_agent_example.py` (3 hunks)
- `examples/02_golf_coach_example/golf_coach_example.py` (0 hunks)
- `examples/other_examples/09_github_mcp_demo/gemini_realtime_github_mcp_demo.py` (0 hunks)
- `examples/other_examples/09_github_mcp_demo/github_mcp_demo.py` (0 hunks)
- `examples/other_examples/09_github_mcp_demo/openai_realtime_github_mcp_demo.py` (0 hunks)
- `examples/other_examples/gemini_live_realtime/gemini_live_example.py` (0 hunks)
- `examples/other_examples/openai_realtime_webrtc/openai_realtime_example.py` (1 hunks)
- `examples/other_examples/plugins_examples/audio_moderation/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/mcp/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/stt_deepgram_transcription/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/stt_moonshine_transcription/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/tts_cartesia/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/tts_elevenlabs/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/tts_kokoro/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/vad_silero/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/video_moderation/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/wizper_stt_translate/main.py` (0 hunks)
- `plugins/aws/example/aws_llm_function_calling_example.py` (0 hunks)
- `plugins/aws/example/aws_qwen_example.py` (0 hunks)
- `plugins/aws/example/aws_realtime_function_calling_example.py` (0 hunks)
- `plugins/aws/example/aws_realtime_nova_example.py` (0 hunks)
- `plugins/fish/example/fish_example.py` (0 hunks)
- `plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py` (2 hunks)
- `plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py` (1 hunks)
- `plugins/heygen/README.md` (0 hunks)
- `plugins/heygen/example/avatar_example.py` (0 hunks)
- `plugins/heygen/example/avatar_realtime_example.py` (0 hunks)
- `plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py` (11 hunks)
- `plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py` (2 hunks)
- `plugins/moondream/README.md` (5 hunks)
- `plugins/moondream/example/README.md` (1 hunks)
- `plugins/moondream/example/moondream_vlm_example.py` (1 hunks)
- `plugins/moondream/example/pyproject.toml` (1 hunks)
- `plugins/moondream/tests/test_moondream_local.py` (4 hunks)
- `plugins/moondream/tests/test_moondream_local_vlm.py` (1 hunks)
- `plugins/moondream/tests/test_moondream_vlm.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/__init__.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py` (4 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py` (6 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py` (2 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py` (1 hunks)
- `plugins/openai/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/pyproject.toml` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/qwen_vl_example.py` (1 hunks)
- `plugins/openai/tests/test_chat_completions.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/__init__.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/rtc_manager.py` (1 hunks)
- `plugins/openrouter/example/openrouter_example.py` (0 hunks)
- `plugins/sample_plugin/example/my_example.py` (0 hunks)
- `plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py` (2 hunks)
- `tests/test_audio_queue.py` (1 hunks)
- `tests/test_queue_and_video_forwarder.py` (9 hunks)
💤 Files with no reviewable changes (25)
- examples/other_examples/plugins_examples/tts_elevenlabs/main.py
- examples/other_examples/plugins_examples/vad_silero/main.py
- plugins/heygen/example/avatar_realtime_example.py
- examples/02_golf_coach_example/golf_coach_example.py
- examples/other_examples/plugins_examples/tts_cartesia/main.py
- examples/other_examples/09_github_mcp_demo/github_mcp_demo.py
- plugins/heygen/example/avatar_example.py
- plugins/aws/example/aws_qwen_example.py
- examples/other_examples/gemini_live_realtime/gemini_live_example.py
- plugins/aws/example/aws_realtime_function_calling_example.py
- examples/other_examples/plugins_examples/audio_moderation/main.py
- plugins/aws/example/aws_realtime_nova_example.py
- examples/other_examples/plugins_examples/wizper_stt_translate/main.py
- examples/other_examples/plugins_examples/video_moderation/main.py
- examples/other_examples/plugins_examples/tts_kokoro/main.py
- examples/other_examples/09_github_mcp_demo/gemini_realtime_github_mcp_demo.py
- plugins/aws/example/aws_llm_function_calling_example.py
- examples/other_examples/plugins_examples/stt_deepgram_transcription/main.py
- examples/other_examples/09_github_mcp_demo/openai_realtime_github_mcp_demo.py
- plugins/openrouter/example/openrouter_example.py
- examples/other_examples/plugins_examples/mcp/main.py
- plugins/heygen/README.md
- plugins/sample_plugin/example/my_example.py
- examples/other_examples/plugins_examples/stt_moonshine_transcription/main.py
- plugins/fish/example/fish_example.py
🧰 Additional context used
🧬 Code graph analysis (30)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): warmup (118-120)

agents-core/vision_agents/core/utils/video_track.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)

plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (2)
- plugins/moondream/tests/test_moondream_local.py (3): is_available (188-189), is_available (216-217), is_available (244-245)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): device (114-116)

plugins/moondream/tests/test_moondream_local_vlm.py (3)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4): LocalVLM (31-349), warmup (96-99), close (343-349), simple_response (313-334)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2): warmup (118-120), close (310-318)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2): close (241-246), simple_response (197-218)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (2)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)
- plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1): _send_video_frame (435-447)

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (6)
- plugins/openai/tests/test_chat_completions.py (1): llm (37-40)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLM (49-418), LLMResponseEvent (38-42)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-44)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2): simple_response (90-185), _build_model_request (238-284)
- plugins/openai/vision_agents/plugins/openai/events.py (1): LLMErrorEvent (15-19)

plugins/moondream/example/moondream_vlm_example.py (2)
- agents-core/vision_agents/core/agents/agents.py (7): Agent (93-1262), create_user (741-753), create_call (755-760), subscribe (452-464), simple_response (428-441), join (466-549), finish (578-611)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2): CloudVLM (27-246), simple_response (197-218)

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)
- plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1): _send_video_frame (268-274)

agents-core/vision_agents/core/cli/cli_runner.py (1)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1): open_demo_for_agent (350-354)

plugins/moondream/tests/test_moondream_vlm.py (1)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (3): CloudVLM (27-246), close (241-246), simple_response (197-218)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (3): HeyGenRTCManager (19-267), connect (60-145), close (256-267)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2): HeyGenVideoTrack (14-187), stop (178-187)
- plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3): connect (187-200), Realtime (53-679), close (372-386)
- plugins/openai/vision_agents/plugins/openai/openai_realtime.py (3): connect (80-106), Realtime (40-487), close (153-154)
- agents-core/vision_agents/core/llm/events.py (3): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112), RealtimeAgentSpeechTranscriptionEvent (148-153)
- agents-core/vision_agents/core/edge/types.py (1): write (45-45)

tests/test_audio_queue.py (1)
- agents-core/vision_agents/core/utils/audio_queue.py (11): AudioQueue (12-274), empty (36-38), put (50-83), qsize (40-42), get (119-136), put_nowait (85-117), get_nowait (138-152), get_samples (154-237), get_duration (239-258), get_buffer_info (260-274), _current_duration_ms (44-48)

plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1): CloudVLM (27-246)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1): LocalVLM (31-349)

plugins/openai/vision_agents/plugins/openai/__init__.py (2)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1): ChatCompletionsLLM (23-180)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1): ChatCompletionsVLM (31-284)

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (3)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1): MoondreamVideoTrack (16-79)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): _process_and_add_frame (283-308)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
- agents-core/vision_agents/core/agents/agents.py (2): create_user (741-753), create_call (755-760)
- agents-core/vision_agents/core/edge/edge_transport.py (2): create_user (30-31), open_demo (42-43)

plugins/openai/tests/test_chat_completions.py (6)
- agents-core/vision_agents/core/agents/conversation.py (1): InMemoryConversation (230-237)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): ChatCompletionsLLM (23-180), simple_response (65-160)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3): ChatCompletionsVLM (31-284), watch_video_track (187-224), simple_response (90-185)
- plugins/openai/vision_agents/plugins/openai/events.py (1): LLMErrorEvent (15-19)
- agents-core/vision_agents/core/events/manager.py (1): wait (474-487)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (5)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (24-147), add_frame_handler (48-74)
- agents-core/vision_agents/core/utils/video_utils.py (1): frame_to_jpeg_bytes (32-63)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): simple_response (65-160), _build_model_request (162-180)

plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (2)
- agents-core/vision_agents/core/edge/sfu_events.py (1): name (2197-2201)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)

plugins/moondream/tests/test_moondream_local.py (1)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): LocalDetectionProcessor (28-318)

tests/test_queue_and_video_forwarder.py (3)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)
- conftest.py (1): bunny_video_track (300-344)
- agents-core/vision_agents/core/utils/video_forwarder.py (4): VideoForwarder (24-147), add_frame_handler (48-74), stop (102-112), remove_frame_handler (76-92)

plugins/openai/examples/qwen_vl_example/qwen_vl_example.py (3)
- agents-core/vision_agents/core/agents/agents.py (3): Agent (93-1262), create_call (755-760), finish (578-611)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-125)

agents-core/vision_agents/core/processors/base_processor.py (1)
- agents-core/vision_agents/core/edge/sfu_events.py (1): name (2197-2201)

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
- agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (3): VideoForwarder (24-147), add_frame_handler (48-74), stop (102-112)
- agents-core/vision_agents/core/utils/video_queue.py (2): VideoLatestNQueue (6-28), put_latest_nowait (22-28)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (10): watch_video_track (168-200), _stop_watching_video_track (336-341), _on_frame_received (202-208), _setup_stt_subscription (210-217), on_stt_transcript (216-217), _on_stt_transcript (306-311), _consume_stream (219-230), _process_frame (232-304), simple_response (313-334), close (343-349)

agents-core/vision_agents/core/utils/video_forwarder.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (2): VideoLatestNQueue (6-28), put_latest (14-20)

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)
- plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3): parse_detection_bbox (13-31), annotate_detections (48-111), handle_device (7-11)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1): MoondreamVideoTrack (16-79)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3): warmup (96-99), _prepare_moondream (101-109), _load_model_sync (111-166)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (4): process_video (105-146), _process_and_add_frame (208-237), _run_inference (166-178), _run_detection_sync (180-206)

agents-core/vision_agents/core/agents/agents.py (9)
- agents-core/vision_agents/core/agents/agent_options.py (3): AgentOptions (6-16), default_agent_options (23-24), update (9-16)
- agents-core/vision_agents/core/edge/sfu_events.py (22): ParticipantJoinedEvent (1481-1526), participant (1496-1501, 1504-1507, 1545-1550, 1553-1556, 1625-1630, 1633-1636, 2100-2105, 2108-2111, 2156-2161, 2164-2167), Participant (229-270), track_type (579-583, 1193-1197, 2289-2293), user_id (489-493, 856-860, 901-905, 1186-1190, 2093-2097, 2142-2146), name (2197-2201)
- agents-core/vision_agents/core/utils/audio_queue.py (4): AudioQueue (12-274), put (50-83), get_duration (239-258), get (119-136)
- agents-core/vision_agents/core/edge/types.py (4): Participant (22-24), Connection (27-35), OutputAudioTrack (39-47), write (45-45)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): VideoForwarder (24-147)
- agents-core/vision_agents/core/events/manager.py (4): send (428-472), subscribe (301-370), wait (474-487), unsubscribe (274-299)
- agents-core/vision_agents/core/edge/events.py (3): TrackAddedEvent (18-24), TrackRemovedEvent (28-34), AudioReceivedEvent (9-14)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2): join (256-307), add_track_subscriber (319-322)
- agents-core/vision_agents/core/llm/llm.py (4): simple_audio_response (428-440), set_conversation (194-204), watch_video_track (458-471), LLM (49-418)

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)
- agents-core/vision_agents/core/agents/agent_options.py (2): AgentOptions (6-16), default_agent_options (23-24)
- agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (3): VideoForwarder (24-147), add_frame_handler (48-74), stop (102-112)
- agents-core/vision_agents/core/utils/video_queue.py (2): VideoLatestNQueue (6-28), put_latest_nowait (22-28)
- plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1): handle_device (7-11)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4): device (114-116), warmup (118-120), _prepare_moondream (122-132), close (310-318)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (10): watch_video_track (66-98), _stop_watching_video_track (220-225), _on_frame_received (100-106), _setup_stt_subscription (108-115), on_stt_transcript (114-115), _on_stt_transcript (190-195), _consume_stream (117-130), _process_frame (132-188), simple_response (197-218), close (241-246)
🪛 LanguageTool
plugins/moondream/example/README.md
[typographical] ~1-~1: Consider adding a comma here.
Context: ## Moondream example Please see root readme for details.
(PLEASE_COMMA)
plugins/moondream/README.md
[uncategorized] ~8-~8: Possible missing comma found.
Context: ...s Choose between cloud-hosted or local processing depending on your needs. When running l...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~164-~164: Possible missing article found.
Context: ... the model from HuggingFace and runs on device. It supports both VQA and captioning mo...
(AI_HYDRA_LEO_MISSING_THE)
[uncategorized] ~233-~233: Possible missing comma found.
Context: ...ry configuration. If not provided, uses default which defaults to tempfile.gettempdir()...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~239-~239: Loose punctuation mark.
Context: ...e. ### CloudVLM Parameters - api_key: str - API key for Moondream Cloud API. ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~240-~240: Loose punctuation mark.
Context: ..._API_KEYenvironment variable. -mode`: Literal["vqa", "caption"] - "vqa" for v...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~247-~247: Loose punctuation mark.
Context: ...mits. ### LocalVLM Parameters - mode: Literal["vqa", "caption"] - "vqa" for v...
(UNLIKELY_OPENING_PUNCTUATION)
plugins/openai/examples/qwen_vl_example/README.md
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...onment Variables - OPENAI_API_KEY: Your Baseten API key (required) - **`OP...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~74-~74: Loose punctuation mark.
Context: ...al) ) ``` ### Parameters - model: The name of the Baseten-hosted model to...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~75-~75: Loose punctuation mark.
Context: ... a vision-capable model. - api_key: Your Baseten API key. If not provided, ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ... environment variable. - **base_url`**: The base URL for Baseten API. If not pr...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...E_URL environment variable. - **fps`**: Number of video frames per second to ca...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Possible missing comma found.
Context: ...the model. Lower values reduce API costs but may miss fast-moving content. Default i...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...t is 1 fps. - frame_buffer_seconds: How many seconds of video to buffer. To...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: .... Default is 10 seconds. - **client**: Optional pre-configured AsyncOpenAI` c...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~98-~98: Loose punctuation mark.
Context: ...g events: - LLMResponseChunkEvent: Emitted for each text chunk in the stre...
(UNLIKELY_OPENING_PUNCTUATION)
[grammar] ~114-~114: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: ...ing support is not yet implemented (see TODOs in code). ## Troubleshooting - **No v...
(TO_DO_HYPHEN)
[uncategorized] ~119-~119: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... and OPENAI_BASE_URL are set correctly and the model name is valid. - **High laten...
(COMMA_COMPOUND_SENTENCE_2)
🪛 markdownlint-cli2 (0.18.1)
plugins/moondream/README.md
167-167: Bare URL used
(MD034, no-bare-urls)
README.md
165-165: Images should have alternate text (alt text)
(MD045, no-alt-text)
165-165: Images should have alternate text (alt text)
(MD045, no-alt-text)
165-165: Images should have alternate text (alt text)
(MD045, no-alt-text)
169-169: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (25)
agents-core/pyproject.toml (2)
24-24: Verify getstream 2.5.9 requirement and ensure changelog alignment.
The bump from ≥2.5.8 to ≥2.5.9 suggests upstream changes are needed for the conversation management wiring. Ensure that the 2.5.9 release contains the necessary changes to support the LLM.set_conversation(…) pattern introduced in this PR.
85-95: Consolidate commented sources section and verify git revision.
The `[tool.uv.sources]` section and its contents are all commented out, creating redundant commenting. Clarify the intent:

- If these configurations are legacy/unused, remove them entirely.
- If they document alternate configurations, consolidate under a single comment block explaining their purpose.
- If line 94 is an active development reference, uncomment the section header and activate that line.

Additionally, verify that the git revision `85bd8ef00859ef6ed5ef4ffe7b7f40ae12d12973` exists in the GetStream/stream-py repository and is the correct commit for supporting the conversation management changes in this PR.

plugins/moondream/example/pyproject.toml (2)
16-22: All workspace packages verified and properly configured.
The verification confirms that all seven workspace dependencies referenced in the configuration exist in the monorepo with correct pyproject.toml definitions:
- vision-agents (agents-core/pyproject.toml)
- vision-agents-plugins-moondream (plugins/moondream/pyproject.toml)
- vision-agents-plugins-getstream (plugins/getstream/pyproject.toml)
- vision-agents-plugins-deepgram (plugins/deepgram/pyproject.toml)
- vision-agents-plugins-elevenlabs (plugins/elevenlabs/pyproject.toml)
- vision-agents-plugins-vogent (plugins/vogent/pyproject.toml)
The workspace configuration is valid.
1-5: Python version requirement is consistent — no action needed.
The vision-agents package requires Python 3.10 or newer, which aligns precisely with the `requires-python = ">=3.10"` constraint specified in the project metadata. No conflicts or misalignment.

plugins/openai/examples/qwen_vl_example/pyproject.toml (2)
5-5: No changes needed — Python version requirement is correctly aligned.
The verification confirms that `requires-python = ">=3.10"` in `plugins/openai/examples/qwen_vl_example/pyproject.toml` matches the core dependency requirement in `agents-core/pyproject.toml` (also `>=3.10`) and aligns with the vast majority of the workspace. The file is consistent with the project baseline.
1-20: Dependencies are correct; Baseten is a service provider, not a separate plugin.
Baseten is an OpenAI-compatible VLM provider that integrates through the existing `vision-agents-plugins-openai` dependency by configuring the API endpoint and credentials. No separate `vision-agents-plugins-baseten` package exists in the repository. The example's dependencies are appropriately configured.

Likely an incorrect or invalid review comment.
plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (1)
60-60: LGTM: Processor name attribute added.
The `name` class attribute properly identifies this processor instance.

agents-core/vision_agents/core/utils/audio_queue.py (1)
154-237: LGTM on `get_samples` complexity.
The splitting logic correctly handles partial chunk consumption and maintains sample accounting. The timeout-based waiting and metadata preservation are well implemented.

examples/01_simple_agent_example/README.md (1)
88-91: LGTM on documentation update.
The updated flow correctly reflects the new CLI behavior where the demo UI opens automatically. The note about the `--no-demo` flag is helpful.

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
293-296: LGTM on video forwarding refactor.
The shift from `start_event_consumer` to `add_frame_handler` aligns with the new frame-handler-based architecture. The handler registration with fps and name parameters is clean and consistent with the VideoForwarder API; a sketch of the new style follows.
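A registration in the new style would look roughly like this (the callback body, fps value, and name string are illustrative; only the `fps`/`name` parameters are taken from the comment above, so treat the exact signature as an assumption):

```python
# Hypothetical handler registration against the VideoForwarder API
# described in this comment; the callback body is a placeholder.
async def _send_video_frame(frame) -> None:
    ...  # encode the frame and forward it to the model

forwarder.add_frame_handler(_send_video_frame, fps=1.0, name="openai_rtc_forwarder")
```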
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
7-11: LGTM on device handling utility.
The `handle_device()` function provides a clean, centralized way to select compute device and precision. The CUDA detection with CPU fallback is appropriate; the usual pattern is sketched below.
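This is the common torch device/precision selection pattern; a sketch under that assumption, not the plugin's exact code:

```python
import torch


def handle_device() -> tuple[str, torch.dtype]:
    # Prefer CUDA with half precision; fall back to CPU with full precision.
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32
```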
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (2)
8-8: LGTM! Import updated correctly.
The import path and class name have been updated to use the renamed `VideoLatestNQueue`.
30-30: LGTM! Type annotation updated correctly.
The frame queue type annotation correctly uses `VideoLatestNQueue`.

examples/01_simple_agent_example/simple_agent_example.py (3)
7-7: LGTM! Imports simplified.
Removed `vogent` from imports, consistent with the switch to Deepgram's built-in turn detection.
29-29: LGTM! Clarified turn detection behavior.
The comment helpfully explains that turn detection is not needed with Deepgram. This simplifies the agent configuration.
39-40: LGTM! Simplified join_call flow.
The removal of explicit user creation and demo opening aligns with the new CLI-controlled demo opening pattern (via the `--no-demo` flag).

plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
5-6: LGTM! Docstring updated.
The capability description now accurately reflects visual question answering and captioning without mentioning counting.
8-13: LGTM! Public API expanded with VLM support.
The addition of `CloudVLM` and `LocalVLM` imports and exports expands the plugin's capabilities with vision-language model support. The absolute import paths are clear and maintainable.

agents-core/vision_agents/core/utils/video_queue.py (1)
6-6: LGTM! Class renamed for clarity.
The rename from `LatestNQueue` to `VideoLatestNQueue` makes the purpose more explicit and better reflects its use in video frame buffering contexts.

agents-core/vision_agents/core/utils/video_track.py (2)
7-7: LGTM! Import updated correctly.
The import path and class name have been updated to use the renamed `VideoLatestNQueue`.
20-20: LGTM! Type annotation updated correctly.
The frame queue type annotation correctly uses `VideoLatestNQueue`.

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)
9-9: LGTM! Import updated correctly.
The import path and class name have been updated to use the renamed `VideoLatestNQueue`.
35-35: LGTM! Type annotation updated correctly.
The frame queue type annotation correctly uses `VideoLatestNQueue`. The `maxlen=2` is appropriately small for HeyGen's low-latency requirements.

agents-core/vision_agents/core/cli/cli_runner.py (2)
153-158: LGTM! New CLI flag added.
The `--no-demo` flag provides users control over whether the demo UI opens automatically. Good UX improvement.
159-159: LGTM! Function signature extended correctly.
The `no_demo` parameter is properly added to the function signature.
plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py
Force-pushed from 4f85895 to f3d5b11
Actionable comments posted: 1
♻️ Duplicate comments (2)
agents-core/vision_agents/core/utils/video_utils.py (1)
32-63: Validate dimensions to prevent division by zero.
The scale calculation on line 53 divides by `src_width` and `src_height` without validation. If either dimension is zero (or if `target_width`/`target_height` are zero or negative), this will raise a runtime error. Add defensive checks before the division.

Apply this diff to add validation:

```diff
 def frame_to_jpeg_bytes(
     frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
 ) -> bytes:
     """
     Convert a video frame to JPEG bytes with resizing.

     Args:
         frame: an instance of `av.VideoFrame`.
         target_width: target width in pixels.
         target_height: target height in pixels.
         quality: JPEG quality. Default is 85.

     Returns:
         frame as JPEG bytes.
     """
+    if target_width <= 0 or target_height <= 0:
+        raise ValueError("Target dimensions must be positive")
+
     # Convert frame to a PIL image
     img = frame.to_image()

     # Calculate scaling to maintain aspect ratio
     src_width, src_height = img.size

+    if src_width == 0 or src_height == 0:
+        raise ValueError(f"Source frame has invalid dimensions: {src_width}x{src_height}")
+
     # Calculate scale factor (fit within target dimensions)
     scale = min(target_width / src_width, target_height / src_height)
```

plugins/openai/tests/test_chat_completions.py (1)
36-47: Use set_conversation instead of direct assignment.
Both fixtures (lines 39 and 46) directly assign to the private `_conversation` attribute, bypassing the public `set_conversation` method introduced in this PR. Tests should exercise the real API that agents use.

Apply this diff:

```diff
 @pytest.fixture()
 async def llm(openai_client_mock, conversation):
     llm_ = ChatCompletionsLLM(client=openai_client_mock, model="test")
-    llm_._conversation = conversation
+    llm_.set_conversation(conversation)
     return llm_


 @pytest.fixture()
 async def vlm(openai_client_mock, conversation):
     llm_ = ChatCompletionsVLM(client=openai_client_mock, model="test")
-    llm_._conversation = conversation
+    llm_.set_conversation(conversation)
     return llm_
```
🧹 Nitpick comments (3)
plugins/openai/examples/qwen_vl_example/README.md (1)
56-57: Resolve past review comment: clarify environment variable naming convention.
Baseten officially uses `BASETEN_API_KEY` as the standard environment variable, yet this README uses `OPENAI_API_KEY` and `OPENAI_BASE_URL`. While this pattern is valid for the OpenAI-compatible client approach, it creates confusion for developers who might expect Baseten's standard naming.

Recommend one of these approaches:

1. Document the mapping (preferred): Add a note explaining that `OPENAI_*` variables are used because the OpenAI client is instantiated with Baseten's OpenAI-compatible endpoint. Consider showing both naming conventions:

   - **`OPENAI_API_KEY`**: Your Baseten API key (set this to your value from `BASETEN_API_KEY`)
   - **`OPENAI_BASE_URL`**: The base URL for your Baseten API endpoint (set this to your value from `BASETEN_BASE_URL`)

   And add: "See `.env.example` for the canonical `BASETEN_*` variable names if you prefer to use those for clarity."

2. Align with Baseten's convention: Update the code example and README to explicitly use `BASETEN_API_KEY` and `BASETEN_BASE_URL` environment variables, remapping them when creating the OpenAI client, as sketched below.
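The remapping in the second option is a one-liner at client construction; a sketch, assuming the standard `AsyncOpenAI` constructor:

```python
import os

from openai import AsyncOpenAI

# Keep Baseten's canonical variable names and map them explicitly onto
# the OpenAI-compatible client.
client = AsyncOpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=os.environ["BASETEN_BASE_URL"],
)
```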
agents-core/vision_agents/core/agents/agents.py (1)
540-541: Remove duplicate set_conversation call.
The conversation is set twice in the join flow—once immediately after creation (line 541) and again after the optional wait_for_participant (line 548). This duplication is unnecessary; the LLM only needs the conversation set once. Consider removing the first call and keeping only the second one after all participant logic has completed.

Apply this diff to remove the duplicate:

```diff
         # wait for conversation creation coro at the very end of the join flow
         self.conversation = await create_conversation_coro
-        # Provide conversation to the LLM so it can access the chat history.
-        self.llm.set_conversation(self.conversation)

         if wait_for_participant:
             self.logger.info("Agent is ready, waiting for participant to join")
```

Also applies to: 547-548
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
210-220: Remove redundant expression in forwarder creation.
On line 211, within the `if not shared_forwarder:` block, the expression `shared_forwarder or VideoForwarder(...)` is redundant. Since `shared_forwarder` is guaranteed to be falsy inside this branch, the `shared_forwarder or` part is dead code and can be removed for clarity.

Apply this diff:

```diff
         if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name=f"{PLUGIN_NAME}_forwarder",
             )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- `agents-core/vision_agents/core/agents/agents.py` (1 hunks)
- `agents-core/vision_agents/core/llm/llm.py` (3 hunks)
- `agents-core/vision_agents/core/utils/video_utils.py` (1 hunks)
- `plugins/anthropic/tests/test_anthropic_llm.py` (2 hunks)
- `plugins/aws/tests/test_aws.py` (1 hunks)
- `plugins/gemini/tests/test_gemini_llm.py` (5 hunks)
- `plugins/openai/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/pyproject.toml` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/qwen_vl_example.py` (1 hunks)
- `plugins/openai/tests/test_chat_completions.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/__init__.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py` (1 hunks)
- `plugins/openrouter/tests/test_openrouter_llm.py` (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/anthropic/tests/test_anthropic_llm.py
- plugins/aws/tests/test_aws.py
- plugins/openai/examples/qwen_vl_example/qwen_vl_example.py
🧰 Additional context used
🧬 Code graph analysis (8)
plugins/gemini/tests/test_gemini_llm.py (4)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- agents-core/vision_agents/core/llm/llm.py (2): `set_conversation` (194-204), `simple_response` (75-81)
- agents-core/vision_agents/core/llm/events.py (1): `LLMResponseChunkEvent` (87-102)
- plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1): `simple_response` (68-85)

plugins/openrouter/tests/test_openrouter_llm.py (3)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- agents-core/vision_agents/core/llm/llm.py (2): `LLM` (49-418), `set_conversation` (194-204)
- agents-core/vision_agents/core/agents/conversation.py (1): `InMemoryConversation` (230-237)

plugins/openai/tests/test_chat_completions.py (5)
- agents-core/vision_agents/core/agents/conversation.py (1): `InMemoryConversation` (230-237)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `ChatCompletionsLLM` (23-180), `simple_response` (65-160)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3): `ChatCompletionsVLM` (31-284), `watch_video_track` (187-224), `simple_response` (90-185)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
- plugins/openai/tests/test_chat_completions.py (1): `llm` (37-40)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLM` (49-418), `LLMResponseEvent` (38-42)
- agents-core/vision_agents/core/processors/base_processor.py (1): `Processor` (35-44)
- agents-core/vision_agents/core/events/manager.py (1): `register_events_from_module` (219-256)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2): `simple_response` (90-185), `_build_model_request` (238-284)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

agents-core/vision_agents/core/agents/agents.py (7)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- plugins/aws/tests/test_aws.py (1): `llm` (35-39)
- plugins/gemini/tests/test_gemini_llm.py (1): `llm` (31-34)
- plugins/openrouter/tests/test_openrouter_llm.py (1): `llm` (61-68)
- agents-core/vision_agents/core/llm/llm.py (1): `set_conversation` (194-204)
- plugins/getstream/tests/test_message_chunking.py (2): `conversation` (15-27), `conversation` (244-251)
- tests/test_conversation.py (1): `conversation` (66-73)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLMResponseEvent` (38-42), `VideoLLM` (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): `VideoForwarder` (24-147), `add_frame_handler` (48-74)
- agents-core/vision_agents/core/utils/video_utils.py (1): `frame_to_jpeg_bytes` (32-63)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `simple_response` (65-160), `_build_model_request` (162-180)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

plugins/openai/vision_agents/plugins/openai/__init__.py (2)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1): `ChatCompletionsLLM` (23-180)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1): `ChatCompletionsVLM` (31-284)

agents-core/vision_agents/core/llm/llm.py (2)
- agents-core/vision_agents/core/utils/utils.py (2): `Instructions` (35-40), `parse_instructions` (89-127)
- agents-core/vision_agents/core/agents/conversation.py (1): `Conversation` (67-227)
🪛 LanguageTool
plugins/openai/examples/qwen_vl_example/README.md
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...onment Variables - OPENAI_API_KEY: Your Baseten API key (required) - **`OP...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~74-~74: Loose punctuation mark.
Context: ...al) ) ``` ### Parameters - model: The name of the Baseten-hosted model to...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~75-~75: Loose punctuation mark.
Context: ... a vision-capable model. - api_key: Your Baseten API key. If not provided, ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ... environment variable. - **base_url`**: The base URL for Baseten API. If not pr...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...E_URL environment variable. - **fps`**: Number of video frames per second to ca...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Possible missing comma found.
Context: ...the model. Lower values reduce API costs but may miss fast-moving content. Default i...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...t is 1 fps. - frame_buffer_seconds: How many seconds of video to buffer. To...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: .... Default is 10 seconds. - **client**: Optional pre-configured AsyncOpenAI` c...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~98-~98: Loose punctuation mark.
Context: ...g events: - LLMResponseChunkEvent: Emitted for each text chunk in the stre...
(UNLIKELY_OPENING_PUNCTUATION)
[grammar] ~114-~114: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: ...ing support is not yet implemented (see TODOs in code). ## Troubleshooting - **No v...
(TO_DO_HYPHEN)
[uncategorized] ~119-~119: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... and OPENAI_BASE_URL are set correctly and the model name is valid. - **High laten...
(COMMA_COMPOUND_SENTENCE_2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (12)
plugins/openai/examples/qwen_vl_example/pyproject.toml (1)
1-21: LGTM! Appropriate dependency structure for the example project.

The pyproject.toml correctly includes the core `vision-agents` framework and relevant plugins (OpenAI for VLM, GetStream for Edge, Deepgram for STT, ElevenLabs for TTS), with all workspace mappings properly configured. The Python requirement (>=3.10) is consistent with the broader project.

plugins/openai/examples/qwen_vl_example/README.md (1)
74-79: Fix grammar and punctuation issues flagged by static analysis.Multiple punctuation and grammar corrections are needed to improve documentation clarity:
- Line 77: Add the missing comma: "...reduce API costs**,** but may miss fast-moving content. Default is 1 fps."
- Line 114: Hyphenate "to-dos": change `TODOs` to `to-dos`.
- Line 119: Add a comma in the compound sentence: "...are set correctly and the model name is valid." should be "...are set correctly**,** and the model name is valid."
The repeated "Loose punctuation" warnings (lines 56–79, 98) appear to relate to the Markdown list formatting with backticks and dashes; confirm these are false positives by ensuring your Markdown renders correctly.
Also applies to: 98-98, 114-114, 119-119
plugins/openrouter/tests/test_openrouter_llm.py (1)
66-67: Good refactoring to use the public API.

The migration from direct `_conversation` attribute assignment to the public `set_conversation()` method properly encapsulates the conversation setup and aligns with the new LLM interface.

plugins/gemini/tests/test_gemini_llm.py (2)
32-33: Good refactoring to use the public API.

The migration from direct `_conversation` attribute assignment to the public `set_conversation()` method properly encapsulates the conversation setup and aligns with the new LLM interface.

84-85: Consistent API usage.

Correctly applies the same public `set_conversation()` pattern to the locally instantiated LLM in this test.

agents-core/vision_agents/core/llm/llm.py (1)
61-63: LGTM! Conversation management API is well designed.

The new `set_conversation` method provides a clean public API for conversation wiring. The instruction parsing with the `Instructions` type is properly typed, and the separation of concerns between conversation management and instruction handling is clear.

Also applies to: 194-204, 206-210
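As a minimal usage sketch of this wiring (import paths and constructor arguments are assumptions based on the code graph above, not verified signatures):

```python
from vision_agents.plugins.openai import ChatCompletionsLLM
from vision_agents.core.agents.conversation import InMemoryConversation  # path assumed

llm = ChatCompletionsLLM(model="my-model")  # model name is a placeholder
conversation = InMemoryConversation()       # constructor args omitted / assumed
llm.set_conversation(conversation)          # replaces llm._conversation = ...
```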
plugins/openai/vision_agents/plugins/openai/__init__.py (1)
4-7: Verify ChatCompletionsVLM export is intentionally omitted.

`ChatCompletionsVLM` is imported on line 5 but not included in `__all__` on line 7, meaning users cannot import it via `from vision_agents.plugins.openai import ChatCompletionsVLM`. If this plugin should be publicly available, add it to the exports list.

If the VLM should be exported, apply this diff:

```diff
-__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM"]
+__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM", "ChatCompletionsVLM"]
```

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (3)
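For background, a small sketch of what `__all__` controls (generic Python semantics with a hypothetical module layout, not this package's actual files):

```python
# mypkg/__init__.py
# Names left out of __all__ remain importable as attributes, but are
# excluded from `from mypkg import *` and signal a non-public API.
from ._impl import Public, Internal  # hypothetical submodule

__all__ = ["Public"]
```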
39-64: LGTM! Clean plugin initialization.The constructor properly initializes the AsyncOpenAI client with sensible defaults and registers events. The flexibility to pass either credentials or a pre-configured client is a good design.
65-160: LGTM! Streaming implementation with proper event emission.

The streaming response handling is well structured. Defensive check for uninitialized conversation prevents errors, and the event emission pattern (chunk events for deltas, completion event at finish) aligns with the framework's event-driven architecture. Error handling properly emits `LLMErrorEvent` on failures.
162-180: LGTM! Message construction correctly handles conversation context.The method properly constructs the messages array with system instructions and conversation history. The pattern of sending a system message when participant is None (line 173-175) is a reasonable way to handle direct LLM calls vs. participant-triggered responses.
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
267-284: LGTM! Frame encoding and message construction is well designed.The frame-to-JPEG encoding and base64 conversion properly prepares video frames for the model API. Logging the frame count on line 277 aids debugging, and the message structure with image_url content type aligns with OpenAI's multimodal API format.
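For reference, a minimal sketch of the base64 `image_url` message shape this comment describes (field names follow the public Chat Completions convention; the bytes placeholder stands in for real JPEG output):

```python
import base64

jpeg_bytes = b"\xff\xd8\xff..."  # e.g. output of frame_to_jpeg_bytes(...)
b64 = base64.b64encode(jpeg_bytes).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe what you see."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ],
}
```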
plugins/openai/tests/test_chat_completions.py (1)
196-256: LGTM! Well-designed test stubs.

The `AsyncStreamStub` and `VideoStreamTrackStub` classes provide clean mocks for streaming responses and video frame generation. The use of numpy for random frame data and proper timing metadata (pts, time_base) makes the video stub realistic.
```python
from vision_agents.plugins.openai import Realtime

# Initialize with API key
sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
```
Fix import/usage mismatch for Realtime.
Line 19 imports Realtime, but line 22 still references OpenAIRealtime. This will cause a NameError at runtime.
Apply this diff:
```diff
-# Initialize with API key
-sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
+# Initialize with API key
+sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from vision_agents.plugins.openai import Realtime

# Initialize with API key
sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```
🤖 Prompt for AI Agents
In plugins/openai/README.md around lines 19 to 22, the example imports Realtime
but instantiates OpenAIRealtime causing a NameError; update the instantiation to
use Realtime (e.g., replace OpenAIRealtime(...) with Realtime(...)) or
alternatively change the import to import OpenAIRealtime instead—ensure the
class name used when creating the instance matches the imported identifier.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- `agents-core/vision_agents/core/agents/agents.py` (1 hunks)
- `plugins/openai/tests/test_chat_completions.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/__init__.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/openai/vision_agents/plugins/openai/__init__.py
🧰 Additional context used
🧬 Code graph analysis (2)
agents-core/vision_agents/core/agents/agents.py (5)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- plugins/aws/tests/test_aws.py (1): `llm` (35-39)
- plugins/openrouter/tests/test_openrouter_llm.py (1): `llm` (61-68)
- plugins/gemini/tests/test_gemini_llm.py (1): `llm` (31-34)
- agents-core/vision_agents/core/llm/llm.py (1): `set_conversation` (194-204)

plugins/openai/tests/test_chat_completions.py (7)
- agents-core/vision_agents/core/agents/conversation.py (1): `InMemoryConversation` (230-237)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `ChatCompletionsLLM` (23-180), `simple_response` (65-160)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3): `ChatCompletionsVLM` (31-284), `watch_video_track` (187-224), `simple_response` (90-185)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)
- agents-core/vision_agents/core/llm/llm.py (1): `set_conversation` (194-204)
- agents-core/vision_agents/core/events/manager.py (1): `wait` (474-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (7)
agents-core/vision_agents/core/agents/agents.py (1)
540-541: Clean refactor to public API.

The change wires the LLM into the conversation context using the new public `set_conversation` method, which is a cleaner approach than internal mutation. The placement immediately after conversation creation is logical and the comment clearly explains the intent.

As a minor defensive check, you may want to verify that `edge.create_conversation` never returns `None`, since `set_conversation` expects a non-None `Conversation` instance. If it can return `None`, consider adding a guard:

```python
self.conversation = await create_conversation_coro
if self.conversation is not None:
    self.llm.set_conversation(self.conversation)
```

plugins/openai/tests/test_chat_completions.py (6)
36-47: Past review comment addressed correctly.

Both fixtures now call `set_conversation`, which ensures the full initialization logic (wiring instructions/parsed state) is exercised. This aligns with the real code path used by agents.
104-124: Error handling test looks solid.

The test properly verifies that model failures emit an `LLMErrorEvent` with the correct error message, ensuring the error path is covered.
127-171: LLM success test validates streaming and event emission correctly.The test comprehensively checks that:
- Streaming chunks are accumulated into the final response text
- Events are emitted in the expected order (2 chunks + 1 completed)
- Messages sent to the model include conversation history and system prompt
173-193: Error handling path covered.

The failure test ensures `LLMErrorEvent` is emitted when the underlying client raises an exception.
196-231: Mock stream implementation is well-structured.

`AsyncStreamStub` correctly implements the async iterator protocol and generates properly formatted `ChatCompletionChunk` objects, making it suitable for testing streaming behavior.
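For readers unfamiliar with the pattern, a minimal async-iterator stub looks roughly like this (illustrative only; the real `AsyncStreamStub` yields `ChatCompletionChunk` objects rather than strings):

```python
class StreamStub:
    """Replays a fixed list of items through the async iterator protocol."""

    def __init__(self, items):
        self._items = iter(items)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._items)
        except StopIteration:
            raise StopAsyncIteration
```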
233-256: Video stub generates synthetic frames appropriately.

The stub creates random `VideoFrame` objects with timing metadata, providing sufficient fidelity for testing video track consumption. The minimal sleep in `recv` prevents tight loops while keeping tests fast.
Actionable comments posted: 5
🧹 Nitpick comments (3)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)
28-28: Track or remove the TODO comment.The TODO suggests updating documentation for the legacy openai.LLM API. Consider creating an issue to track this task or addressing it directly if documentation updates are straightforward.
Would you like me to open an issue to track this documentation update?
87-88: Consider making frame dimensions configurable.The frame width and height are hardcoded to 800x600. While this provides sensible defaults, different models or use cases might benefit from different resolutions.
If needed, you could add optional `frame_width` and `frame_height` parameters to the constructor:

```diff
 def __init__(
     self,
     model: str,
     api_key: Optional[str] = None,
     base_url: Optional[str] = None,
     fps: int = 1,
     frame_buffer_seconds: int = 10,
+    frame_width: int = 800,
+    frame_height: int = 600,
     client: Optional[AsyncOpenAI] = None,
 ):
```

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
64-166: Consider extracting shared streaming logic.

The `simple_response` method in this file is nearly identical to the one in `chat_completions_vlm.py` (lines 90-192). The primary difference is that the VLM version adds video-specific handling around it. This duplication could be reduced by extracting the streaming loop logic into a shared helper method in a base class or utility module.

This is acceptable for the initial implementation, but future refactoring could improve maintainability.
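One possible shape for such a helper, sketched as a standalone function (hypothetical names; the event emission is reduced to a plain callback):

```python
from typing import AsyncIterator, Callable

async def accumulate_stream(
    deltas: AsyncIterator[str],
    on_chunk: Callable[[str], None],
) -> str:
    """Collect streamed text deltas into one string, reporting each chunk."""
    text = ""
    async for delta in deltas:
        text += delta
        on_chunk(delta)  # e.g. emit LLMResponseChunkEvent here
    return text
```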
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLMResponseEvent` (38-42), `VideoLLM` (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): `VideoForwarder` (24-147), `add_frame_handler` (48-74)
- agents-core/vision_agents/core/utils/video_utils.py (1): `frame_to_jpeg_bytes` (32-63)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `simple_response` (64-166), `_build_model_request` (168-178)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
- plugins/openai/tests/test_chat_completions.py (1): `llm` (37-40)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLM` (49-418), `LLMResponseEvent` (38-42)
- agents-core/vision_agents/core/processors/base_processor.py (1): `Processor` (35-44)
- agents-core/vision_agents/core/events/manager.py (1): `register_events_from_module` (219-256)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2): `simple_response` (90-192), `_build_model_request` (245-283)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (4)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
173-177: Review the finish_reason warning condition.

The code warns when `finish_reason` is "length" or "content". While "length" typically indicates truncation (a legitimate warning), "content" may indicate content filtering that successfully completed but with filtered output. Consider whether both deserve the same warning level, or if "content" should be handled differently.

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (3)
22-36: Well-documented plugin class.The class docstring clearly describes the purpose and features of the ChatCompletionsLLM plugin, with a helpful usage example.
147-151: Review the finish_reason warning condition.

The code warns when `finish_reason` is "length" or "content". While "length" typically indicates truncation (a legitimate warning), "content" may indicate content filtering that successfully completed but with filtered output. Consider whether both deserve the same warning level, or if "content" should be handled differently.
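A minimal sketch of inspecting `finish_reason` in a streamed response (the chunk shape follows the public Chat Completions streaming API; the warning policy itself is the open question above):

```python
import logging

logger = logging.getLogger(__name__)

async def read_stream(stream) -> str:
    text = ""
    async for chunk in stream:  # chunks from client.chat.completions.create(..., stream=True)
        choice = chunk.choices[0]
        if choice.delta.content:
            text += choice.delta.content
        if choice.finish_reason == "length":
            logger.warning("Response truncated by the token limit")
    return text
```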
162-180: Clean request builder implementation.

The `_build_model_request` method correctly assembles the prompt from instructions and conversation history. The implementation is appropriately simpler than the VLM variant, which also includes video frame handling.
openai.chat_completions package to support OSS models
What's changed
New Features
- `openai.ChatCompletionsVLM` buffers video frames, converts them to JPEG, and streams responses via an OpenAI-compatible client
- `set_conversation()` method for improved conversation handling

Documentation
Tests: use `set_conversation` instead of direct `_conversation` assignment.

Summary by CodeRabbit
New Features
Documentation
Tests