Add openai.chat_completions package to support OSS models #156
base: main
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Moves conversation and instruction state to LLM instance fields, adds a public set_conversation(conversation) API, wires agent.join to call the setter, adds OpenAI ChatCompletions LLM/VLM plugins (streaming + vision), introduces a frame_to_jpeg_bytes utility, and updates tests/examples to use the new setter.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent
    participant LLM
    participant Conversation
    participant ExternalModel
    Agent->>Agent: create/join call
    Agent->>Conversation: create conversation
    Conversation-->>Agent: conversation
    Note over Agent,LLM: Provide conversation via public API
    Agent->>LLM: set_conversation(conversation)
    LLM-->>LLM: store conversation & parsed instructions
    Agent->>LLM: simple_response(text) / VLM request
    LLM->>Conversation: read history (if present)
    alt VLM includes frames
        LLM->>LLM: _get_frames_bytes -> JPEG/base64 frames
        LLM->>ExternalModel: stream/request (includes frames)
    else LLM only
        LLM->>ExternalModel: stream/request
    end
    ExternalModel-->>LLM: streaming chunks / final
    LLM->>Agent: emit LLMResponseChunkEvent / LLMResponseCompletedEvent
```
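To make the new conversation flow concrete, here is a minimal sketch of the wiring (module paths and `set_conversation` follow the walkthrough above; the model name and the no-argument `InMemoryConversation()` construction are illustrative assumptions, not the codebase's exact API):

```python
# Minimal sketch of the new wiring, assuming the classes named in the
# walkthrough; constructor arguments are illustrative and may differ.
from vision_agents.core.agents.conversation import InMemoryConversation
from vision_agents.plugins.openai import ChatCompletionsLLM

llm = ChatCompletionsLLM(model="gpt-4o-mini")

conversation = InMemoryConversation()
llm.set_conversation(conversation)  # public setter replaces direct _conversation access

# Later, the LLM reads the stored history when producing a response:
# await llm.simple_response("Describe what you currently see")
```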
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas to inspect closely:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
- `.env.example` (1 hunks)
- `agents-core/vision_agents/core/llm/llm.py` (2 hunks)
- `plugins/baseten/README.md` (1 hunks)
- `plugins/baseten/pyproject.toml` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/__init__.py` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/events.py` (1 hunks)
- `pyproject.toml` (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/baseten/vision_agents/plugins/baseten/__init__.py
- plugins/baseten/vision_agents/plugins/baseten/events.py
- agents-core/vision_agents/core/llm/llm.py
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (4)
plugins/baseten/vision_agents/plugins/baseten/__init__.py (1)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1): BasetenVLM (32-274)

plugins/baseten/vision_agents/plugins/baseten/events.py (1)
- agents-core/vision_agents/core/events/base.py (1): PluginBaseEvent (52-54)

agents-core/vision_agents/core/llm/llm.py (1)
- agents-core/vision_agents/core/agents/conversation.py (1): Conversation (67-227)

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
- agents-core/vision_agents/core/llm/llm.py (3): LLMResponseEvent (38-42), VideoLLM (443-464), _conversation (83-86)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-43)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1): LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
Actionable comments posted: 2
♻️ Duplicate comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
89-158: Critical: User text is never sent to the model.
The `text` parameter containing the new user prompt is never added to the messages payload. The model only receives conversation history and frames, so it cannot respond to the new input. This is a correctness bug that breaks the core functionality.

Apply this diff to include the user text:

```diff
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)

+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
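With the fix applied, the final user message follows the standard OpenAI-compatible multimodal content shape, roughly as below (values truncated for illustration):

```python
# Illustrative payload shape only; the prompt and base64 data are examples.
{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe what you currently see"},
        {
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."},
        },
    ],
}
```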
🧹 Nitpick comments (5)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
86-87: Consider making frame dimensions configurable.
The frame dimensions are hardcoded to 800x600. Different models or use cases might benefit from different resolutions. Consider adding `frame_width` and `frame_height` as constructor parameters.

Apply this diff to add configurable dimensions:

```diff
     def __init__(
         self,
         model: str,
         api_key: Optional[str] = None,
         base_url: Optional[str] = None,
         fps: int = 1,
         frame_buffer_seconds: int = 10,
+        frame_width: int = 800,
+        frame_height: int = 600,
         client: Optional[AsyncOpenAI] = None,
     ):
```

Then update the initialization:

```diff
-        self._frame_width = 800
-        self._frame_height = 600
+        self._frame_width = frame_width
+        self._frame_height = frame_height
```
92-93: Unused parameter: processors.
The `processors` parameter is declared but never used in the method. Either utilize it or remove it from the signature.
110-110: Address or remove TODO comment.
The TODO comment references `_build_enhanced_instructions`, but this method is not present or used. Clarify the intended implementation or remove the comment.
129-129: Consider limiting conversation history size.
The TODO comment raises a valid concern about message volume. Sending unbounded conversation history could lead to token limit errors or increased latency. Consider implementing a sliding window or token-based truncation strategy, as sketched below.
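A minimal sliding-window sketch over the message-dict shape used in this payload (the window size is an arbitrary illustrative choice, not a recommendation):

```python
def truncate_history(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep system messages plus the most recent max_turns messages."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    return system + rest[-max_turns:]
```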
276-308: Well-implemented frame conversion utility.
The function correctly handles aspect ratio preservation and uses appropriate resampling quality (LANCZOS). The TODO comment about moving to core utils is valid; this utility could benefit other plugins.
Would you like me to open an issue to track moving this utility to a shared location?
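For reference, the core of such a utility is small; a sketch of the aspect-ratio-preserving conversion described here (not the plugin's exact code, and the function name is hypothetical):

```python
import io

import av
from PIL import Image


def frame_to_jpeg_bytes_sketch(
    frame: av.VideoFrame, max_width: int, max_height: int, quality: int = 85
) -> bytes:
    img = frame.to_image()  # av.VideoFrame -> PIL.Image
    # Fit within the target box while preserving aspect ratio.
    scale = min(max_width / img.width, max_height / img.height)
    img = img.resize(
        (int(img.width * scale), int(img.height * scale)),
        Image.Resampling.LANCZOS,
    )
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```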
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-43)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1): LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (2)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (2)
1-28: LGTM! Imports are well-organized and all appear necessary for the implementation. No sys.path modifications present, adhering to coding guidelines.
263-273: LGTM! The method correctly iterates over buffered frames and converts them to JPEG bytes. Implementation is clean and well-documented.
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py
This PR is being reviewed by Cursor Bugbot
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
Actionable comments posted: 0
♻️ Duplicate comments (3)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (3)
31-39: Complete the TODO in the class docstring.
The docstring still contains a TODO placeholder. Please add a brief description of the class purpose (e.g., "A video language model backed by Baseten-hosted models that processes video frames alongside text prompts"), document key parameters, and provide usage guidance.
88-157: CRITICAL: User prompt is never sent to the model.
The `text` parameter (Line 90) is never added to the messages payload. Only video frames are included in the final user message (Lines 152-157). This is a correctness bug that breaks the core functionality; the model cannot respond to the user's actual question.

Apply this diff to fix:

```diff
         # Attach the latest bufferred frames to the request
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)

+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
247-257: Fix redundant condition and avoid starting an already-running forwarder.
The condition `if not shared_forwarder:` followed by `shared_forwarder or VideoForwarder(...)` contains dead code; the `shared_forwarder or` part can never be reached. Additionally, calling `await self._video_forwarder.start()` when `shared_forwarder` is provided may attempt to start an already-running forwarder.

Apply this diff:

```diff
         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
         else:
             self._video_forwarder = shared_forwarder
```
🧹 Nitpick comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
70-73: Enhance credential error messages.
The error messages for missing credentials could be more helpful by mentioning the environment variable names.

Apply this diff:

```diff
         elif not api_key:
-            raise ValueError("api_key must be provided")
+            raise ValueError("api_key must be provided or set via BASETEN_API_KEY environment variable")
         elif not base_url:
-            raise ValueError("base_url must be provided")
+            raise ValueError("base_url must be provided or set via BASETEN_BASE_URL environment variable")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- `plugins/baseten/example/README.md` (1 hunks)
- `plugins/baseten/example/pyproject.toml` (1 hunks)
- `plugins/baseten/example/qwen_vl_example.py` (1 hunks)
- `plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/baseten/example/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/baseten/example/qwen_vl_example.py (4)
- agents-core/vision_agents/core/agents/agents.py (3): Agent (125-1357), create_call (698-703), finish (558-591)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-117)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1): simple_response (88-223)

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-43)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1): LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (10)
plugins/baseten/example/pyproject.toml (2)
1-13: LGTM! Project metadata and dependencies align correctly with the example script's imports.
15-20: LGTM! Workspace source configuration correctly references all plugin dependencies.

plugins/baseten/example/qwen_vl_example.py (4)
1-11: LGTM! Imports and environment loading are correct.
14-28: LGTM! Agent construction correctly wires the Baseten VLM with edge, STT, TTS, and appropriate instructions.
30-43: Verify that the critical bug in baseten_vlm.py was addressed.
The join flow is correctly structured. However, Line 38 calls `agent.simple_response("Describe what you currently see")`, which relies on BasetenVLM's `simple_response` method. Past review comments identified a critical bug where the `text` parameter is never added to the messages payload (lines 88-157 in baseten_vlm.py), meaning the prompt won't reach the model. Please ensure this bug was fixed before merging.
45-46: LGTM! CLI entry point correctly wires the AgentLauncher.

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (4)
1-28: LGTM! Imports and constants are correctly defined.
159-223: LGTM! Streaming response logic correctly handles API calls, error reporting via LLMErrorEvent, and emits appropriate chunk and completion events.
262-272: LGTM! The frame iterator correctly processes buffered frames.
275-307: LGTM! Frame-to-JPEG conversion correctly maintains aspect ratio and uses appropriate resampling. The TODO comment about moving to core utils is a valid future refactoring consideration.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/openai/README.md (1)
19-31: Align the class name in the quickstart snippet
The snippet now imports `Realtime`, but it still instantiates `OpenAIRealtime`, which no longer exists under that import path. Please update the example so the constructor matches the imported symbol; otherwise, readers will copy an import/class combination that raises `NameError`.

```diff
-from vision_agents.plugins.openai import Realtime
-
-# Initialize with API key
-sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
+from vision_agents.plugins.openai import Realtime
+
+# Initialize with API key
+sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)
324-335: Reset duplicate guard per LLM response
Hi! Once `_all_sent_texts` learns a sentence it never forgets, so every identical sentence in future responses is dropped (no HeyGen speech) and the set grows without bound in long sessions. A fresh response should get a clean slate.

```diff
             if item_id != self._current_response_id:
                 if self._text_buffer:
                     text_to_send = self._text_buffer.strip()
                     if text_to_send and text_to_send not in self._all_sent_texts:
                         await self._send_text_to_heygen(text_to_send)
                         self._all_sent_texts.add(text_to_send)
                     self._text_buffer = ""
                 self._current_response_id = item_id
+                self._all_sent_texts.clear()
```
🧹 Nitpick comments (6)
examples/other_examples/openai_realtime_webrtc/openai_realtime_example.py (1)
51-51: Minor: Log message timing could be more precise.
The message "is now joining" suggests an action in progress, but at this point the agent has already completed joining (the `await agent.join(call)` has resolved). Consider "Agent has joined the call" or "Agent joined the call successfully" for clarity.

Apply this diff to improve clarity:

```diff
-    logger.info("Agent is now joining the call")
+    logger.info("Agent has joined the call")
```

agents-core/vision_agents/core/utils/video_utils.py (2)
32-34: Validate JPEG quality parameter.
The `quality` parameter lacks bounds checking. JPEG quality should typically be in the range 1-100. Invalid values may cause unexpected behavior or errors during encoding.

Consider adding validation:

```diff
 def frame_to_jpeg_bytes(
     frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
 ) -> bytes:
     """
     Convert a video frame to JPEG bytes with resizing.

     Args:
         frame: an instance of `av.VideoFrame`.
         target_width: target width in pixels.
         target_height: target height in pixels.
-        quality: JPEG quality. Default is 85.
+        quality: JPEG quality (1-100). Default is 85.

     Returns:
         frame as JPEG bytes.
     """
+    if not 1 <= quality <= 100:
+        raise ValueError("JPEG quality must be between 1 and 100")
+
     # Convert frame to a PIL image
     img = frame.to_image()
```

Also applies to: 42-42, 62-62
50-58: Consider whether upscaling is intended behavior.
The current implementation will upscale images when the source dimensions are smaller than the target dimensions (scale > 1). Upscaling can degrade image quality and may not be the intended behavior for a video frame processing utility. Consider clamping the scale factor to prevent upscaling:

```diff
     # Calculate scale factor (fit within target dimensions)
     scale = min(target_width / src_width, target_height / src_height)

+    # Optional: prevent upscaling by clamping scale to 1.0
+    scale = min(scale, 1.0)
+
     new_width = int(src_width * scale)
     new_height = int(src_height * scale)
```

If upscaling is intentional, consider documenting this behavior in the docstring.
plugins/openai/vision_agents/plugins/openai/__init__.py (1)
4-7: Export ChatCompletionsVLM alongside the LLM variant
You import `ChatCompletionsVLM`, but it's missing from `__all__`, so `from vision_agents.plugins.openai import *` (used in docs/examples) won't pick it up. Please add it to the export list for consistency with the other public classes.

```diff
-__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM"]
+__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM", "ChatCompletionsVLM"]
```

plugins/openai/examples/qwen_vl_example/README.md (1)
4-4: Clarify video processing direction.
The phrase "accepts text and video and responds with text vocalised" could mislead readers into thinking users send video to the agent. Based on the example code, the agent processes video frames internally and sends them to the VLM; users interact via voice/text only.
Consider revising to: "The model processes video frames from the call and responds with text vocalized with the TTS service of your choice."
agents-core/vision_agents/core/cli/cli_runner.py (1)
181-184: Consider using a more robust pattern for capability detection.
The nested `hasattr` checks work but are somewhat fragile. If the edge interface is expected to have `open_demo_for_agent`, consider using a protocol or abstract base class to make this contract explicit.

For example, you could define a protocol:

```python
from typing import Protocol

class DemoCapableEdge(Protocol):
    async def open_demo_for_agent(self, agent: "Agent", call_type: str, call_id: str) -> str: ...
```

Then use `isinstance` checking or type narrowing instead of `hasattr`.
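Note that `isinstance` against a `Protocol` only works if the protocol is decorated with `@runtime_checkable`, and the runtime check verifies method presence, not signatures. A sketch of that variant:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DemoCapableEdge(Protocol):
    async def open_demo_for_agent(
        self, agent: "Agent", call_type: str, call_id: str
    ) -> str: ...


# At the call site, replacing the nested hasattr checks:
# if isinstance(agent.edge, DemoCapableEdge):
#     url = await agent.edge.open_demo_for_agent(agent, call_type, call_id)
```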
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (71)
- `.cursor/rules/python.mdc` (1 hunks)
- `README.md` (1 hunks)
- `agents-core/pyproject.toml` (3 hunks)
- `agents-core/vision_agents/core/agents/agent_launcher.py` (1 hunks)
- `agents-core/vision_agents/core/agents/agent_options.py` (1 hunks)
- `agents-core/vision_agents/core/agents/agents.py` (17 hunks)
- `agents-core/vision_agents/core/cli/cli_runner.py` (2 hunks)
- `agents-core/vision_agents/core/processors/base_processor.py` (1 hunks)
- `agents-core/vision_agents/core/utils/audio_queue.py` (1 hunks)
- `agents-core/vision_agents/core/utils/video_forwarder.py` (1 hunks)
- `agents-core/vision_agents/core/utils/video_queue.py` (1 hunks)
- `agents-core/vision_agents/core/utils/video_track.py` (2 hunks)
- `agents-core/vision_agents/core/utils/video_utils.py` (1 hunks)
- `examples/01_simple_agent_example/README.md` (1 hunks)
- `examples/01_simple_agent_example/simple_agent_example.py` (3 hunks)
- `examples/02_golf_coach_example/golf_coach_example.py` (0 hunks)
- `examples/other_examples/09_github_mcp_demo/gemini_realtime_github_mcp_demo.py` (0 hunks)
- `examples/other_examples/09_github_mcp_demo/github_mcp_demo.py` (0 hunks)
- `examples/other_examples/09_github_mcp_demo/openai_realtime_github_mcp_demo.py` (0 hunks)
- `examples/other_examples/gemini_live_realtime/gemini_live_example.py` (0 hunks)
- `examples/other_examples/openai_realtime_webrtc/openai_realtime_example.py` (1 hunks)
- `examples/other_examples/plugins_examples/audio_moderation/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/mcp/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/stt_deepgram_transcription/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/stt_moonshine_transcription/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/tts_cartesia/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/tts_elevenlabs/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/tts_kokoro/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/vad_silero/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/video_moderation/main.py` (0 hunks)
- `examples/other_examples/plugins_examples/wizper_stt_translate/main.py` (0 hunks)
- `plugins/aws/example/aws_llm_function_calling_example.py` (0 hunks)
- `plugins/aws/example/aws_qwen_example.py` (0 hunks)
- `plugins/aws/example/aws_realtime_function_calling_example.py` (0 hunks)
- `plugins/aws/example/aws_realtime_nova_example.py` (0 hunks)
- `plugins/fish/example/fish_example.py` (0 hunks)
- `plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py` (2 hunks)
- `plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py` (1 hunks)
- `plugins/heygen/README.md` (0 hunks)
- `plugins/heygen/example/avatar_example.py` (0 hunks)
- `plugins/heygen/example/avatar_realtime_example.py` (0 hunks)
- `plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py` (11 hunks)
- `plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py` (2 hunks)
- `plugins/moondream/README.md` (5 hunks)
- `plugins/moondream/example/README.md` (1 hunks)
- `plugins/moondream/example/moondream_vlm_example.py` (1 hunks)
- `plugins/moondream/example/pyproject.toml` (1 hunks)
- `plugins/moondream/tests/test_moondream_local.py` (4 hunks)
- `plugins/moondream/tests/test_moondream_local_vlm.py` (1 hunks)
- `plugins/moondream/tests/test_moondream_vlm.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/__init__.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py` (4 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py` (6 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py` (2 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py` (1 hunks)
- `plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py` (1 hunks)
- `plugins/openai/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/pyproject.toml` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/qwen_vl_example.py` (1 hunks)
- `plugins/openai/tests/test_chat_completions.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/__init__.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/rtc_manager.py` (1 hunks)
- `plugins/openrouter/example/openrouter_example.py` (0 hunks)
- `plugins/sample_plugin/example/my_example.py` (0 hunks)
- `plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py` (2 hunks)
- `tests/test_audio_queue.py` (1 hunks)
- `tests/test_queue_and_video_forwarder.py` (9 hunks)
💤 Files with no reviewable changes (25)
- examples/other_examples/plugins_examples/tts_elevenlabs/main.py
- examples/other_examples/plugins_examples/vad_silero/main.py
- plugins/heygen/example/avatar_realtime_example.py
- examples/02_golf_coach_example/golf_coach_example.py
- examples/other_examples/plugins_examples/tts_cartesia/main.py
- examples/other_examples/09_github_mcp_demo/github_mcp_demo.py
- plugins/heygen/example/avatar_example.py
- plugins/aws/example/aws_qwen_example.py
- examples/other_examples/gemini_live_realtime/gemini_live_example.py
- plugins/aws/example/aws_realtime_function_calling_example.py
- examples/other_examples/plugins_examples/audio_moderation/main.py
- plugins/aws/example/aws_realtime_nova_example.py
- examples/other_examples/plugins_examples/wizper_stt_translate/main.py
- examples/other_examples/plugins_examples/video_moderation/main.py
- examples/other_examples/plugins_examples/tts_kokoro/main.py
- examples/other_examples/09_github_mcp_demo/gemini_realtime_github_mcp_demo.py
- plugins/aws/example/aws_llm_function_calling_example.py
- examples/other_examples/plugins_examples/stt_deepgram_transcription/main.py
- examples/other_examples/09_github_mcp_demo/openai_realtime_github_mcp_demo.py
- plugins/openrouter/example/openrouter_example.py
- examples/other_examples/plugins_examples/mcp/main.py
- plugins/heygen/README.md
- plugins/sample_plugin/example/my_example.py
- examples/other_examples/plugins_examples/stt_moonshine_transcription/main.py
- plugins/fish/example/fish_example.py
🧰 Additional context used
🧬 Code graph analysis (30)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): warmup (118-120)

agents-core/vision_agents/core/utils/video_track.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)

plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (2)
- plugins/moondream/tests/test_moondream_local.py (3): is_available (188-189), is_available (216-217), is_available (244-245)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): device (114-116)

plugins/moondream/tests/test_moondream_local_vlm.py (3)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4): LocalVLM (31-349), warmup (96-99), close (343-349), simple_response (313-334)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2): warmup (118-120), close (310-318)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2): close (241-246), simple_response (197-218)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (2)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)
- plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1): _send_video_frame (435-447)

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (6)
- plugins/openai/tests/test_chat_completions.py (1): llm (37-40)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLM (49-418), LLMResponseEvent (38-42)
- agents-core/vision_agents/core/processors/base_processor.py (1): Processor (35-44)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2): simple_response (90-185), _build_model_request (238-284)
- plugins/openai/vision_agents/plugins/openai/events.py (1): LLMErrorEvent (15-19)

plugins/moondream/example/moondream_vlm_example.py (2)
- agents-core/vision_agents/core/agents/agents.py (7): Agent (93-1262), create_user (741-753), create_call (755-760), subscribe (452-464), simple_response (428-441), join (466-549), finish (578-611)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2): CloudVLM (27-246), simple_response (197-218)

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)
- plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1): _send_video_frame (268-274)

agents-core/vision_agents/core/cli/cli_runner.py (1)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1): open_demo_for_agent (350-354)

plugins/moondream/tests/test_moondream_vlm.py (1)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (3): CloudVLM (27-246), close (241-246), simple_response (197-218)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (3): HeyGenRTCManager (19-267), connect (60-145), close (256-267)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2): HeyGenVideoTrack (14-187), stop (178-187)
- plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3): connect (187-200), Realtime (53-679), close (372-386)
- plugins/openai/vision_agents/plugins/openai/openai_realtime.py (3): connect (80-106), Realtime (40-487), close (153-154)
- agents-core/vision_agents/core/llm/events.py (3): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112), RealtimeAgentSpeechTranscriptionEvent (148-153)
- agents-core/vision_agents/core/edge/types.py (1): write (45-45)

tests/test_audio_queue.py (1)
- agents-core/vision_agents/core/utils/audio_queue.py (11): AudioQueue (12-274), empty (36-38), put (50-83), qsize (40-42), get (119-136), put_nowait (85-117), get_nowait (138-152), get_samples (154-237), get_duration (239-258), get_buffer_info (260-274), _current_duration_ms (44-48)

plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1): CloudVLM (27-246)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1): LocalVLM (31-349)

plugins/openai/vision_agents/plugins/openai/__init__.py (2)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1): ChatCompletionsLLM (23-180)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1): ChatCompletionsVLM (31-284)

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (3)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1): MoondreamVideoTrack (16-79)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): _process_and_add_frame (283-308)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
- agents-core/vision_agents/core/agents/agents.py (2): create_user (741-753), create_call (755-760)
- agents-core/vision_agents/core/edge/edge_transport.py (2): create_user (30-31), open_demo (42-43)

plugins/openai/tests/test_chat_completions.py (6)
- agents-core/vision_agents/core/agents/conversation.py (1): InMemoryConversation (230-237)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): ChatCompletionsLLM (23-180), simple_response (65-160)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3): ChatCompletionsVLM (31-284), watch_video_track (187-224), simple_response (90-185)
- plugins/openai/vision_agents/plugins/openai/events.py (1): LLMErrorEvent (15-19)
- agents-core/vision_agents/core/events/manager.py (1): wait (474-487)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (5)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (24-147), add_frame_handler (48-74)
- agents-core/vision_agents/core/utils/video_utils.py (1): frame_to_jpeg_bytes (32-63)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): simple_response (65-160), _build_model_request (162-180)

plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (2)
- agents-core/vision_agents/core/edge/sfu_events.py (1): name (2197-2201)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): add_frame_handler (48-74)

plugins/moondream/tests/test_moondream_local.py (1)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): LocalDetectionProcessor (28-318)

tests/test_queue_and_video_forwarder.py (3)
- agents-core/vision_agents/core/utils/video_queue.py (1): VideoLatestNQueue (6-28)
- conftest.py (1): bunny_video_track (300-344)
- agents-core/vision_agents/core/utils/video_forwarder.py (4): VideoForwarder (24-147), add_frame_handler (48-74), stop (102-112), remove_frame_handler (76-92)

plugins/openai/examples/qwen_vl_example/qwen_vl_example.py (3)
- agents-core/vision_agents/core/agents/agents.py (3): Agent (93-1262), create_call (755-760), finish (578-611)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-125)

agents-core/vision_agents/core/processors/base_processor.py (1)
- agents-core/vision_agents/core/edge/sfu_events.py (1): name (2197-2201)

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
- agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (3): VideoForwarder (24-147), add_frame_handler (48-74), stop (102-112)
- agents-core/vision_agents/core/utils/video_queue.py (2): VideoLatestNQueue (6-28), put_latest_nowait (22-28)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (10): watch_video_track (168-200), _stop_watching_video_track (336-341), _on_frame_received (202-208), _setup_stt_subscription (210-217), on_stt_transcript (216-217), _on_stt_transcript (306-311), _consume_stream (219-230), _process_frame (232-304), simple_response (313-334), close (343-349)

agents-core/vision_agents/core/utils/video_forwarder.py (1)
- agents-core/vision_agents/core/utils/video_queue.py (2): VideoLatestNQueue (6-28), put_latest (14-20)

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)
- plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3): parse_detection_bbox (13-31), annotate_detections (48-111), handle_device (7-11)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1): MoondreamVideoTrack (16-79)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3): warmup (96-99), _prepare_moondream (101-109), _load_model_sync (111-166)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (4): process_video (105-146), _process_and_add_frame (208-237), _run_inference (166-178), _run_detection_sync (180-206)

agents-core/vision_agents/core/agents/agents.py (9)
- agents-core/vision_agents/core/agents/agent_options.py (3): AgentOptions (6-16), default_agent_options (23-24), update (9-16)
- agents-core/vision_agents/core/edge/sfu_events.py (22): ParticipantJoinedEvent (1481-1526), participant (1496-1501, 1504-1507, 1545-1550, 1553-1556, 1625-1630, 1633-1636, 2100-2105, 2108-2111, 2156-2161, 2164-2167), Participant (229-270), track_type (579-583, 1193-1197, 2289-2293), user_id (489-493, 856-860, 901-905, 1186-1190, 2093-2097, 2142-2146), name (2197-2201)
- agents-core/vision_agents/core/utils/audio_queue.py (4): AudioQueue (12-274), put (50-83), get_duration (239-258), get (119-136)
- agents-core/vision_agents/core/edge/types.py (4): Participant (22-24), Connection (27-35), OutputAudioTrack (39-47), write (45-45)
- agents-core/vision_agents/core/utils/video_forwarder.py (1): VideoForwarder (24-147)
- agents-core/vision_agents/core/events/manager.py (4): send (428-472), subscribe (301-370), wait (474-487), unsubscribe (274-299)
- agents-core/vision_agents/core/edge/events.py (3): TrackAddedEvent (18-24), TrackRemovedEvent (28-34), AudioReceivedEvent (9-14)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2): join (256-307), add_track_subscriber (319-322)
- agents-core/vision_agents/core/llm/llm.py (4): simple_audio_response (428-440), set_conversation (194-204), watch_video_track (458-471), LLM (49-418)

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)
- agents-core/vision_agents/core/agents/agent_options.py (2): AgentOptions (6-16), default_agent_options (23-24)
- agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
- agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (3): VideoForwarder (24-147), add_frame_handler (48-74), stop (102-112)
- agents-core/vision_agents/core/utils/video_queue.py (2): VideoLatestNQueue (6-28), put_latest_nowait (22-28)
- plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1): handle_device (7-11)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4): device (114-116), warmup (118-120), _prepare_moondream (122-132), close (310-318)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (10): watch_video_track (66-98), _stop_watching_video_track (220-225), _on_frame_received (100-106), _setup_stt_subscription (108-115), on_stt_transcript (114-115), _on_stt_transcript (190-195), _consume_stream (117-130), _process_frame (132-188), simple_response (197-218), close (241-246)
🪛 LanguageTool
plugins/moondream/example/README.md
[typographical] ~1-~1: Consider adding a comma here.
Context: ## Moondream example Please see root readme for details.
(PLEASE_COMMA)
plugins/moondream/README.md
[uncategorized] ~8-~8: Possible missing comma found.
Context: ...s Choose between cloud-hosted or local processing depending on your needs. When running l...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~164-~164: Possible missing article found.
Context: ... the model from HuggingFace and runs on device. It supports both VQA and captioning mo...
(AI_HYDRA_LEO_MISSING_THE)
[uncategorized] ~233-~233: Possible missing comma found.
Context: ...ry configuration. If not provided, uses default which defaults to tempfile.gettempdir()...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~239-~239: Loose punctuation mark.
Context: ...e. ### CloudVLM Parameters - api_key: str - API key for Moondream Cloud API. ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~240-~240: Loose punctuation mark.
Context: ..._API_KEYenvironment variable. -mode`: Literal["vqa", "caption"] - "vqa" for v...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~247-~247: Loose punctuation mark.
Context: ...mits. ### LocalVLM Parameters - mode: Literal["vqa", "caption"] - "vqa" for v...
(UNLIKELY_OPENING_PUNCTUATION)
plugins/openai/examples/qwen_vl_example/README.md
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...onment Variables - OPENAI_API_KEY: Your Baseten API key (required) - **`OP...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~74-~74: Loose punctuation mark.
Context: ...al) ) ``` ### Parameters - model: The name of the Baseten-hosted model to...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~75-~75: Loose punctuation mark.
Context: ... a vision-capable model. - api_key: Your Baseten API key. If not provided, ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ... environment variable. - **base_url`**: The base URL for Baseten API. If not pr...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...E_URL environment variable. - **fps`**: Number of video frames per second to ca...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Possible missing comma found.
Context: ...the model. Lower values reduce API costs but may miss fast-moving content. Default i...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...t is 1 fps. - frame_buffer_seconds: How many seconds of video to buffer. To...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: .... Default is 10 seconds. - **client**: Optional pre-configured AsyncOpenAI` c...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~98-~98: Loose punctuation mark.
Context: ...g events: - LLMResponseChunkEvent: Emitted for each text chunk in the stre...
(UNLIKELY_OPENING_PUNCTUATION)
[grammar] ~114-~114: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: ...ing support is not yet implemented (see TODOs in code). ## Troubleshooting - **No v...
(TO_DO_HYPHEN)
[uncategorized] ~119-~119: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... and OPENAI_BASE_URL are set correctly and the model name is valid. - **High laten...
(COMMA_COMPOUND_SENTENCE_2)
🪛 markdownlint-cli2 (0.18.1)
plugins/moondream/README.md
167-167: Bare URL used
(MD034, no-bare-urls)
README.md
165-165: Images should have alternate text (alt text)
(MD045, no-alt-text)
165-165: Images should have alternate text (alt text)
(MD045, no-alt-text)
165-165: Images should have alternate text (alt text)
(MD045, no-alt-text)
169-169: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (25)
agents-core/pyproject.toml (2)
24-24: Verify getstream 2.5.9 requirement and ensure changelog alignment.
The bump from ≥2.5.8 to ≥2.5.9 suggests upstream changes are needed for the conversation management wiring. Ensure that the 2.5.9 release contains the necessary changes to support the LLM.set_conversation(…) pattern introduced in this PR.
85-95: Consolidate commented sources section and verify git revision.
The `[tool.uv.sources]` section and its contents are all commented out, creating redundant commenting. Clarify the intent:

- If these configurations are legacy/unused, remove them entirely.
- If they document alternate configurations, consolidate under a single comment block explaining their purpose.
- If line 94 is an active development reference, uncomment the section header and activate that line.

Additionally, verify that the git revision `85bd8ef00859ef6ed5ef4ffe7b7f40ae12d12973` exists in the GetStream/stream-py repository and is the correct commit for supporting the conversation management changes in this PR.

plugins/moondream/example/pyproject.toml (2)
16-22: All workspace packages verified and properly configured.
The verification confirms that all seven workspace dependencies referenced in the configuration exist in the monorepo with correct pyproject.toml definitions:
- vision-agents (agents-core/pyproject.toml)
- vision-agents-plugins-moondream (plugins/moondream/pyproject.toml)
- vision-agents-plugins-getstream (plugins/getstream/pyproject.toml)
- vision-agents-plugins-deepgram (plugins/deepgram/pyproject.toml)
- vision-agents-plugins-elevenlabs (plugins/elevenlabs/pyproject.toml)
- vision-agents-plugins-vogent (plugins/vogent/pyproject.toml)
The workspace configuration is valid.
1-5: Python version requirement is consistent — no action needed.
The vision-agents package requires Python 3.10 or newer, which aligns precisely with the `requires-python = ">=3.10"` constraint specified in the project metadata. No conflicts or misalignment.

plugins/openai/examples/qwen_vl_example/pyproject.toml (2)
5-5: No changes needed — Python version requirement is correctly aligned.
The verification confirms that `requires-python = ">=3.10"` in `plugins/openai/examples/qwen_vl_example/pyproject.toml` matches the core dependency requirement in `agents-core/pyproject.toml` (also `>=3.10`) and aligns with the vast majority of the workspace. The file is consistent with the project baseline.
1-20: Dependencies are correct; Baseten is a service provider, not a separate plugin.
Baseten is an OpenAI-compatible VLM provider that integrates through the existing `vision-agents-plugins-openai` dependency by configuring the API endpoint and credentials. No separate `vision-agents-plugins-baseten` package exists in the repository. The example's dependencies are appropriately configured.

Likely an incorrect or invalid review comment.
plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (1)
60-60: LGTM: Processor name attribute added.
The `name` class attribute properly identifies this processor instance.

agents-core/vision_agents/core/utils/audio_queue.py (1)
154-237: LGTM on `get_samples` complexity.
The splitting logic correctly handles partial chunk consumption and maintains sample accounting. The timeout-based waiting and metadata preservation are well implemented.

examples/01_simple_agent_example/README.md (1)
88-91: LGTM on documentation update.
The updated flow correctly reflects the new CLI behavior where the demo UI opens automatically. The note about the `--no-demo` flag is helpful.

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
293-296: LGTM on video forwarding refactor.
The shift from `start_event_consumer` to `add_frame_handler` aligns with the new frame-handler-based architecture. The handler registration with fps and name parameters is clean and consistent with the VideoForwarder API; a sketch of the new style follows.
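A registration in the new style would look roughly like this (the callback body, fps value, and name string are illustrative; only the `fps`/`name` parameters are taken from the comment above, so treat the exact signature as an assumption):

```python
# Hypothetical handler registration against the VideoForwarder API
# described in this comment; the callback body is a placeholder.
async def _send_video_frame(frame) -> None:
    ...  # encode the frame and forward it to the model

forwarder.add_frame_handler(_send_video_frame, fps=1.0, name="openai_rtc_forwarder")
```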
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
7-11: LGTM on device handling utility.
The `handle_device()` function provides a clean, centralized way to select compute device and precision. The CUDA detection with CPU fallback is appropriate; the usual pattern is sketched below.
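This is the common torch device/precision selection pattern; a sketch under that assumption, not the plugin's exact code:

```python
import torch


def handle_device() -> tuple[str, torch.dtype]:
    # Prefer CUDA with half precision; fall back to CPU with full precision.
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32
```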
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (2)
8-8: LGTM! Import updated correctly.
The import path and class name have been updated to use the renamed `VideoLatestNQueue`.
30-30: LGTM! Type annotation updated correctly.
The frame queue type annotation correctly uses `VideoLatestNQueue`.

examples/01_simple_agent_example/simple_agent_example.py (3)
7-7: LGTM! Imports simplified.
Removed `vogent` from imports, consistent with the switch to Deepgram's built-in turn detection.
29-29: LGTM! Clarified turn detection behavior.
The comment helpfully explains that turn detection is not needed with Deepgram. This simplifies the agent configuration.
39-40: LGTM! Simplified join_call flow.
The removal of explicit user creation and demo opening aligns with the new CLI-controlled demo opening pattern (via the `--no-demo` flag).

plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
5-6: LGTM! Docstring updated.
The capability description now accurately reflects visual question answering and captioning without mentioning counting.
8-13: LGTM! Public API expanded with VLM support.
The addition of `CloudVLM` and `LocalVLM` imports and exports expands the plugin's capabilities with vision-language model support. The absolute import paths are clear and maintainable.

agents-core/vision_agents/core/utils/video_queue.py (1)
6-6: LGTM! Class renamed for clarity.
The rename from `LatestNQueue` to `VideoLatestNQueue` makes the purpose more explicit and better reflects its use in video frame buffering contexts.

agents-core/vision_agents/core/utils/video_track.py (2)
7-7: LGTM! Import updated correctly.
The import path and class name have been updated to use the renamed `VideoLatestNQueue`.
20-20: LGTM! Type annotation updated correctly.
The frame queue type annotation correctly uses `VideoLatestNQueue`.

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)
9-9: LGTM! Import updated correctly.
The import path and class name have been updated to use the renamed `VideoLatestNQueue`.
35-35: LGTM! Type annotation updated correctly.
The frame queue type annotation correctly uses `VideoLatestNQueue`. The `maxlen=2` is appropriately small for HeyGen's low-latency requirements.

agents-core/vision_agents/core/cli/cli_runner.py (2)
153-158: LGTM! New CLI flag added.
The `--no-demo` flag provides users control over whether the demo UI opens automatically. Good UX improvement.
159-159: LGTM! Function signature extended correctly.
The `no_demo` parameter is properly added to the function signature.
plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py
Force-pushed from 4f85895 to f3d5b11
Actionable comments posted: 1
♻️ Duplicate comments (2)
agents-core/vision_agents/core/utils/video_utils.py (1)
32-63: Validate dimensions to prevent division by zero.
The scale calculation on line 53 divides by `src_width` and `src_height` without validation. If either dimension is zero (or if `target_width`/`target_height` are zero or negative), this will raise a runtime error. Add defensive checks before the division.

Apply this diff to add validation:

```diff
 def frame_to_jpeg_bytes(
     frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
 ) -> bytes:
     """
     Convert a video frame to JPEG bytes with resizing.

     Args:
         frame: an instance of `av.VideoFrame`.
         target_width: target width in pixels.
         target_height: target height in pixels.
         quality: JPEG quality. Default is 85.

     Returns:
         frame as JPEG bytes.
     """
+    if target_width <= 0 or target_height <= 0:
+        raise ValueError("Target dimensions must be positive")
+
     # Convert frame to a PIL image
     img = frame.to_image()

     # Calculate scaling to maintain aspect ratio
     src_width, src_height = img.size

+    if src_width == 0 or src_height == 0:
+        raise ValueError(f"Source frame has invalid dimensions: {src_width}x{src_height}")
+
     # Calculate scale factor (fit within target dimensions)
     scale = min(target_width / src_width, target_height / src_height)
```

plugins/openai/tests/test_chat_completions.py (1)
36-47: Use set_conversation instead of direct assignment.
Both fixtures (lines 39 and 46) directly assign to the private `_conversation` attribute, bypassing the public `set_conversation` method introduced in this PR. Tests should exercise the real API that agents use.

Apply this diff:

```diff
 @pytest.fixture()
 async def llm(openai_client_mock, conversation):
     llm_ = ChatCompletionsLLM(client=openai_client_mock, model="test")
-    llm_._conversation = conversation
+    llm_.set_conversation(conversation)
     return llm_


 @pytest.fixture()
 async def vlm(openai_client_mock, conversation):
     llm_ = ChatCompletionsVLM(client=openai_client_mock, model="test")
-    llm_._conversation = conversation
+    llm_.set_conversation(conversation)
     return llm_
```
🧹 Nitpick comments (3)
plugins/openai/examples/qwen_vl_example/README.md (1)
56-57: Resolve past review comment: clarify environment variable naming convention.
Baseten officially uses `BASETEN_API_KEY` as the standard environment variable, yet this README uses `OPENAI_API_KEY` and `OPENAI_BASE_URL`. While this pattern is valid for the OpenAI-compatible client approach, it creates confusion for developers who might expect Baseten's standard naming.

Recommend one of these approaches:

1. Document the mapping (preferred): Add a note explaining that `OPENAI_*` variables are used because the OpenAI client is instantiated with Baseten's OpenAI-compatible endpoint. Consider showing both naming conventions:

   - **`OPENAI_API_KEY`**: Your Baseten API key (set this to your value from `BASETEN_API_KEY`)
   - **`OPENAI_BASE_URL`**: The base URL for your Baseten API endpoint (set this to your value from `BASETEN_BASE_URL`)

   And add: "See `.env.example` for the canonical `BASETEN_*` variable names if you prefer to use those for clarity."

2. Align with Baseten's convention: Update the code example and README to explicitly use `BASETEN_API_KEY` and `BASETEN_BASE_URL` environment variables, remapping them when creating the OpenAI client, as sketched below.
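The remapping in the second option is a one-liner at client construction; a sketch, assuming the standard `AsyncOpenAI` constructor:

```python
import os

from openai import AsyncOpenAI

# Keep Baseten's canonical variable names and map them explicitly onto
# the OpenAI-compatible client.
client = AsyncOpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=os.environ["BASETEN_BASE_URL"],
)
```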
agents-core/vision_agents/core/agents/agents.py (1)
540-541: Remove duplicate set_conversation call.
The conversation is set twice in the join flow—once immediately after creation (line 541) and again after the optional wait_for_participant (line 548). This duplication is unnecessary; the LLM only needs the conversation set once. Consider removing the first call and keeping only the second one after all participant logic has completed.

Apply this diff to remove the duplicate:

```diff
         # wait for conversation creation coro at the very end of the join flow
         self.conversation = await create_conversation_coro
-        # Provide conversation to the LLM so it can access the chat history.
-        self.llm.set_conversation(self.conversation)

         if wait_for_participant:
             self.logger.info("Agent is ready, waiting for participant to join")
```

Also applies to: 547-548
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
210-220: Remove redundant expression in forwarder creation.
On line 211, within the `if not shared_forwarder:` block, the expression `shared_forwarder or VideoForwarder(...)` is redundant. Since `shared_forwarder` is guaranteed to be falsy inside this branch, the `shared_forwarder or` part is dead code and can be removed for clarity.

Apply this diff:

```diff
         if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name=f"{PLUGIN_NAME}_forwarder",
             )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- `agents-core/vision_agents/core/agents/agents.py` (1 hunks)
- `agents-core/vision_agents/core/llm/llm.py` (3 hunks)
- `agents-core/vision_agents/core/utils/video_utils.py` (1 hunks)
- `plugins/anthropic/tests/test_anthropic_llm.py` (2 hunks)
- `plugins/aws/tests/test_aws.py` (1 hunks)
- `plugins/gemini/tests/test_gemini_llm.py` (5 hunks)
- `plugins/openai/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/README.md` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/pyproject.toml` (1 hunks)
- `plugins/openai/examples/qwen_vl_example/qwen_vl_example.py` (1 hunks)
- `plugins/openai/tests/test_chat_completions.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/__init__.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py` (1 hunks)
- `plugins/openrouter/tests/test_openrouter_llm.py` (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/anthropic/tests/test_anthropic_llm.py
- plugins/aws/tests/test_aws.py
- plugins/openai/examples/qwen_vl_example/qwen_vl_example.py
🧰 Additional context used
🧬 Code graph analysis (8)
plugins/gemini/tests/test_gemini_llm.py (4)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- agents-core/vision_agents/core/llm/llm.py (2): `set_conversation` (194-204), `simple_response` (75-81)
- agents-core/vision_agents/core/llm/events.py (1): `LLMResponseChunkEvent` (87-102)
- plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1): `simple_response` (68-85)

plugins/openrouter/tests/test_openrouter_llm.py (3)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- agents-core/vision_agents/core/llm/llm.py (2): `LLM` (49-418), `set_conversation` (194-204)
- agents-core/vision_agents/core/agents/conversation.py (1): `InMemoryConversation` (230-237)

plugins/openai/tests/test_chat_completions.py (5)
- agents-core/vision_agents/core/agents/conversation.py (1): `InMemoryConversation` (230-237)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `ChatCompletionsLLM` (23-180), `simple_response` (65-160)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3): `ChatCompletionsVLM` (31-284), `watch_video_track` (187-224), `simple_response` (90-185)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
- plugins/openai/tests/test_chat_completions.py (1): `llm` (37-40)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLM` (49-418), `LLMResponseEvent` (38-42)
- agents-core/vision_agents/core/processors/base_processor.py (1): `Processor` (35-44)
- agents-core/vision_agents/core/events/manager.py (1): `register_events_from_module` (219-256)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2): `simple_response` (90-185), `_build_model_request` (238-284)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

agents-core/vision_agents/core/agents/agents.py (7)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- plugins/aws/tests/test_aws.py (1): `llm` (35-39)
- plugins/gemini/tests/test_gemini_llm.py (1): `llm` (31-34)
- plugins/openrouter/tests/test_openrouter_llm.py (1): `llm` (61-68)
- agents-core/vision_agents/core/llm/llm.py (1): `set_conversation` (194-204)
- plugins/getstream/tests/test_message_chunking.py (2): `conversation` (15-27), `conversation` (244-251)
- tests/test_conversation.py (1): `conversation` (66-73)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLMResponseEvent` (38-42), `VideoLLM` (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): `VideoForwarder` (24-147), `add_frame_handler` (48-74)
- agents-core/vision_agents/core/utils/video_utils.py (1): `frame_to_jpeg_bytes` (32-63)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `simple_response` (65-160), `_build_model_request` (162-180)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

plugins/openai/vision_agents/plugins/openai/__init__.py (2)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1): `ChatCompletionsLLM` (23-180)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1): `ChatCompletionsVLM` (31-284)

agents-core/vision_agents/core/llm/llm.py (2)
- agents-core/vision_agents/core/utils/utils.py (2): `Instructions` (35-40), `parse_instructions` (89-127)
- agents-core/vision_agents/core/agents/conversation.py (1): `Conversation` (67-227)
🪛 LanguageTool
plugins/openai/examples/qwen_vl_example/README.md
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...onment Variables - OPENAI_API_KEY: Your Baseten API key (required) - **`OP...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~74-~74: Loose punctuation mark.
Context: ...al) ) ``` ### Parameters - model: The name of the Baseten-hosted model to...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~75-~75: Loose punctuation mark.
Context: ... a vision-capable model. - api_key: Your Baseten API key. If not provided, ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ... environment variable. - **base_url`**: The base URL for Baseten API. If not pr...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...E_URL environment variable. - **fps`**: Number of video frames per second to ca...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Possible missing comma found.
Context: ...the model. Lower values reduce API costs but may miss fast-moving content. Default i...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...t is 1 fps. - frame_buffer_seconds: How many seconds of video to buffer. To...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: .... Default is 10 seconds. - **client**: Optional pre-configured AsyncOpenAI` c...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~98-~98: Loose punctuation mark.
Context: ...g events: - LLMResponseChunkEvent: Emitted for each text chunk in the stre...
(UNLIKELY_OPENING_PUNCTUATION)
[grammar] ~114-~114: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: ...ing support is not yet implemented (see TODOs in code). ## Troubleshooting - **No v...
(TO_DO_HYPHEN)
[uncategorized] ~119-~119: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... and OPENAI_BASE_URL are set correctly and the model name is valid. - **High laten...
(COMMA_COMPOUND_SENTENCE_2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (12)
plugins/openai/examples/qwen_vl_example/pyproject.toml (1)
1-21: LGTM! Appropriate dependency structure for the example project.

The pyproject.toml correctly includes the core `vision-agents` framework and relevant plugins (OpenAI for VLM, GetStream for Edge, Deepgram for STT, ElevenLabs for TTS), with all workspace mappings properly configured. The Python requirement (>=3.10) is consistent with the broader project.

plugins/openai/examples/qwen_vl_example/README.md (1)
74-79: Fix grammar and punctuation issues flagged by static analysis.Multiple punctuation and grammar corrections are needed to improve documentation clarity:
- Line 77: Add the missing comma: "...reduce API costs**,** but may miss fast-moving content. Default is 1 fps."
- Line 114: Hyphenate "to-dos": change `TODOs` to `to-dos`.
- Line 119: Add a comma in the compound sentence: "...are set correctly and the model name is valid." should be "...are set correctly**,** and the model name is valid."
The repeated "Loose punctuation" warnings (lines 56–79, 98) appear to relate to the Markdown list formatting with backticks and dashes; confirm these are false positives by ensuring your Markdown renders correctly.
Also applies to: 98-98, 114-114, 119-119
plugins/openrouter/tests/test_openrouter_llm.py (1)
66-67: Good refactoring to use the public API.

The migration from direct `_conversation` attribute assignment to the public `set_conversation()` method properly encapsulates the conversation setup and aligns with the new LLM interface.

plugins/gemini/tests/test_gemini_llm.py (2)
32-33: Good refactoring to use the public API.

The migration from direct `_conversation` attribute assignment to the public `set_conversation()` method properly encapsulates the conversation setup and aligns with the new LLM interface.

84-85: Consistent API usage.

Correctly applies the same public `set_conversation()` pattern to the locally instantiated LLM in this test.

agents-core/vision_agents/core/llm/llm.py (1)
61-63: LGTM! Conversation management API is well designed.

The new `set_conversation` method provides a clean public API for conversation wiring. The instruction parsing with the `Instructions` type is properly typed, and the separation of concerns between conversation management and instruction handling is clear.

Also applies to: 194-204, 206-210
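As a minimal usage sketch of this wiring (import paths and constructor arguments are assumptions based on the code graph above, not verified signatures):

```python
from vision_agents.plugins.openai import ChatCompletionsLLM
from vision_agents.core.agents.conversation import InMemoryConversation  # path assumed

llm = ChatCompletionsLLM(model="my-model")  # model name is a placeholder
conversation = InMemoryConversation()       # constructor args omitted / assumed
llm.set_conversation(conversation)          # replaces llm._conversation = ...
```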
plugins/openai/vision_agents/plugins/openai/__init__.py (1)
4-7: Verify ChatCompletionsVLM export is intentionally omitted.

`ChatCompletionsVLM` is imported on line 5 but not included in `__all__` on line 7, meaning users cannot import it via `from vision_agents.plugins.openai import ChatCompletionsVLM`. If this plugin should be publicly available, add it to the exports list.

If the VLM should be exported, apply this diff:

```diff
-__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM"]
+__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM", "ChatCompletionsVLM"]
```

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (3)
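For background, a small sketch of what `__all__` controls (generic Python semantics with a hypothetical module layout, not this package's actual files):

```python
# mypkg/__init__.py
# Names left out of __all__ remain importable as attributes, but are
# excluded from `from mypkg import *` and signal a non-public API.
from ._impl import Public, Internal  # hypothetical submodule

__all__ = ["Public"]
```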
39-64: LGTM! Clean plugin initialization.The constructor properly initializes the AsyncOpenAI client with sensible defaults and registers events. The flexibility to pass either credentials or a pre-configured client is a good design.
65-160: LGTM! Streaming implementation with proper event emission.

The streaming response handling is well structured. Defensive check for uninitialized conversation prevents errors, and the event emission pattern (chunk events for deltas, completion event at finish) aligns with the framework's event-driven architecture. Error handling properly emits `LLMErrorEvent` on failures.
162-180: LGTM! Message construction correctly handles conversation context.The method properly constructs the messages array with system instructions and conversation history. The pattern of sending a system message when participant is None (line 173-175) is a reasonable way to handle direct LLM calls vs. participant-triggered responses.
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
267-284: LGTM! Frame encoding and message construction is well designed.The frame-to-JPEG encoding and base64 conversion properly prepares video frames for the model API. Logging the frame count on line 277 aids debugging, and the message structure with image_url content type aligns with OpenAI's multimodal API format.
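For reference, a minimal sketch of the base64 `image_url` message shape this comment describes (field names follow the public Chat Completions convention; the bytes placeholder stands in for real JPEG output):

```python
import base64

jpeg_bytes = b"\xff\xd8\xff..."  # e.g. output of frame_to_jpeg_bytes(...)
b64 = base64.b64encode(jpeg_bytes).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe what you see."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ],
}
```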
plugins/openai/tests/test_chat_completions.py (1)
196-256: LGTM! Well-designed test stubs.

The `AsyncStreamStub` and `VideoStreamTrackStub` classes provide clean mocks for streaming responses and video frame generation. The use of numpy for random frame data and proper timing metadata (pts, time_base) makes the video stub realistic.
```python
from vision_agents.plugins.openai import Realtime

# Initialize with API key
sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
```
Fix import/usage mismatch for Realtime.
Line 19 imports Realtime, but line 22 still references OpenAIRealtime. This will cause a NameError at runtime.
Apply this diff:
```diff
-# Initialize with API key
-sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
+# Initialize with API key
+sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from vision_agents.plugins.openai import Realtime

# Initialize with API key
sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```
🤖 Prompt for AI Agents
In plugins/openai/README.md around lines 19 to 22, the example imports Realtime
but instantiates OpenAIRealtime causing a NameError; update the instantiation to
use Realtime (e.g., replace OpenAIRealtime(...) with Realtime(...)) or
alternatively change the import to import OpenAIRealtime instead—ensure the
class name used when creating the instance matches the imported identifier.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- `agents-core/vision_agents/core/agents/agents.py` (1 hunks)
- `plugins/openai/tests/test_chat_completions.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/__init__.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/openai/vision_agents/plugins/openai/__init__.py
🧰 Additional context used
🧬 Code graph analysis (2)
agents-core/vision_agents/core/agents/agents.py (5)
- plugins/anthropic/tests/test_anthropic_llm.py (1): `llm` (18-22)
- plugins/aws/tests/test_aws.py (1): `llm` (35-39)
- plugins/openrouter/tests/test_openrouter_llm.py (1): `llm` (61-68)
- plugins/gemini/tests/test_gemini_llm.py (1): `llm` (31-34)
- agents-core/vision_agents/core/llm/llm.py (1): `set_conversation` (194-204)

plugins/openai/tests/test_chat_completions.py (7)
- agents-core/vision_agents/core/agents/conversation.py (1): `InMemoryConversation` (230-237)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `ChatCompletionsLLM` (23-180), `simple_response` (65-160)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3): `ChatCompletionsVLM` (31-284), `watch_video_track` (187-224), `simple_response` (90-185)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)
- agents-core/vision_agents/core/llm/llm.py (1): `set_conversation` (194-204)
- agents-core/vision_agents/core/events/manager.py (1): `wait` (474-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (7)
agents-core/vision_agents/core/agents/agents.py (1)
540-541: Clean refactor to public API.

The change wires the LLM into the conversation context using the new public `set_conversation` method, which is a cleaner approach than internal mutation. The placement immediately after conversation creation is logical and the comment clearly explains the intent.

As a minor defensive check, you may want to verify that `edge.create_conversation` never returns `None`, since `set_conversation` expects a non-None `Conversation` instance. If it can return `None`, consider adding a guard:

```python
self.conversation = await create_conversation_coro
if self.conversation is not None:
    self.llm.set_conversation(self.conversation)
```

plugins/openai/tests/test_chat_completions.py (6)
36-47: Past review comment addressed correctly.

Both fixtures now call `set_conversation`, which ensures the full initialization logic (wiring instructions/parsed state) is exercised. This aligns with the real code path used by agents.
104-124: Error handling test looks solid.

The test properly verifies that model failures emit an `LLMErrorEvent` with the correct error message, ensuring the error path is covered.
127-171: LLM success test validates streaming and event emission correctly.The test comprehensively checks that:
- Streaming chunks are accumulated into the final response text
- Events are emitted in the expected order (2 chunks + 1 completed)
- Messages sent to the model include conversation history and system prompt
173-193: Error handling path covered.

The failure test ensures `LLMErrorEvent` is emitted when the underlying client raises an exception.
196-231: Mock stream implementation is well-structured.

`AsyncStreamStub` correctly implements the async iterator protocol and generates properly formatted `ChatCompletionChunk` objects, making it suitable for testing streaming behavior.
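For readers unfamiliar with the pattern, a minimal async-iterator stub looks roughly like this (illustrative only; the real `AsyncStreamStub` yields `ChatCompletionChunk` objects rather than strings):

```python
class StreamStub:
    """Replays a fixed list of items through the async iterator protocol."""

    def __init__(self, items):
        self._items = iter(items)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._items)
        except StopIteration:
            raise StopAsyncIteration
```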
233-256: Video stub generates synthetic frames appropriately.

The stub creates random `VideoFrame` objects with timing metadata, providing sufficient fidelity for testing video track consumption. The minimal sleep in `recv` prevents tight loops while keeping tests fast.
Actionable comments posted: 5
🧹 Nitpick comments (3)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)
28-28: Track or remove the TODO comment.The TODO suggests updating documentation for the legacy openai.LLM API. Consider creating an issue to track this task or addressing it directly if documentation updates are straightforward.
Would you like me to open an issue to track this documentation update?
87-88: Consider making frame dimensions configurable.The frame width and height are hardcoded to 800x600. While this provides sensible defaults, different models or use cases might benefit from different resolutions.
If needed, you could add optional `frame_width` and `frame_height` parameters to the constructor:

```diff
 def __init__(
     self,
     model: str,
     api_key: Optional[str] = None,
     base_url: Optional[str] = None,
     fps: int = 1,
     frame_buffer_seconds: int = 10,
+    frame_width: int = 800,
+    frame_height: int = 600,
     client: Optional[AsyncOpenAI] = None,
 ):
```

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
64-166: Consider extracting shared streaming logic.

The `simple_response` method in this file is nearly identical to the one in `chat_completions_vlm.py` (lines 90-192). The primary difference is that the VLM version adds video-specific handling around it. This duplication could be reduced by extracting the streaming loop logic into a shared helper method in a base class or utility module.

This is acceptable for the initial implementation, but future refactoring could improve maintainability.
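One possible shape for such a helper, sketched as a standalone function (hypothetical names; the event emission is reduced to a plain callback):

```python
from typing import AsyncIterator, Callable

async def accumulate_stream(
    deltas: AsyncIterator[str],
    on_chunk: Callable[[str], None],
) -> str:
    """Collect streamed text deltas into one string, reporting each chunk."""
    text = ""
    async for delta in deltas:
        text += delta
        on_chunk(delta)  # e.g. emit LLMResponseChunkEvent here
    return text
```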
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py` (1 hunks)
- `plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLMResponseEvent` (38-42), `VideoLLM` (450-471)
- agents-core/vision_agents/core/utils/video_forwarder.py (2): `VideoForwarder` (24-147), `add_frame_handler` (48-74)
- agents-core/vision_agents/core/utils/video_utils.py (1): `frame_to_jpeg_bytes` (32-63)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2): `simple_response` (64-166), `_build_model_request` (168-178)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
- plugins/openai/tests/test_chat_completions.py (1): `llm` (37-40)
- agents-core/vision_agents/core/llm/events.py (2): `LLMResponseChunkEvent` (87-102), `LLMResponseCompletedEvent` (106-112)
- agents-core/vision_agents/core/llm/llm.py (2): `LLM` (49-418), `LLMResponseEvent` (38-42)
- agents-core/vision_agents/core/processors/base_processor.py (1): `Processor` (35-44)
- agents-core/vision_agents/core/events/manager.py (1): `register_events_from_module` (219-256)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2): `simple_response` (90-192), `_build_model_request` (245-283)
- plugins/openai/vision_agents/plugins/openai/events.py (1): `LLMErrorEvent` (15-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (4)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
173-177: Review the finish_reason warning condition.

The code warns when `finish_reason` is "length" or "content". While "length" typically indicates truncation (a legitimate warning), "content" may indicate content filtering that successfully completed but with filtered output. Consider whether both deserve the same warning level, or if "content" should be handled differently.

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (3)
22-36: Well-documented plugin class.The class docstring clearly describes the purpose and features of the ChatCompletionsLLM plugin, with a helpful usage example.
147-151: Review the finish_reason warning condition.

The code warns when `finish_reason` is "length" or "content". While "length" typically indicates truncation (a legitimate warning), "content" may indicate content filtering that successfully completed but with filtered output. Consider whether both deserve the same warning level, or if "content" should be handled differently.
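A minimal sketch of inspecting `finish_reason` in a streamed response (the chunk shape follows the public Chat Completions streaming API; the warning policy itself is the open question above):

```python
import logging

logger = logging.getLogger(__name__)

async def read_stream(stream) -> str:
    text = ""
    async for chunk in stream:  # chunks from client.chat.completions.create(..., stream=True)
        choice = chunk.choices[0]
        if choice.delta.content:
            text += choice.delta.content
        if choice.finish_reason == "length":
            logger.warning("Response truncated by the token limit")
    return text
```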
162-180: Clean request builder implementation.

The `_build_model_request` method correctly assembles the prompt from instructions and conversation history. The implementation is appropriately simpler than the VLM variant, which also includes video frame handling.
openai.chat_completions package to support OSS models
What's changed
New Features
- `openai.ChatCompletionsVLM` buffers video frames, converts them to JPEG, and streams responses via an OpenAI-compatible client
- `set_conversation()` method for improved conversation handling

Documentation
Tests: use `set_conversation` instead of direct `_conversation` assignment.

Summary by CodeRabbit
New Features
Documentation
Tests