Feat: Add support for Moondream VLM functions #154
Merged
25 commits
d681edd
Scaffolding setup for Moondream VLM
Nash0x7E2 67a8529
Basic (broken) impl
Nash0x7E2 45509da
Fix parsing
Nash0x7E2 722662d
Add some handling around processing
Nash0x7E2 a184c78
Basic Moondream VLM example
Nash0x7E2 a9a092e
Remove extra character
Nash0x7E2 f838a1e
Clean up folder structure
Nash0x7E2 a0b5c9d
WIP local version
Nash0x7E2 e0b31d3
Fix broken track imports
Nash0x7E2 eaddf22
LocalVLM tests
Nash0x7E2 e32af63
Unused param
Nash0x7E2 02fad43
Ensure processors are warmed up during launch
Nash0x7E2 a82e2e0
Ruff and MyPy
Nash0x7E2 d1af35c
PR review - CloudVLM
Nash0x7E2 ec534fe
Add missing debug log for processor warmup
Nash0x7E2 2d4f0bc
Improve local device detection
Nash0x7E2 fa9847d
Formatting and clean up
Nash0x7E2 f1ba327
More clean up
Nash0x7E2 f9c91e9
Fix bug with processing lock
Nash0x7E2 97bc613
Ruff and MyPy final checks
Nash0x7E2 a801788
Expose device for verification
Nash0x7E2 83b32f0
Simplify example
Nash0x7E2 46f0f53
Update public doc strings
Nash0x7E2 0de1cdd
Update readme
Nash0x7E2 13fb325
unused import
Nash0x7E2
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| ## Moondream example | ||
| Please see root readme for details. |
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| import asyncio | ||
| import logging | ||
| from dotenv import load_dotenv | ||
| | ||
| from vision_agents.core import User, Agent, cli | ||
| from vision_agents.core.agents import AgentLauncher | ||
| from vision_agents.plugins import deepgram, getstream, elevenlabs, moondream | ||
| from vision_agents.core.events import CallSessionParticipantJoinedEvent | ||
| import os | ||
| | ||
| logger = logging.getLogger(__name__) | ||
| | ||
| load_dotenv() | ||
| | ||
| async def create_agent(**kwargs) -> Agent: | ||
|     llm = moondream.CloudVLM( | ||
|         api_key=os.getenv("MOONDREAM_API_KEY"), | ||
|     ) | ||
|     # create an agent to run with Stream's edge and the Moondream cloud VLM | ||
|     agent = Agent( | ||
|         edge=getstream.Edge(),  # low latency edge. clients for React, iOS, Android, RN, Flutter etc. | ||
|         agent_user=User( | ||
|             name="My happy AI friend", id="agent" | ||
|         ), | ||
|         llm=llm, | ||
|         tts=elevenlabs.TTS(), | ||
|         stt=deepgram.STT(), | ||
|     ) | ||
|     return agent | ||
| | ||
| | ||
| async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None: | ||
|     # ensure the agent user is created | ||
|     await agent.create_user() | ||
|     # Create a call | ||
|     call = await agent.create_call(call_type, call_id) | ||
| | ||
|     @agent.events.subscribe | ||
|     async def on_participant_joined(event: CallSessionParticipantJoinedEvent): | ||
|         if event.participant.user.id != "agent": | ||
|             await asyncio.sleep(2) | ||
|             await agent.simple_response("Describe what you currently see") | ||
| | ||
|     # Have the agent join the call/room | ||
|     with await agent.join(call): | ||
|         # Open the demo UI | ||
|         await agent.edge.open_demo(call) | ||
|         # run till the call ends | ||
|         await agent.finish() | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     cli(AgentLauncher(create_agent=create_agent, join_call=join_call)) |
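
The example passes `os.getenv("MOONDREAM_API_KEY")` directly to `moondream.CloudVLM`, so an unset key reaches the constructor as `None`. A minimal sketch of failing earlier with a clearer message is below. It assumes a hypothetical helper named `require_env`, which is not part of this PR or of Vision Agents; whether `CloudVLM` validates the key itself is not shown in this diff.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper for illustration only (not part of the PR).
    Raises RuntimeError when the variable is unset or blank.
    """
    # Treat whitespace-only values the same as a missing variable.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In `create_agent`, `api_key=require_env("MOONDREAM_API_KEY")` could then stand in for the `os.getenv` call, provided it runs after `load_dotenv()`.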
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| [project] | ||
| name = "moondream-example" | ||
| version = "0.1.0" | ||
| description = "Example using Moondream Detect and VLM with Vision Agents" | ||
| requires-python = ">=3.10" | ||
| dependencies = [ | ||
|     "vision-agents", | ||
|     "vision-agents-plugins-moondream", | ||
|     "vision-agents-plugins-getstream", | ||
|     "vision-agents-plugins-deepgram", | ||
|     "vision-agents-plugins-elevenlabs", | ||
|     "vision-agents-plugins-vogent", | ||
|     "python-dotenv", | ||
| ] | ||
| | ||
| [tool.uv.sources] | ||
| vision-agents = { workspace = true } | ||
| vision-agents-plugins-moondream = { workspace = true } | ||
| vision-agents-plugins-getstream = { workspace = true } | ||
| vision-agents-plugins-deepgram = { workspace = true } | ||
| vision-agents-plugins-elevenlabs = { workspace = true } | ||
| vision-agents-plugins-vogent = { workspace = true } |
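
Because the dependencies above resolve from the workspace, a quick way to confirm that an environment actually has them installed is to query distribution metadata. The following is a sketch only: `missing_packages` and `EXAMPLE_DEPENDENCIES` are hypothetical names introduced here, and the list simply mirrors the `dependencies` table in this diff.

```python
from importlib import metadata

# Mirrors the `dependencies` list in the example's pyproject.toml.
EXAMPLE_DEPENDENCIES = [
    "vision-agents",
    "vision-agents-plugins-moondream",
    "vision-agents-plugins-getstream",
    "vision-agents-plugins-deepgram",
    "vision-agents-plugins-elevenlabs",
    "vision-agents-plugins-vogent",
    "python-dotenv",
]


def missing_packages(names: list[str]) -> list[str]:
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when no such distribution is installed.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_packages(EXAMPLE_DEPENDENCIES)` inside the example's environment returns an empty list when every listed distribution is installed.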