Commit 2dba995

Merge branch 'main' into feat/aws-fc
2 parents 32cdc56 + 3d7904c commit 2dba995

149 files changed

+9201
-10696
lines changed

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+---
+name: repo-workflow-guide
+description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example>
+model: opus
+---
+
+You are a Repository Workflow Specialist, an expert in interpreting and applying project-specific development guidelines, coding standards, and workflow instructions.
+
+Your primary responsibility is to read, understand, and communicate the instructions and guidelines contained in the docs/ai directory of the repository. You serve as the authoritative source for how development work should be conducted within this specific codebase.
+
+When activated, you will:
+
+1. **Locate and Read Guidelines**: Immediately access all relevant files in the docs/ai directory. Read them thoroughly and understand their complete content, including:
+   - Coding standards and style guides
+   - Project structure and organization rules
+   - Development workflow and processes
+   - Testing requirements and conventions
+   - Deployment procedures
+   - Any specific technical constraints or preferences
+   - Tool usage and configuration instructions
+
+2. **Interpret Context**: Understand the specific task or question being asked and identify which guidelines are most relevant to address it.
+
+3. **Provide Clear Guidance**: Deliver specific, actionable instructions based on the documented guidelines. Your responses should:
+   - Quote or reference specific sections of the guidelines when appropriate
+   - Explain the reasoning behind the guidelines when it helps with understanding
+   - Provide concrete examples of how to follow the guidelines
+   - Highlight any critical requirements or common pitfalls mentioned in the documentation
+
+4. **Handle Missing Information**: If the docs/ai directory doesn't contain information relevant to the current question:
+   - Clearly state what information is missing
+   - Suggest reasonable defaults based on common industry practices
+   - Recommend updating the documentation to cover this scenario
+
+5. **Ensure Compliance**: Actively verify that proposed approaches align with all documented guidelines. If you identify any conflicts or violations, explicitly point them out and suggest compliant alternatives.
+
+6. **Prioritize Accuracy**: Always base your guidance on the actual content of the documentation. Do not invent or assume guidelines that aren't explicitly documented.
+
+7. **Stay Current**: If guidelines appear to conflict or if you notice outdated information, flag this for human review while providing the most reasonable interpretation.
+
+Output Format:
+- Begin with a brief summary of the relevant guidelines
+- Provide specific, step-by-step instructions when appropriate
+- Include direct quotes or references to documentation sections
+- End with any important caveats, warnings, or additional considerations
+
+Your goal is to ensure that all development work in this repository adheres to its documented standards and practices, reducing inconsistency and improving code quality through faithful application of project-specific guidelines.

.github/labeler.yml

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Core Framework Components
+agents-core:
+- changed-files:
+  - any-glob-to-any-file: 'agents-core/**'
+
+# Plugin System
+plugins:
+- changed-files:
+  - any-glob-to-any-file: 'plugins/**'
+
+
+# Examples and Demos
+examples:
+- changed-files:
+  - any-glob-to-any-file: 'examples/**'
+
+# Documentation
+docs:
+- changed-files:
+  - any-glob-to-any-file: 'docs/**'
+  - any-glob-to-any-file: '**/*.md'
+
+# Configuration and Build
+config:
+- changed-files:
+  - any-glob-to-any-file: '**/*.toml'
+  - any-glob-to-any-file: '**/*.yml'
+  - any-glob-to-any-file: '**/*.yaml'
+  - any-glob-to-any-file: '**/*.json'
+  - any-glob-to-any-file: '**/*.ini'
+  - any-glob-to-any-file: '**/*.cfg'
+  - any-glob-to-any-file: '**/pyproject.toml'
+  - any-glob-to-any-file: '**/pytest.ini'
+  - any-glob-to-any-file: '**/conftest.py'
+
+# CI/CD and GitHub
+ci:
+- changed-files:
+  - any-glob-to-any-file: '.github/**'
+
+
+# CLI and Development Tools
+cli:
+- changed-files:
+  - any-glob-to-any-file: '**/cli.py'
+  - any-glob-to-any-file: '**/dev.py'
+  - any-glob-to-any-file: '**/DEVELOPMENT.md'
+
+# Dependencies
+dependencies:
+- changed-files:
+  - any-glob-to-any-file: '**/uv.lock'
+  - any-glob-to-any-file: '**/requirements*.txt'
+  - any-glob-to-any-file: '**/poetry.lock'
+  - any-glob-to-any-file: '**/Pipfile.lock'
+
+# Assets and Resources
+assets:
+- changed-files:
+  - any-glob-to-any-file: 'assets/**'
+  - any-glob-to-any-file: '**/*.png'
+  - any-glob-to-any-file: '**/*.jpg'
+  - any-glob-to-any-file: '**/*.jpeg'
+  - any-glob-to-any-file: '**/*.gif'
+  - any-glob-to-any-file: '**/*.mp4'
+  - any-glob-to-any-file: '**/*.wav'
+  - any-glob-to-any-file: '**/*.mp3'
+
+# License and Legal
+legal:
+- changed-files:
+  - any-glob-to-any-file: 'LICENSE'
+  - any-glob-to-any-file: '**/LICENSE.*'
+  - any-glob-to-any-file: '**/*.license'
+
+# README and Project Info
+project-info:
+- changed-files:
+  - any-glob-to-any-file: '**/README.md'
+  - any-glob-to-any-file: '**/CHANGELOG.md'
+  - any-glob-to-any-file: '**/CONTRIBUTING.md'
+  - any-glob-to-any-file: '**/SECURITY.md'
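
Review note: several of these globs overlap, so a single changed file can pick up more than one label. The sketch below is a rough way to reason about that, not the labeler action's real matcher: `glob_to_regex` is a hypothetical helper that only approximates the `**/`, `**` and `*` rules, and `LABEL_GLOBS` is a hand-copied excerpt of the config above.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a small glob subset into a regex (approximation only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")        # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")     # anything inside one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

# Hypothetical excerpt of the label config, written as a plain dict.
LABEL_GLOBS = {
    "docs": ["docs/**", "**/*.md"],
    "cli": ["**/cli.py", "**/dev.py", "**/DEVELOPMENT.md"],
    "dependencies": ["**/uv.lock", "**/requirements*.txt"],
}

def labels_for(changed_files):
    """Return every label for which any glob matches any changed file."""
    labels = set()
    for label, globs in LABEL_GLOBS.items():
        regexes = [glob_to_regex(g) for g in globs]
        if any(r.fullmatch(path) for r in regexes for path in changed_files):
            labels.add(label)
    return sorted(labels)

print(labels_for(["DEVELOPMENT.md"]))
```

Under these assumptions a change to `DEVELOPMENT.md` is tagged both `cli` and `docs`, which may or may not be the intent of the config.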

.github/workflows/labeler.yml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+name: "Pull Request Labeler"
+on:
+- pull_request_target
+
+jobs:
+  labeler:
+    permissions:
+      contents: read
+      pull-requests: write
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/labeler@v5

.github/workflows/run_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ jobs:
       XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
       AWS_BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
       _BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}
     timeout-minutes: 30
     steps:
       - name: Checkout

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -84,3 +84,4 @@ stream-py/
 # Artifacts / assets
 *.pt
 *.kef
+*.onnx

DEVELOPMENT.md

Lines changed: 96 additions & 0 deletions
@@ -109,6 +109,79 @@ To see how the agent work open up agents.py
 * The LLM uses the VideoForwarder to write the video to a websocket or webrtc connection
 * The STS writes the reply on agent.llm.audio_track and the RealtimeTranscriptEvent / RealtimePartialTranscriptEvent

+## Audio management
+
+Some important things about audio inside the library:
+
+1. WebRTC uses Opus at 48 kHz stereo, but inside the library audio is always in PCM format
+2. Plugins / AI models work with different PCM formats, usually 16 kHz mono
+3. PCM data is always passed around using the `PcmData` object, which contains information about sample rate, channels and format
+4. Text-to-speech plugins automatically return PCM in the format needed by WebRTC. This is exposed via the `set_output_format` method
+5. Audio resampling can be done using the `PcmData.resample` method
+6. When resampling audio in chunks, it is important to re-use the same `av.AudioResampler` instance (see `PcmData.resample` and `core.tts.TTS`)
+7. Converting from stereo to mono and vice versa can be done using the `PcmData.resample` method
+
+Some ground rules:
+
+1. Do not write your own code to resample or adjust audio unless `PcmData` does not already cover the case
+2. Do not pass PCM around as plain bytes or write code that assumes a specific sample rate or format. Use `PcmData` instead
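
Review note: a minimal sketch of why ground rule 2 matters. `PcmChunk` is a hypothetical stand-in written for this illustration, not the library's `PcmData`, and the `BYTES_PER_SAMPLE` table is an assumption covering only two sample formats. The same raw bytes describe different audio depending on the metadata attached to them.

```python
from dataclasses import dataclass

# Assumed sample widths for the two formats used in this illustration.
BYTES_PER_SAMPLE = {"s16": 2, "f32": 4}

@dataclass(frozen=True)
class PcmChunk:
    """Hypothetical PCM container; NOT the library's PcmData."""
    data: bytes
    sample_rate: int
    channels: int
    format: str = "s16"

    @property
    def duration_seconds(self) -> float:
        # One frame holds one sample per channel.
        frame_size = BYTES_PER_SAMPLE[self.format] * self.channels
        return len(self.data) / frame_size / self.sample_rate

raw = bytes(96_000)  # one second of silence at 48 kHz, mono, s16

correct = PcmChunk(raw, sample_rate=48_000, channels=1)
wrong = PcmChunk(raw, sample_rate=16_000, channels=1)

print(correct.duration_seconds, wrong.duration_seconds)
```

Code that receives plain bytes has to guess one of these interpretations; carrying the sample rate, channel count and format alongside the data removes the guess.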
+
+## Example
+
+```python
+import asyncio
+from getstream.video.rtc.track_util import PcmData
+from openai import AsyncOpenAI
+
+async def example():
+    client = AsyncOpenAI(api_key="sk-42")
+
+    resp = await client.audio.speech.create(
+        model="gpt-4o-mini-tts",
+        voice="alloy",
+        input="pcm is cool, give me some of that please",
+        response_format="pcm",
+    )
+
+    # load response into PcmData, note that you need to specify sample_rate, channels and format
+    pcm_data = PcmData.from_bytes(
+        resp.content, sample_rate=24_000, channels=1, format="s16"
+    )
+
+    # check if pcm_data is stereo (it's not in this case ofc)
+    print(pcm_data.stereo)
+
+    # write the pcm to file
+    with open("test.wav", "wb") as f:
+        f.write(pcm_data.to_wav_bytes())
+
+    # resample pcm to be 48khz stereo
+    resampled_pcm = pcm_data.resample(48_000, 2)
+
+    # play-out pcm using ffplay
+    from vision_agents.core.edge.types import play_pcm_with_ffplay
+
+    await play_pcm_with_ffplay(resampled_pcm)
+
+if __name__ == "__main__":
+    asyncio.run(example())
+```
+
+Other things that you get from the audio utilities:
+
+1. Change the PCM format
+2. Iterate over audio chunks (`PcmData.chunks`)
+3. Process audio with pre/post buffers (`AudioSegmentCollector`)
+4. Accumulate audio (`PcmData.append`)
+
+### Testing audio manually
+
+Sometimes you need to test audio manually. Here are some tips:
+
+1. Do not use earplugs when testing PCM playback ;)
+2. You can use the `PcmData.to_wav_bytes` method to convert PCM into WAV bytes (see `manual_tts_to_wav` for an example)
+3. If you have `ffplay` installed, you can play back PCM directly to check that the audio is correct
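
Review note: when checking WAV output by hand, it can help to read the header back before listening. This is a minimal sketch using only the standard-library `wave` module; `describe_wav` and `make_test_tone` are hypothetical helpers for this illustration, and the generated sine tone merely stands in for bytes that would come from `PcmData.to_wav_bytes`.

```python
import io
import math
import struct
import wave

def make_test_tone(sample_rate=16_000, seconds=0.5, freq=440.0) -> bytes:
    """Build mono s16 WAV bytes with a sine tone (stand-in for real output)."""
    n = int(sample_rate * seconds)
    samples = [
        int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(n)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{n}h", *samples))
    return buf.getvalue()

def describe_wav(wav_bytes: bytes) -> dict:
    """Read the WAV header and report the properties worth eyeballing."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return {
            "sample_rate": w.getframerate(),
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_seconds": w.getnframes() / w.getframerate(),
        }

info = describe_wav(make_test_tone())
print(info)
```

If the reported sample rate or channel count differs from what the plugin claims to produce, playback will sound too fast, too slow, or distorted.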
+
 ## Dev / Contributor Guidelines

 ### Light wrapping
@@ -246,3 +319,26 @@ You can now see the metrics at `http://localhost:9464/metrics` (make sure that y

 - Track.recv errors will fail silently. The API is to return a frame. Never return None. and wait till the next frame is available
 - When using frame.to_ndarray(format="rgb24") specify the format. Typically you want rgb24 when connecting/sending to Yolo etc
+
+
+## Onboarding Plan for new contributors
+
+**Audio Formats**
+
+You'll notice that audio comes in many formats: PCM, WAV, MP3; 16 kHz, 48 kHz;
+encoded as i16 or f32. Note that WebRTC uses 48 kHz by default.
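
Review note: for readers new to the i16 and f32 encodings, here is a minimal standard-library sketch of how the same samples look in both. The helper names and the scale factor of 32768 are assumptions chosen for this illustration (other scaling conventions exist), and the code relies on the machine's native byte order. Inside the library, format changes should still go through `PcmData`.

```python
from array import array

def s16_to_f32(pcm_bytes: bytes) -> array:
    """Map 16-bit integer samples to floats in [-1.0, 1.0)."""
    ints = array("h")          # "h" = signed 16-bit, native byte order
    ints.frombytes(pcm_bytes)
    return array("f", (s / 32768.0 for s in ints))

def f32_to_s16(samples) -> bytes:
    """Map floats back to 16-bit integers, clipping out-of-range values."""
    ints = (max(-32768, min(32767, round(s * 32768.0))) for s in samples)
    return array("h", ints).tobytes()

original = array("h", [0, 16384, -32768, 32767]).tobytes()
as_float = s16_to_f32(original)
round_trip = f32_to_s16(as_float)

print(list(as_float))
print(round_trip == original)
```

The float form is convenient for processing because its range does not depend on bit depth, while the integer form is compact, using two bytes per sample instead of four.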
+
+A good first intro to audio formats can be found here:
+
+**Using Cursor**
+
+You can ask Cursor something like "read @ai-plugin and build me a plugin called fish".
+See the docs folder for other AI instruction files.
+
+**Learning Roadmap**
+
+1. Quick refresher on audio formats
+2. Build a TTS integration
+3. Build an STT integration
+4. Build an LLM integration
+5. Write a pytest test with a fixture
