feat: add voice bridge -- talk to AI agents via speech #416
Standalone voice interface that connects microphone to OpenCode via Silero VAD, Whisper MLX STT, and Edge TTS. ~6-8s round-trip on Apple Silicon with full tool execution capability.

Features:
- Swappable STT (whisper-mlx, faster-whisper) and TTS (edge-tts, macos-say, facebookMMS) engines
- Voice exit phrases with substring matching for natural endings
- STT sanity checking via LLM prompt (corrects transcription errors)
- Session handback transcript on exit for calling agent context
- TTY detection with graceful degradation in TUI subprocess
- Natural conversational confirmations for task execution

New files:
- .agents/scripts/voice-bridge.py (main Python bridge, ~800 lines)
- .agents/scripts/voice-helper.sh (shell wrapper with setup/talk/status/benchmark/voices/devices commands)

Updated: README.md, AGENTS.md, subagent-index.toon, speech-to-speech.md, .gitignore
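The "voice exit phrases with substring matching" feature could look roughly like the sketch below. The phrase list and the function name `is_exit_phrase` are hypothetical; the PR description does not list the actual phrases or show the matching code.

```python
# Hypothetical phrase list: the real phrases used by voice-bridge.py are not shown in this PR page.
EXIT_PHRASES = ("that's all", "goodbye", "end session")


def is_exit_phrase(transcript: str, phrases=EXIT_PHRASES) -> bool:
    """Return True if any exit phrase occurs anywhere in the transcript."""
    # Lowercase and collapse whitespace so "That's   ALL" still matches.
    normalized = " ".join(transcript.lower().split())
    return any(phrase in normalized for phrase in phrases)
```

Substring matching lets a natural sentence such as "OK, that's all for now, thanks" end the session, at the cost of occasional false positives when a phrase appears inside an ordinary request.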
Summary of Changes
Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the user interaction model by introducing a robust voice interface for AI coding agents. The primary purpose is to enable a more natural and efficient way for users to command and receive feedback from their AI, moving beyond traditional text-based interactions. This change aims to accelerate development workflows by allowing hands-free operation and real-time conversational guidance for complex coding tasks, ultimately making the AI agent more accessible and responsive.
Walkthrough
This PR introduces a comprehensive Voice Bridge system enabling real-time voice interactions with AI agents. The implementation adds a Python orchestration layer with pluggable STT/TTS/VAD components, a shell script wrapper for environment setup and command-line access, and documentation updates explaining the new voice interface.
Sequence Diagram
sequenceDiagram
participant User
participant VoiceBridge as Voice Bridge
participant VAD as Silero VAD
participant STT as STT Engine
participant OpenCode as OpenCode LLM
participant TTS as TTS Engine
participant Speaker
User->>VoiceBridge: Start voice session
VoiceBridge->>VoiceBridge: Initialize VAD, STT, TTS
loop Voice Interaction Loop
VoiceBridge->>VAD: Stream audio chunks
VAD-->>VoiceBridge: Speech detected?
alt Speech Detected
VoiceBridge->>STT: Transcribe audio_int16
STT-->>VoiceBridge: Transcript text
VoiceBridge->>VoiceBridge: Accumulate transcript history
VoiceBridge->>OpenCode: Query with transcript
OpenCode-->>VoiceBridge: Response text
VoiceBridge->>TTS: Synthesize response
TTS-->>VoiceBridge: Audio stream ready
VoiceBridge->>Speaker: Play audio output
Speaker-->>VoiceBridge: Playback complete
else Silence Detected
VoiceBridge->>VoiceBridge: Check exit phrase / timeout
end
end
User->>VoiceBridge: Exit (Esc/Ctrl+C)
VoiceBridge-->>User: Emit transcript handback
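The interaction loop in the diagram above can be sketched with plain callables standing in for the STT engine, OpenCode, and the TTS engine. This is a simplified illustration, not the code in voice-bridge.py: the function name `run_voice_loop` and its parameters are assumptions, VAD is assumed to have already split the audio into utterances, and the exit-phrase check is placed right after transcription for brevity.

```python
from typing import Callable, Iterable, List, Tuple


def run_voice_loop(
    utterances: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    query_agent: Callable[[str], str],
    speak: Callable[[str], None],
    is_exit: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Run transcribe -> query -> speak per utterance and return the transcript."""
    history: List[Tuple[str, str]] = []
    for audio in utterances:  # each item is one VAD-delimited speech segment
        text = transcribe(audio).strip()
        if not text:
            continue  # nothing recognisable in this segment
        if is_exit(text):
            break  # spoken exit phrase ends the session
        reply = query_agent(text)
        history.append((text, reply))  # accumulated for the handback on exit
        speak(reply)
    return history
```

Passing the engines in as callables mirrors the "swappable STT/TTS" idea from the PR description and makes the loop easy to exercise with stubs.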
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
The Python voice bridge introduces 834 lines of heterogeneous logic spanning VAD detection, multiple STT/TTS engine implementations, OpenCode server communication, multi-threaded audio processing, and CLI orchestration. The shell script adds 423 lines of system-level dependency management, server lifecycle control, and command dispatching. The breadth of new public APIs (14+ functions/classes), integration density, and need to validate correct orchestration of multiple components across platform-specific implementations (macOS vs. general) requires careful review of each subsystem.
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sat Feb 7 03:23:24 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a significant new feature: a voice bridge to allow users to interact with AI agents via speech. The implementation includes a comprehensive Python script for the bridge logic, a helper shell script for user interaction, and updates to documentation.
The core Python script is well-structured with separate classes for different components like VAD, STT, and TTS. However, I've identified a few areas for improvement:
- There's a critical issue with the SIGINT handler that prevents graceful shutdown.
- The voice barge-in feature is currently non-functional due to the audio callback logic.
- There are several instances of unused imports and dead code that should be cleaned up for better maintainability.
- The benchmark command in the helper script uses a deprecated and insecure method for creating temporary files.
I've left specific comments with suggestions to address these points. Overall, this is a great addition, and with these fixes, it will be more robust and maintainable.
    if self.is_speaking:
        return
The voice barge-in feature appears to be non-functional. These lines cause the _audio_callback to return immediately during TTS playback, preventing any voice input from being processed. This makes voice-triggered barge-in impossible, and the associated logic (e.g., self.barge_in, _barge_in_frames) is effectively dead code. The comment on line 411 is also misleading, as using headphones won't enable barge-in with the current code.
To fix this, you would need a more sophisticated approach, likely involving acoustic echo cancellation (AEC). Given the complexity, I recommend either removing the non-functional barge-in code and comments or disabling this check and clearly documenting that it requires headphones and may have false positives.
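To make the effect of the early return concrete, here is a minimal sketch of a gated audio callback. The class `MicGate` and its attributes are hypothetical stand-ins, not the bridge's actual class; only the `is_speaking` check mirrors the snippet quoted above.

```python
import threading
from collections import deque


class MicGate:
    """Hypothetical stand-in for the bridge's audio-callback state."""

    def __init__(self) -> None:
        self.is_speaking = threading.Event()  # set while TTS playback is running
        self.frames: deque = deque()          # chunks that would be handed to VAD

    def audio_callback(self, chunk: bytes) -> None:
        if self.is_speaking.is_set():
            # Mic is muted during playback: the chunk is dropped, so no
            # speech can be detected and voice barge-in cannot trigger.
            return
        self.frames.append(chunk)
```

With this gating, any barge-in logic that inspects frames captured during playback never sees input, which is why the reviewer describes it as dead code unless echo cancellation (or a headphone-only mode) replaces the mute.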
.agents/scripts/voice-bridge.py
Outdated
        return

    # Handle Ctrl+C gracefully
    signal.signal(signal.SIGINT, lambda s, f: sys.exit(0))
The custom SIGINT handler calls sys.exit(0), which terminates the process immediately. This will prevent the finally block in the run method from executing, so the graceful shutdown logic, including _print_handback(), will be skipped. This can lead to loss of the session transcript. Removing this line will allow the default KeyboardInterrupt to be raised, which is handled correctly in the run method.
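The shutdown pattern the reviewer recommends, relying on Python's default SIGINT behaviour (which raises KeyboardInterrupt in the main thread), can be sketched as follows. The names `run_session` and `emit_handback` are hypothetical; the real `run` and `_print_handback` methods are not shown on this page.

```python
def run_session(turns, emit_handback):
    """Consume turns until interrupted; always emit the handback transcript."""
    transcript = []
    try:
        for turn in turns:
            transcript.append(turn)
    except KeyboardInterrupt:
        # With no custom SIGINT handler installed, Ctrl+C lands here.
        pass
    finally:
        # Runs on normal completion and on interruption alike.
        emit_handback(transcript)
    return transcript
```

Keeping the handback in a `finally` block means there is a single exit path to reason about, regardless of how the loop ends.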
    # edge-tts
    try:
        import asyncio, edge_tts, tempfile, os
        async def t():
            c = edge_tts.Communicate(text, 'en-US-GuyNeural')
            f = tempfile.mktemp(suffix='.mp3')
            await c.save(f)
            os.unlink(f)
        start = time.time()
        asyncio.run(t())
        print(f' edge-tts: {time.time()-start:.3f}s')
    except Exception as e:
        print(f' edge-tts: FAILED ({e})')
The Python code for the edge-tts benchmark uses tempfile.mktemp, which is insecure and has been deprecated. It's vulnerable to a race condition where another process could create a file with the same name between the time mktemp returns the name and your script attempts to use it. You should use tempfile.NamedTemporaryFile within a with block to create temporary files securely.
Suggested change:

    # edge-tts
    try:
        import asyncio, edge_tts, tempfile
        async def t():
            c = edge_tts.Communicate(text, 'en-US-GuyNeural')
            with tempfile.NamedTemporaryFile(suffix='.mp3', delete=True) as f:
                await c.save(f.name)
        start = time.time()
        asyncio.run(t())
        print(f' edge-tts: {time.time()-start:.3f}s')
    except Exception as e:
        print(f' edge-tts: FAILED ({e})')
    # macos say
    try:
        import subprocess, tempfile, os
        f = tempfile.mktemp(suffix='.aiff')
        start = time.time()
        subprocess.run(['say', '-v', 'Samantha', '-o', f, text], check=True, capture_output=True)
        print(f' macos-say: {time.time()-start:.3f}s')
        os.unlink(f)
    except Exception as e:
        print(f' macos-say: FAILED ({e})')
The Python code for the macos-say benchmark uses tempfile.mktemp, which is insecure and has been deprecated due to race condition vulnerabilities. Please use tempfile.NamedTemporaryFile to ensure temporary files are created securely.
Suggested change:

    # macos say
    try:
        import subprocess, tempfile
        with tempfile.NamedTemporaryFile(suffix='.aiff', delete=True) as f:
            start = time.time()
            subprocess.run(['say', '-v', 'Samantha', '-o', f.name, text], check=True, capture_output=True)
            print(f' macos-say: {time.time()-start:.3f}s')
    except Exception as e:
        print(f' macos-say: FAILED ({e})')
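As an alternative to the NamedTemporaryFile suggestions above, a benchmark can write into a private temporary directory, which is removed on exit whether or not the engine call fails. This is only a sketch under that assumption; the helper name `time_with_temp_path` is hypothetical and is not part of voice-helper.sh.

```python
import os
import tempfile
import time


def time_with_temp_path(write_audio, suffix):
    """Time write_audio(path) using a path inside a private temp directory."""
    # TemporaryDirectory creates a directory only this user can access and
    # deletes it (with its contents) on exit, even if write_audio raises.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "benchmark" + suffix)
        start = time.perf_counter()
        write_audio(path)
        elapsed = time.perf_counter() - start
    return elapsed, path
```

For the macos-say case, `write_audio` could be a small function that runs the same `say ... -o path` command shown in the snippet above; because no file handle is held open on the output path, the engine is free to create the file itself.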
.agents/scripts/voice-bridge.py
Outdated
    import io
    import json
    import logging
    import os
    import signal
    import subprocess
    import sys
    import tempfile
    import threading
    import time
    import wave
    from collections import deque
    from pathlib import Path
    from queue import Empty, Queue
.agents/scripts/voice-bridge.py
Outdated
    def _start_server(self):
        """Start opencode serve in background."""
        log.info(f"Starting opencode serve on port {self.server_port}...")
        proc = subprocess.Popen(
            ["opencode", "serve", "--port", str(self.server_port)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            cwd=self.cwd,
        )
        # Wait for server to be ready
        for _ in range(20):
            time.sleep(0.5)
            try:
                import urllib.request

                req = urllib.request.Request(self.server_url, method="HEAD")
                urllib.request.urlopen(req, timeout=1)
                self.use_attach = True
                log.info("OpenCode server started")
                return proc
            except Exception:
                continue
        log.warning("OpenCode server failed to start, using standalone mode")
        return proc
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("> Build+"):
            continue
        if stripped.startswith("$") and "aidevops" in stripped:
            continue
        if stripped.startswith("aidevops v"):
            continue
        if not stripped:
            continue
        clean_lines.append(stripped)
    response = " ".join(clean_lines)
The response from opencode is cleaned by stripping specific hardcoded prefixes. This approach is fragile and might break if the output format of opencode changes. Consider checking if opencode can provide a more structured output format (e.g., JSON) to make parsing more robust. If not, adding a comment to explain why this manual parsing is necessary would be helpful for future maintenance.
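If structured output is not available, one way to contain the fragility is to isolate the filtering in a small, documented function with the known noise patterns named in one place. The sketch below restates the filtering rules from the snippet above; the function name `clean_response` and the constant `NOISE_PREFIXES` are illustrative, and the sample lines in the usage note are made up.

```python
# Line prefixes treated as CLI noise rather than model output. These mirror
# the hardcoded checks in the reviewed snippet and would need updating if the
# opencode output format changes.
NOISE_PREFIXES = ("> Build+", "aidevops v")


def clean_response(raw: str) -> str:
    """Drop known banner/command lines and join the rest into one string."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith(NOISE_PREFIXES):
            continue  # banner or version line
        if stripped.startswith("$") and "aidevops" in stripped:
            continue  # echoed shell command
        kept.append(stripped)
    return " ".join(kept)
```

For example, `clean_response("> Build+ x\n$ aidevops status\n\nHello there.\n  Done.  ")` keeps only the two prose lines and joins them with a space. Having the rules in one function also gives a natural place for the explanatory comment the reviewer asks for.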
- Remove unused imports (io, json, wave, Path, Empty, Queue, signal)
- Remove dead _start_server method (server started by voice-helper.sh)
- Remove SIGINT handler that prevented graceful shutdown with handback
- Remove non-functional barge-in code and BARGE_IN_FRAMES constant
- Fix tempfile.mktemp (deprecated/insecure) to NamedTemporaryFile in benchmark
- Add comment about fragile opencode output parsing
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.agents/subagent-index.toon (1)
81-122: ⚠️ Potential issue | 🟡 Minor
TOON header count is stale: scripts[39] but there are 41 entries.
After adding voice-helper.sh and voice-bridge.py, the actual script count in lines 82–122 is 41, not 39. Update the header to match.
-<!--TOON:scripts[39]{name,purpose}:
+<!--TOON:scripts[41]{name,purpose}:
🤖 Fix all issues with AI agents
In @.agents/scripts/voice-bridge.py:
- Around line 17-35: Remove the unused imports flagged by static analysis:
delete io, json, wave, Path (from pathlib), Empty, and Queue from the
top-of-file import block in .agents/scripts/voice-bridge.py; update the import
lines that currently include these symbols (e.g., remove Path from "from pathlib
import Path" and remove "Empty, Queue" from "from queue import Empty, Queue")
and run tests/lint to ensure nothing else in the file references these symbols
before committing.
- Around line 283-306: Delete the dead helper method _start_server from the
OpenCodeBridge class: it's never invoked (OpenCodeBridge.__init__ only calls
_check_server() and server startup is handled by
voice-helper.sh:ensure_opencode_server), so remove the entire _start_server
method definition to eliminate unused code and related imports/variables if they
become unused after its removal.
In @.agents/scripts/voice-helper.sh:
- Around line 265-343: The cmd_benchmark function leaks temp files when the
macOS "say" subprocess raises because tempfile.mktemp() is used and os.unlink(f)
is skipped; replace the tempfile.mktemp() usage and ensure the file is always
removed by wrapping the subprocess.run(['say', ...]) call in a try/finally that
unlinks the temp path (or switch to tempfile.NamedTemporaryFile(delete=True) and
write/read via its .name), updating the macos-say block so that the temporary
file is always cleaned up even on failure.
In @.agents/tools/voice/speech-to-speech.md:
- Around line 289-313: The phrase "The full S2S pipeline below is for advanced
use cases" in the "Voice Bridge (Recommended)" section is incorrect; update that
sentence to reference "above" instead of "below" or move the entire "Voice
Bridge (Recommended)" section so it appears before the S2S pipeline;
specifically edit the sentence that reads "The full S2S pipeline below is for
advanced use cases" in the Voice Bridge header block so it either says "above"
or reflow the document so the S2S pipeline content appears after the Voice
Bridge section.
🧹 Nitpick comments (3)
.agents/scripts/voice-helper.sh (1)
127-153: ensure_opencode_server silently succeeds on timeout, so downstream will run without a server.
When the server fails to start within the 20-attempt window, the function logs a warning but returns 0. cmd_talk then proceeds assuming the server is available, and the Python bridge will fall back to standalone mode, so this is technically handled downstream. However, the warning message "continuing anyway..." could be clearer that it's falling back to cold-start mode, which is significantly slower (~30s vs ~6s).
Suggested clarity improvement
- print_warning "OpenCode server slow to start, continuing anyway..."
+ print_warning "OpenCode server not ready after 10s; voice bridge will use cold-start mode (~30s per query)"
.agents/scripts/voice-bridge.py (2)
140-181: EdgeTTS speak(): the NamedTemporaryFile pattern is safe but the noqa comment is a no-op.
Line 144's # noqa: F401 suppresses a rule that isn't enabled in the project's Ruff config (confirmed by Ruff's RUF100 warning). The import itself is intentionally used as a fail-fast availability check, which is fine, but the noqa directive should be removed.
The temp file handling (lines 161–179) is correct: the with block closes the file handle before communicate.save() writes to it, and os.unlink in the finally block ensures cleanup.
Remove stale noqa
- import edge_tts # noqa: F401 - verify import
+ import edge_tts # verify availability at init time
522-545: Barge-in code is effectively dead: mic is muted during TTS playback.
Lines 409–413 return early from _audio_callback when self.is_speaking is True, so _barge_in_frames is never populated. The check at line 539 (if self.barge_in and self._barge_in_frames) will always be False.
The comment at lines 409–411 explains this is intentional (no echo cancellation → must mute mic), but the remaining barge-in infrastructure at lines 522–545 and the instance variables at lines 398–400 are dead code. Consider either removing the barge-in scaffolding or adding a # TODO: enable when echo cancellation is available comment so future maintainers know it's deliberately dormant.
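The readiness concern raised in the ensure_opencode_server nitpick above (a timeout that is reported as success) can be illustrated with a polling helper that returns an explicit boolean. This is a Python sketch of the idea, not the shell function itself; `wait_until_ready`, `probe`, and the injectable `sleep` are hypothetical names, and the defaults of 20 attempts at 0.5 s match the 10 s window mentioned in the review.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 20,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() up to `attempts` times; return True as soon as it succeeds."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)  # wait before the next attempt
    return False  # caller decides how to fall back, e.g. cold-start mode
```

Returning False on timeout lets the caller log an accurate message about the slower fallback path instead of continuing as if the server were available.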
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sat Feb 7 03:29:05 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
- TOON scripts[39] → scripts[41] after adding voice-bridge.py and voice-helper.sh
- Fix 'below' → 'above' in speech-to-speech.md Voice Bridge section
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sat Feb 7 03:48:10 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Summary
New Files
- .agents/scripts/voice-bridge.py -- Main Python voice bridge (~800 lines)
- .agents/scripts/voice-helper.sh -- Shell wrapper with talk/status/benchmark/devices/voices commands
Features
- opencode run --attach to a running opencode serve for low-latency responses
Usage
Updated Docs
Testing
Tested live in OpenCode TUI sessions: