feat: add voice AI integration with HuggingFace speech-to-speech pipeline #403
Conversation
…line

- Add tools/voice/speech-to-speech.md subagent with full component docs (VAD, STT, LLM, TTS options), deployment modes (local Mac/CUDA/server/Docker), multi-language support, and integration patterns
- Add speech-to-speech-helper.sh with setup/start/stop/status/client/config/benchmark commands, auto-detection of platform and GPU, config presets, zero ShellCheck violations
- Update README with voice usage instructions, hardware recommendations, cloud GPU provider guidance, and Voice AI service coverage section
- Add Voice row to AGENTS.md progressive disclosure table
- Update subagent-index.toon with voice subagent and helper script entries
- Add future TODOs: t131 (tools/vision/), t132 (multimodal taxonomy), t133 (cloud GPU guide)
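For orientation, a hedged sketch of the documented command surface; the subcommand names appear in this PR, but the flag shown and the behavior comments are assumptions:

```bash
./speech-to-speech-helper.sh setup                       # clone the pipeline repo, detect platform/GPU, install deps
./speech-to-speech-helper.sh start                       # launch the VAD->STT->LLM->TTS pipeline
./speech-to-speech-helper.sh status                      # report process state and detected hardware
./speech-to-speech-helper.sh client --host <SERVER_IP>   # attach a thin client to a remote pipeline
./speech-to-speech-helper.sh stop                        # terminate the background process or Docker stack
```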
Walkthrough

This PR introduces a comprehensive Speech-to-Speech voice pipeline: a new Bash orchestration script supporting local, Docker, and remote GPU deployments; multi-stage architecture documentation (VAD→STT→LLM→TTS); an updated service registry; and an enhanced README with hardware recommendations.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Helper as speech-to-speech-helper.sh
    participant Setup as Setup/Pip
    participant Pipeline as Voice Pipeline<br/>(Local/Docker)
    participant VAD as VAD Module
    participant STT as STT Module
    participant LLM as LLM Module
    participant TTS as TTS Module
    User->>Helper: cmd_setup
    Helper->>Setup: Clone repo, detect GPU/platform
    Setup->>Setup: Install dependencies (NLTK, torch)
    Setup-->>Helper: Ready
    User->>Helper: cmd_start [mode]
    Helper->>Pipeline: Validate, check PID
    activate Pipeline
    Pipeline->>VAD: Initialize voice activity detection
    Pipeline->>STT: Initialize speech-to-text
    Pipeline->>LLM: Initialize language model
    Pipeline->>TTS: Initialize text-to-speech
    User->>Pipeline: Voice input
    VAD->>STT: Audio detected
    STT->>LLM: Transcription
    LLM->>TTS: Response text
    TTS->>User: Audio output
    deactivate Pipeline
    User->>Helper: cmd_stop
    Helper->>Pipeline: Terminate process/Docker
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the project's AI capabilities by integrating a full-fledged, modular voice AI pipeline. It provides tools and documentation for setting up and managing speech-to-speech functionality, supporting various deployment modes (local, server, Docker) and hardware configurations, including Apple Silicon and NVIDIA GPUs. The changes aim to make voice-driven interactions and integrations more accessible and robust within the ecosystem.
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Fri Feb 6 22:32:03 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a significant new feature: a voice AI integration using a HuggingFace speech-to-speech pipeline. It includes a comprehensive helper script for managing the pipeline, detailed documentation for the new subagent, and updates to the main README. The code is generally well-structured and robust. My review focuses on improving error handling, clarity, and documentation accuracy in the new helper script and subagent markdown file. I've identified a few areas for improvement, such as un-suppressing important error messages, clarifying an obscure shell command, and correcting documentation to match the implemented functionality.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @.agents/scripts/speech-to-speech-helper.sh:
- Around line 488-494: Update the help examples in the
speech-to-speech-helper.sh script to avoid concrete values: replace the example
" $0 client --host 192.168.1.100" with a placeholder version such as " $0
client --host <HOST>" (or --host HOST) so the examples use generic placeholders;
edit the echo lines that print examples (the block that includes "Examples:" and
the subsequent echo strings) to use the placeholder instead of the hard-coded
IP (see the first sketch after this list).
- Around line 186-193: The auto-detect logic currently sets mode="cuda" for
unknown GPUs causing failure on CPU-only hosts; update the case in the block
that calls detect_gpu to choose a CPU-safe default (e.g., mode="local-cpu" or
"cpu") or require explicit user opt-in when detect_gpu returns "cpu"/unknown;
modify the case statement handling (the section that sets mode based on the gpu
variable) so the default (*) and the "cpu" branch set a CPU-compatible mode or
print a clear error asking the user to pass an explicit --mode (see the second sketch after this list).
- Around line 343-348: The Docker shutdown block (using S2S_DIR and the docker
compose calls) needs the same guard used elsewhere to avoid failing when Docker
isn't installed; before running any `docker compose -f
"${S2S_DIR}/docker-compose.yml" ...` commands, check that the docker binary is
available (e.g., `command -v docker >/dev/null 2>&1`) and only run the `docker
compose` commands if that check passes and the compose file exists, otherwise
skip the stop logic so `set -e` won't cause a non-zero exit when Docker is
absent.
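For reference, hedged sketches of the three fixes above. Names quoted in the comments (detect_gpu, gpu, mode, S2S_DIR) are reused; the surrounding structure and the detect_gpu return values are assumptions, not the script's actual code:

```bash
# First comment: placeholder instead of a hard-coded IP in the help output.
echo "  $0 client --host <HOST>"   # was: $0 client --host 192.168.1.100
```

```bash
# Second comment: a CPU-safe default in the auto-detect case statement.
gpu=$(detect_gpu)
case "${gpu}" in
  nvidia) mode="cuda" ;;
  apple)  mode="mac" ;;
  cpu|*)
    # Unknown or CPU-only host: choose a CPU-compatible mode instead of cuda
    # (the fix that was eventually merged warns and falls back to server mode).
    echo "WARN: no supported GPU detected; falling back to --server mode" >&2
    mode="server"
    ;;
esac
```

```bash
# Third comment: guard the Docker shutdown so hosts without Docker don't trip set -e.
if command -v docker >/dev/null 2>&1 && [[ -f "${S2S_DIR}/docker-compose.yml" ]]; then
  docker compose -f "${S2S_DIR}/docker-compose.yml" down
fi
```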
In @.agents/tools/voice/speech-to-speech.md:
- Around line 220-223: The docs reference a non-existent "transcribe" subcommand
in speech-to-speech-helper.sh; update the example to use the supported STT-only
invocation by pointing users to s2s_pipeline.py flags (e.g., replace the code
block using "speech-to-speech-helper.sh transcribe" with guidance to run "python
s2s_pipeline.py -h" and a brief example of the current STT-only flags), or
alternatively implement a "transcribe" wrapper in speech-to-speech-helper.sh
that internally calls s2s_pipeline.py in STT-only mode; update the file
.agents/tools/voice/speech-to-speech.md accordingly and reference the helper
script name (speech-to-speech-helper.sh) and the pipeline entrypoint
(s2s_pipeline.py) so readers can find the right command (see the first snippet after this list).
- Around line 76-82: Add a short note near the `--llm open_api` entry (e.g.,
directly under the table or after the "Model selection" line) telling users to
store their OpenAI API key in the project's secure credential store and not in
plain files; reference the expected env var name (e.g., OPENAI_API_KEY) and
where the project looks for credentials (the established secure storage
mechanism used by the repo) so CLI use of `--llm open_api` reads credentials
securely instead of from source. Ensure the note mentions `--llm open_api` and
the model selection flags (`--lm_model_name` / `--mlx_lm_model_name`) so readers
know when to apply this guidance.
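Illustrative snippets for the two comments above; s2s_pipeline.py, OPENAI_API_KEY, and the --llm/--lm_model_name flags are quoted from the comments, while the exact invocations are assumptions:

```bash
# First comment: the helper has no `transcribe` subcommand; discover the
# pipeline's own STT-related flags from its help output instead.
python s2s_pipeline.py -h
```

```bash
# Second comment: read the OpenAI key from the environment (populated by the
# repo's secure credential store), never from source or plain config files.
export OPENAI_API_KEY="<your-key>"
python s2s_pipeline.py --llm open_api --lm_model_name <model-name>
```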
In `@README.md`:
- Around line 665-721: Condense the "Voice Integration" section: replace the
long Quick start, full CLI examples, and exhaustive hardware/method tables with
a short summary paragraph and a single compact "Getting started" one-liner that
mentions speech-to-speech-helper.sh; fold the detailed setup steps, presets,
hardware recommendations, and extra methods into a pointer to the subagent doc
(speech-to-speech.md) and include a file:line reference there; keep one small
expandable details/blockquote if needed for progressive disclosure but remove
duplicated how-to content from README.md.
- Around line 687-697: Replace the hardcoded IP in the client example
(speech-to-speech-helper.sh client --host 192.168.1.100) with a placeholder like
<SERVER_IP>, and update the "low-latency" preset description to include a short
note about secure API credential storage for OpenAI (or cloud APIs) by pointing
users to store keys in ~/.config/aidevops/mcp-env.sh with strict permissions
(chmod 600). Ensure the README text references the client invocation
(speech-to-speech-helper.sh client --host <SERVER_IP>) and the credential file
path and permissions in the same section so users know where and how to securely
store their API keys.
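A sketch of that README guidance; the path, permissions, and client invocation are taken from the comment itself:

```bash
# Store the key with strict permissions in the documented credential file.
echo 'export OPENAI_API_KEY="<your-key>"' >> ~/.config/aidevops/mcp-env.sh
chmod 600 ~/.config/aidevops/mcp-env.sh

# Connect a client to a remote pipeline using a placeholder host.
speech-to-speech-helper.sh client --host <SERVER_IP>
```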
- Remove undocumented transcribe command reference from subagent docs
- Show NLTK download errors (redirect stdout only, not stderr)
- Show GPU detection output in status command
- Use explicit shift guard instead of shift 2>/dev/null pattern
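Two of these fixes, sketched for clarity; the NLTK corpus name is illustrative, not taken from the script:

```bash
# Explicit shift guard, replacing the error-suppressing `shift 2>/dev/null`.
if [[ $# -gt 0 ]]; then
  shift
fi

# Silence stdout only, so NLTK download errors on stderr stay visible.
python -m nltk.downloader punkt >/dev/null
```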
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Fri Feb 6 22:39:01 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
- Fix CPU-only auto-detect falling back to cuda (crashes on CPU hosts); now defaults to --server mode with warning
- Add docker guard to cmd_stop (consistent with cmd_docker_start/cmd_status)
- Replace || exit with || return 1 in docker functions (proper set -e handling)
- Replace hardcoded 192.168.1.100 with <server-ip> placeholder in help
- Fix PyTorch version typo: 2.10+ -> 2.4+ (matches upstream torch>=2.4.0)
- Add API key storage guidance for OpenAI API usage

Already fixed in prior PRs (replied with evidence):

- nltk stderr suppression (PR #447)
- cmd_stop fixed sleep 2 -> polling loop (PR #447)
- transcribe command docs removed (commit fd2aa84)
- detect_gpu stderr no longer suppressed
- shift pattern refactored to explicit if/shift
- README Voice section already condensed

ShellCheck: zero violations.
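A sketch of the `|| exit` to `|| return 1` change; the function shown is illustrative, not the script's actual code:

```bash
cmd_docker_start() {
  # `|| exit` here killed the whole script on failure; `|| return 1` surfaces
  # the error to the caller, which composes correctly with `set -e`.
  cd "${S2S_DIR}" || return 1
  docker compose -f "${S2S_DIR}/docker-compose.yml" up -d
}
```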
* chore: mark t141, t147.6 complete - all PR #403 review threads resolved
* fix(docs): align speech-to-speech.md with actual helper commands (t141)
  - Add missing 'client' command to Quick Reference command list
  - Clarify Voice-Driven DevOps as conceptual integration pattern
  - Fix Transcription section: point to transcription.md instead of implying the S2S helper has a transcribe command
  - Separate 'how to use S2S for transcription' from standalone guidance



Summary

- `tools/voice/speech-to-speech.md` subagent documenting the full HuggingFace speech-to-speech pipeline (VAD/STT/LLM/TTS) with all component options, deployment modes, and integration patterns
- `speech-to-speech-helper.sh` with setup/start/stop/status/client/config/benchmark commands, auto-detection of platform and GPU, and config presets (low-latency, low-vram, quality, mac, multilingual)

New Files

- `.agents/tools/voice/speech-to-speech.md` — subagent covering the modular voice pipeline with swappable components (7 STT, 3 LLM, 6 TTS implementations), local Mac/CUDA/server/Docker deployment, multi-language support, recommended configs by VRAM, and integration with Twilio/Remotion/DevOps
- `.agents/scripts/speech-to-speech-helper.sh` — helper script (zero ShellCheck violations) with platform auto-detection, PID management for background mode, Docker support, and config presets
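A hedged example of the preset workflow; the preset names and subcommands come from the summary above, while the exact argument syntax is an assumption:

```bash
./speech-to-speech-helper.sh config low-vram   # apply a preset suited to limited GPU memory
./speech-to-speech-helper.sh start             # launch with the active configuration
./speech-to-speech-helper.sh benchmark         # measure end-to-end pipeline latency
```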