feat: add voice AI integration with HuggingFace speech-to-speech pipeline #403
Conversation
…line

- Add tools/voice/speech-to-speech.md subagent with full component docs (VAD, STT, LLM, TTS options), deployment modes (local Mac/CUDA/server/Docker), multi-language support, and integration patterns
- Add speech-to-speech-helper.sh with setup/start/stop/status/client/config/benchmark commands, auto-detection of platform and GPU, config presets, zero ShellCheck violations
- Update README with voice usage instructions, hardware recommendations, cloud GPU provider guidance, and Voice AI service coverage section
- Add Voice row to AGENTS.md progressive disclosure table
- Update subagent-index.toon with voice subagent and helper script entries
- Add future TODOs: t131 (tools/vision/), t132 (multimodal taxonomy), t133 (cloud GPU guide)
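For orientation, a hedged sketch of the documented command surface; the subcommand names appear in this PR, but the flag shown and the behavior comments are assumptions:

```bash
./speech-to-speech-helper.sh setup                       # clone the pipeline repo, detect platform/GPU, install deps
./speech-to-speech-helper.sh start                       # launch the VAD->STT->LLM->TTS pipeline
./speech-to-speech-helper.sh status                      # report process state and detected hardware
./speech-to-speech-helper.sh client --host <SERVER_IP>   # attach a thin client to a remote pipeline
./speech-to-speech-helper.sh stop                        # terminate the background process or Docker stack
```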
Walkthrough

This PR introduces a comprehensive Speech-to-Speech voice pipeline: a new Bash orchestration script supporting local, Docker, and remote GPU deployments; multi-stage architecture documentation (VAD→STT→LLM→TTS); an updated service registry; and an enhanced README with hardware recommendations.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Helper as speech-to-speech-helper.sh
    participant Setup as Setup/Pip
    participant Pipeline as Voice Pipeline<br/>(Local/Docker)
    participant VAD as VAD Module
    participant STT as STT Module
    participant LLM as LLM Module
    participant TTS as TTS Module
    User->>Helper: cmd_setup
    Helper->>Setup: Clone repo, detect GPU/platform
    Setup->>Setup: Install dependencies (NLTK, torch)
    Setup-->>Helper: Ready
    User->>Helper: cmd_start [mode]
    Helper->>Pipeline: Validate, check PID
    activate Pipeline
    Pipeline->>VAD: Initialize voice activity detection
    Pipeline->>STT: Initialize speech-to-text
    Pipeline->>LLM: Initialize language model
    Pipeline->>TTS: Initialize text-to-speech
    User->>Pipeline: Voice input
    VAD->>STT: Audio detected
    STT->>LLM: Transcription
    LLM->>TTS: Response text
    TTS->>User: Audio output
    deactivate Pipeline
    User->>Helper: cmd_stop
    Helper->>Pipeline: Terminate process/Docker
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the project's AI capabilities by integrating a full-fledged, modular voice AI pipeline. It provides tools and documentation for setting up and managing speech-to-speech functionality, supporting various deployment modes (local, server, Docker) and hardware configurations, including Apple Silicon and NVIDIA GPUs. The changes aim to make voice-driven interactions and integrations more accessible and robust within the ecosystem.
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Fri Feb 6 22:32:03 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a significant new feature: a voice AI integration using a HuggingFace speech-to-speech pipeline. It includes a comprehensive helper script for managing the pipeline, detailed documentation for the new subagent, and updates to the main README. The code is generally well-structured and robust. My review focuses on improving error handling, clarity, and documentation accuracy in the new helper script and subagent markdown file. I've identified a few areas for improvement, such as un-suppressing important error messages, clarifying an obscure shell command, and correcting documentation to match the implemented functionality.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @.agents/scripts/speech-to-speech-helper.sh:
- Around line 488-494: Update the help examples in the
speech-to-speech-helper.sh script to avoid concrete values: replace the example
" $0 client --host 192.168.1.100" with a placeholder version such as " $0
client --host <HOST>" (or --host HOST) so the examples use generic placeholders;
edit the echo lines that print examples (the block that includes "Examples:" and
the subsequent echo strings) to use the placeholder instead of the hard-coded
IP (see the first sketch after this list).
- Around line 186-193: The auto-detect logic currently sets mode="cuda" for
unknown GPUs causing failure on CPU-only hosts; update the case in the block
that calls detect_gpu to choose a CPU-safe default (e.g., mode="local-cpu" or
"cpu") or require explicit user opt-in when detect_gpu returns "cpu"/unknown;
modify the case statement handling (the section that sets mode based on the gpu
variable) so the default (*) and the "cpu" branch set a CPU-compatible mode or
print a clear error asking the user to pass an explicit --mode (see the second sketch after this list).
- Around line 343-348: The Docker shutdown block (using S2S_DIR and the docker
compose calls) needs the same guard used elsewhere to avoid failing when Docker
isn't installed; before running any `docker compose -f
"${S2S_DIR}/docker-compose.yml" ...` commands, check that the docker binary is
available (e.g., `command -v docker >/dev/null 2>&1`) and only run the `docker
compose` commands if that check passes and the compose file exists, otherwise
skip the stop logic so `set -e` won't cause a non-zero exit when Docker is
absent.
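For reference, hedged sketches of the three fixes above. Names quoted in the comments (detect_gpu, gpu, mode, S2S_DIR) are reused; the surrounding structure and the detect_gpu return values are assumptions, not the script's actual code:

```bash
# First comment: placeholder instead of a hard-coded IP in the help output.
echo "  $0 client --host <HOST>"   # was: $0 client --host 192.168.1.100
```

```bash
# Second comment: a CPU-safe default in the auto-detect case statement.
gpu=$(detect_gpu)
case "${gpu}" in
  nvidia) mode="cuda" ;;
  apple)  mode="mac" ;;
  cpu|*)
    # Unknown or CPU-only host: choose a CPU-compatible mode instead of cuda
    # (the fix that was eventually merged warns and falls back to server mode).
    echo "WARN: no supported GPU detected; falling back to --server mode" >&2
    mode="server"
    ;;
esac
```

```bash
# Third comment: guard the Docker shutdown so hosts without Docker don't trip set -e.
if command -v docker >/dev/null 2>&1 && [[ -f "${S2S_DIR}/docker-compose.yml" ]]; then
  docker compose -f "${S2S_DIR}/docker-compose.yml" down
fi
```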
In @.agents/tools/voice/speech-to-speech.md:
- Around line 220-223: The docs reference a non-existent "transcribe" subcommand
in speech-to-speech-helper.sh; update the example to use the supported STT-only
invocation by pointing users to s2s_pipeline.py flags (e.g., replace the code
block using "speech-to-speech-helper.sh transcribe" with guidance to run "python
s2s_pipeline.py -h" and a brief example of the current STT-only flags), or
alternatively implement a "transcribe" wrapper in speech-to-speech-helper.sh
that internally calls s2s_pipeline.py in STT-only mode; update the file
.agents/tools/voice/speech-to-speech.md accordingly and reference the helper
script name (speech-to-speech-helper.sh) and the pipeline entrypoint
(s2s_pipeline.py) so readers can find the right command (see the first snippet after this list).
- Around line 76-82: Add a short note near the `--llm open_api` entry (e.g.,
directly under the table or after the "Model selection" line) telling users to
store their OpenAI API key in the project's secure credential store and not in
plain files; reference the expected env var name (e.g., OPENAI_API_KEY) and
where the project looks for credentials (the established secure storage
mechanism used by the repo) so CLI use of `--llm open_api` reads credentials
securely instead of from source. Ensure the note mentions `--llm open_api` and
the model selection flags (`--lm_model_name` / `--mlx_lm_model_name`) so readers
know when to apply this guidance.
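Illustrative snippets for the two comments above; s2s_pipeline.py, OPENAI_API_KEY, and the --llm/--lm_model_name flags are quoted from the comments, while the exact invocations are assumptions:

```bash
# First comment: the helper has no `transcribe` subcommand; discover the
# pipeline's own STT-related flags from its help output instead.
python s2s_pipeline.py -h
```

```bash
# Second comment: read the OpenAI key from the environment (populated by the
# repo's secure credential store), never from source or plain config files.
export OPENAI_API_KEY="<your-key>"
python s2s_pipeline.py --llm open_api --lm_model_name <model-name>
```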
In `@README.md`:
- Around line 665-721: Condense the "Voice Integration" section: replace the
long Quick start, full CLI examples, and exhaustive hardware/method tables with
a short summary paragraph and a single compact "Getting started" one-liner that
mentions speech-to-speech-helper.sh; fold the detailed setup steps, presets,
hardware recommendations, and extra methods into a pointer to the subagent doc
(speech-to-speech.md) and include a file:line reference there; keep one small
expandable details/blockquote if needed for progressive disclosure but remove
duplicated how-to content from README.md.
- Around line 687-697: Replace the hardcoded IP in the client example
(speech-to-speech-helper.sh client --host 192.168.1.100) with a placeholder like
<SERVER_IP>, and update the "low-latency" preset description to include a short
note about secure API credential storage for OpenAI (or cloud APIs) by pointing
users to store keys in ~/.config/aidevops/mcp-env.sh with strict permissions
(chmod 600). Ensure the README text references the client invocation
(speech-to-speech-helper.sh client --host <SERVER_IP>) and the credential file
path and permissions in the same section so users know where and how to securely
store their API keys.
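A sketch of that README guidance; the path, permissions, and client invocation are taken from the comment itself:

```bash
# Store the key with strict permissions in the documented credential file.
echo 'export OPENAI_API_KEY="<your-key>"' >> ~/.config/aidevops/mcp-env.sh
chmod 600 ~/.config/aidevops/mcp-env.sh

# Connect a client to a remote pipeline using a placeholder host.
speech-to-speech-helper.sh client --host <SERVER_IP>
```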
- Remove undocumented transcribe command reference from subagent docs
- Show NLTK download errors (redirect stdout only, not stderr)
- Show GPU detection output in status command
- Use explicit shift guard instead of shift 2>/dev/null pattern
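Two of these fixes, sketched for clarity; the NLTK corpus name is illustrative, not taken from the script:

```bash
# Explicit shift guard, replacing the error-suppressing `shift 2>/dev/null`.
if [[ $# -gt 0 ]]; then
  shift
fi

# Silence stdout only, so NLTK download errors on stderr stay visible.
python -m nltk.downloader punkt >/dev/null
```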
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Fri Feb 6 22:39:01 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
- Fix CPU-only auto-detect falling back to cuda (crashes on CPU hosts); now defaults to --server mode with warning
- Add docker guard to cmd_stop (consistent with cmd_docker_start/cmd_status)
- Replace || exit with || return 1 in docker functions (proper set -e handling)
- Replace hardcoded 192.168.1.100 with <server-ip> placeholder in help
- Fix PyTorch version typo: 2.10+ -> 2.4+ (matches upstream torch>=2.4.0)
- Add API key storage guidance for OpenAI API usage

Already fixed in prior PRs (replied with evidence):

- nltk stderr suppression (PR #447)
- cmd_stop fixed sleep 2 -> polling loop (PR #447)
- transcribe command docs removed (commit fd2aa84)
- detect_gpu stderr no longer suppressed
- shift pattern refactored to explicit if/shift
- README Voice section already condensed

ShellCheck: zero violations.
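A sketch of the `|| exit` to `|| return 1` change; the function shown is illustrative, not the script's actual code:

```bash
cmd_docker_start() {
  # `|| exit` here killed the whole script on failure; `|| return 1` surfaces
  # the error to the caller, which composes correctly with `set -e`.
  cd "${S2S_DIR}" || return 1
  docker compose -f "${S2S_DIR}/docker-compose.yml" up -d
}
```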
* chore: mark t141, t147.6 complete - all PR #403 review threads resolved
* fix(docs): align speech-to-speech.md with actual helper commands (t141)
  - Add missing 'client' command to Quick Reference command list
  - Clarify Voice-Driven DevOps as conceptual integration pattern
  - Fix Transcription section: point to transcription.md instead of implying the S2S helper has a transcribe command
  - Separate 'how to use S2S for transcription' from standalone guidance



Summary

- `tools/voice/speech-to-speech.md` subagent documenting the full HuggingFace speech-to-speech pipeline (VAD/STT/LLM/TTS) with all component options, deployment modes, and integration patterns
- `speech-to-speech-helper.sh` with setup/start/stop/status/client/config/benchmark commands, auto-detection of platform and GPU, and config presets (low-latency, low-vram, quality, mac, multilingual)

New Files

- `.agents/tools/voice/speech-to-speech.md` — subagent covering the modular voice pipeline with swappable components (7 STT, 3 LLM, 6 TTS implementations), local Mac/CUDA/server/Docker deployment, multi-language support, recommended configs by VRAM, and integration with Twilio/Remotion/DevOps
- `.agents/scripts/speech-to-speech-helper.sh` — helper script (zero ShellCheck violations) with platform auto-detection, PID management for background mode, Docker support, and config presets
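A hedged example of the preset workflow; the preset names and subcommands come from the summary above, while the exact argument syntax is an assumption:

```bash
./speech-to-speech-helper.sh config low-vram   # apply a preset suited to limited GPU memory
./speech-to-speech-helper.sh start             # launch with the active configuration
./speech-to-speech-helper.sh benchmark         # measure end-to-end pipeline latency
```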