feat(tts): add voice upload API for Qwen3-TTS by zhaotyer · Pull Request #1201 · vllm-project/vllm-omni

zhaotyer · 2026-02-04T11:38:36Z

Add POST /v1/audio/voices endpoint for uploading custom voice samples
Modify GET /v1/audio/voices endpoint to list available voices
Update API documentation with voice management examples
Support voice cloning with uploaded samples in Base task

The new endpoints allow users to:

Upload custom voice samples (max 10MB, various audio formats)
List all available voices (built-in + uploaded)
Use uploaded voices for voice cloning in TTS requests
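
As a rough illustration of how a client might call the upload endpoint, the sketch below builds a multipart/form-data request with only the standard library. The form field names `audio_sample`, `consent`, and `name` come from the endpoint signature in this PR's api_server.py diff; the server address `http://localhost:8000`, the helper name `build_voice_upload_request`, and the sample values are assumptions made for the example.

```python
import urllib.request
import uuid


def build_voice_upload_request(base_url, name, consent, filename, audio_bytes,
                               mime_type="audio/wav"):
    """Build (but do not send) a multipart POST to /v1/audio/voices."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields expected by the endpoint.
    for field, value in (("name", name), ("consent", consent)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{field}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio file itself, sent under the "audio_sample" field.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="audio_sample"; filename="{filename}"\r\n'
        f'Content-Type: {mime_type}\r\n\r\n'.encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/voices",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder bytes; a real call would read an audio file from disk.
req = build_voice_upload_request(
    "http://localhost:8000", "custom_voice_1", "user_consent_id",
    "sample.wav", b"RIFF....WAVE",
)
# To actually send it against a running server: urllib.request.urlopen(req)
```

The request is only constructed here, so the sketch runs without a server. A client library with built-in multipart support would shorten this considerably.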

Purpose

Add a voice upload API for Qwen3-TTS.

Test Plan

Test Result

Details

Files changed:
- vllm_omni/entrypoints/openai/api_server.py
- vllm_omni/entrypoints/openai/serving_speech.py
- examples/online_serving/qwen3_tts/README.md


zhaotyer · 2026-02-04T11:39:39Z


@hsliuustc0106

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e405d2eff


chatgpt-codex-connector · 2026-02-04T11:40:43Z

        # Validate Base task requirements
-        if task_type == "Base":
+        if task_type == "Base" and request.voice is None:
            if request.ref_audio is None:
                return "Base task requires 'ref_audio' for voice cloning"


Require ref_audio for Base when voice isn't uploaded

The new Base-task validation only enforces ref_audio when voice is missing, so a request like task_type=Base with a built-in speaker name but no ref_audio now passes validation. In that case _build_tts_params will send no ref_audio to the model (because the auto-fill only happens for uploaded voices), which breaks the Base task’s voice-cloning requirement and likely yields a model error or incorrect output. Consider requiring ref_audio unless voice refers to an uploaded speaker that will be auto-populated.
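
One way to express the stricter check is sketched below. It assumes the uploaded-voice registry is a dict keyed by lowercased voice name whose entries carry a `file_path` (as in the metadata written by this PR); the function name `validate_base_request` and its signature are hypothetical, not the PR's actual code.

```python
import tempfile
from pathlib import Path
from typing import Optional


def validate_base_request(task_type, voice, ref_audio, uploaded_speakers) -> Optional[str]:
    """Return an error message, or None if the request is acceptable."""
    if task_type != "Base" or ref_audio is not None:
        return None
    # Only an uploaded voice can stand in for ref_audio; built-in names cannot.
    entry = uploaded_speakers.get(voice.lower()) if voice else None
    if entry is None:
        return "Base task requires 'ref_audio' for voice cloning"
    # Fail early if the stored sample has gone missing.
    if not Path(entry["file_path"]).is_file():
        return f"Audio file for uploaded voice '{voice}' is missing"
    return None


with tempfile.NamedTemporaryFile(suffix=".wav") as sample:
    speakers = {"my_voice": {"file_path": sample.name}}
    uploaded_ok = validate_base_request("Base", "My_Voice", None, speakers)
    builtin_err = validate_base_request("Base", "vivian", None, speakers)
# The temp file is deleted on exit, so the same voice is now rejected.
stale_err = validate_base_request("Base", "my_voice", None, speakers)
```

Here `uploaded_ok` is `None`, while the built-in speaker and the stale upload both produce an error message instead of reaching the model without `ref_audio`.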


chatgpt-codex-connector · 2026-02-04T11:40:43Z

+        # Generate filename
+        timestamp = int(time.time())
+        file_ext = audio_file.filename.split('.')[-1] if '.' in audio_file.filename else "wav"
+        filename = f"{name}_{consent}_{timestamp}.{file_ext}"
+        file_path = self.uploaded_speakers_dir / filename
+


Prevent path traversal in uploaded voice filename

The upload endpoint builds filename directly from untrusted name and consent and then writes file_path = self.uploaded_speakers_dir / filename. If either field contains path separators or .., the resulting path can escape /tmp/voice_samples and overwrite arbitrary files on the host. This is a security issue that can be triggered by a client POSTing a crafted name/consent. Sanitize these inputs (e.g., allowlist safe characters) or normalize and validate that the resolved path stays within the upload directory.
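
A minimal sketch of both mitigations (allowlisting characters and checking the resolved path) follows. The helper name `safe_upload_path`, the 64-character limit, and the sample values are assumptions; only the filename pattern and the `/tmp/voice_samples` directory come from the PR.

```python
import re
from pathlib import Path

# Letters, digits, underscore and hyphen only; the length cap is an assumption.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-]{1,64}")


def safe_upload_path(upload_dir: Path, name: str, consent: str,
                     timestamp: int, file_ext: str) -> Path:
    """Build the target path, rejecting anything that could leave upload_dir."""
    for label, value in (("name", name), ("consent", consent), ("extension", file_ext)):
        if not _SAFE_COMPONENT.fullmatch(value):
            raise ValueError(f"Invalid {label}: only letters, digits, '_' and '-' are allowed")
    base = upload_dir.resolve()
    candidate = (base / f"{name}_{consent}_{timestamp}.{file_ext}").resolve()
    # Defence in depth: the resolved path must still sit inside the upload directory.
    if base not in candidate.parents:
        raise ValueError("Resolved path escapes the upload directory")
    return candidate


upload_dir = Path("/tmp/voice_samples")
path = safe_upload_path(upload_dir, "custom_voice_1", "consent42", 1738660000, "wav")

try:
    safe_upload_path(upload_dir, "../../etc/cron.d/evil", "consent42", 1738660000, "wav")
    rejected = False
except ValueError:
    rejected = True
```

The allowlist alone stops the traversal shown here; the containment check is kept as a second line of defence in case the allowlist is later loosened.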


linyueqian · 2026-02-04T15:41:19Z

A few thoughts: (1) Consider supporting pre-extracted embedding uploads (.pt/.npy) in addition to audio files to skip extraction overhead at inference time. (2) The /tmp/voice_samples storage is volatile. Maybe make this path configurable or document this limitation. (3) Missing a DELETE endpoint to remove uploaded voices.

hsliuustc0106 · 2026-02-05T04:38:04Z

please also update the docs as well in apiserver

linyueqian · 2026-02-05T04:39:32Z

> please also update the docs as well in apiserver

we probably need to merge #1206 first

Copilot

Pull request overview

This pull request adds voice upload functionality for Qwen3-TTS, allowing users to upload custom voice samples for voice cloning. The implementation adds new API endpoints for uploading and listing voice samples, along with automatic integration into the TTS workflow.

Changes:

Added POST /v1/audio/voices endpoint for uploading custom voice samples (max 10MB)
Modified GET /v1/audio/voices endpoint to return both built-in and uploaded voices
Implemented auto-set behavior that automatically uses uploaded voice audio for Base task TTS requests

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| vllm_omni/entrypoints/openai/serving_speech.py | Core voice upload logic including file storage, metadata management, and auto-set ref_audio behavior for uploaded voices |
| vllm_omni/entrypoints/openai/api_server.py | API endpoint definitions for voice upload and enhanced voice listing with uploaded voice details |
| examples/online_serving/qwen3_tts/README.md | Documentation for new voice management endpoints with usage examples |


Copilot · 2026-02-05T04:49:07Z


        # Validate Base task requirements
-        if task_type == "Base":
+        if task_type == "Base" and request.voice is None:


The validation doesn't check if an uploaded voice file actually exists when using Base task with an uploaded voice. If task_type is "Base" and voice is an uploaded voice name, but the audio file is missing or unreadable, the auto-set logic at lines 320-325 will silently fail (returning None from _get_uploaded_audio_data), and the Base task will proceed without ref_audio, potentially causing downstream errors. Consider adding validation to ensure uploaded voices have accessible audio files, especially for Base task.

Suggested change

-        if task_type == "Base" and request.voice is None:
+        if task_type == "Base":
+            # Base task always requires explicit ref_audio to avoid relying on
+            # potentially failing auto-set logic from uploaded voices.

Copilot · 2026-02-05T04:49:08Z

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
+        # Initialize uploaded speakers storage
+        self.uploaded_speakers_dir = Path("/tmp/voice_samples")


Using a hardcoded path '/tmp/voice_samples' poses several issues:

Security: Multiple users/deployments on the same system will share this directory

Persistence: Files in /tmp may be deleted by system cleanup processes

Portability: This path may not work on all operating systems (e.g., Windows)

Consider using a configurable directory path that can be set via environment variable or configuration parameter, and ensure proper isolation for multi-tenant scenarios.

Suggested change

-        self.uploaded_speakers_dir = Path("/tmp/voice_samples")
+        base_dir_env = os.getenv("VLLM_OMNI_VOICE_SAMPLES_DIR")
+        if base_dir_env:
+            self.uploaded_speakers_dir = Path(base_dir_env)
+        else:
+            # Use a portable, user-specific cache directory by default
+            xdg_cache_home = os.getenv("XDG_CACHE_HOME")
+            if xdg_cache_home:
+                cache_base = Path(xdg_cache_home)
+            else:
+                cache_base = Path.home() / ".cache"
+            self.uploaded_speakers_dir = cache_base / "vllm_omni" / "voice_samples"

Copilot · 2026-02-05T04:49:08Z

+
+        # Save audio file
+        try:
+            with open(file_path, 'wb') as f:
+                content = await audio_file.read()
+                f.write(content)
+        except Exception as e:
+            raise ValueError(f"Failed to save audio file: {e}")
+
+        # Update metadata
+        self.uploaded_speakers[voice_name_lower] = {
+            "name": name,
+            "consent": consent,
+            "file_path": str(file_path),
+            "created_at": timestamp,
+            "mime_type": mime_type,
+            "original_filename": audio_file.filename,
+            "file_size": file_size
+        }
+
+        # Update supported speakers
+        self.supported_speakers.add(voice_name_lower)
+
+        # Save metadata
+        self._save_uploaded_speakers()
+
+        logger.info(f"Uploaded new voice '{name}' with consent ID '{consent}'")
+
+        return {
+            "name": name,
+            "consent": consent,
+            "file_path": str(file_path),
+            "created_at": timestamp,
+            "mime_type": mime_type,
+            "file_size": file_size


There's a potential race condition: if the file is successfully written but saving metadata fails, the uploaded file becomes orphaned. Consider using a transaction-like pattern where you first save the file with a temporary name, then update metadata, and only rename to final name if both succeed. Also consider cleanup of orphaned files on initialization.

Suggested change

-        # Save audio file
-        try:
-            with open(file_path, 'wb') as f:
-                content = await audio_file.read()
-                f.write(content)
-        except Exception as e:
-            raise ValueError(f"Failed to save audio file: {e}")
-
-        # Update metadata
-        self.uploaded_speakers[voice_name_lower] = {
-            "name": name,
-            "consent": consent,
-            "file_path": str(file_path),
-            "created_at": timestamp,
-            "mime_type": mime_type,
-            "original_filename": audio_file.filename,
-            "file_size": file_size
-        }
-
-        # Update supported speakers
-        self.supported_speakers.add(voice_name_lower)
-
-        # Save metadata
-        self._save_uploaded_speakers()
-
-        logger.info(f"Uploaded new voice '{name}' with consent ID '{consent}'")
-
-        return {
-            "name": name,
-            "consent": consent,
-            "file_path": str(file_path),
-            "created_at": timestamp,
-            "mime_type": mime_type,
-            "file_size": file_size
+        temp_file_path = self.uploaded_speakers_dir / f"{filename}.tmp"
+
+        # Save audio file to a temporary path first to avoid orphaned files
+        try:
+            content = await audio_file.read()
+            with open(temp_file_path, "wb") as f:
+                f.write(content)
+
+            # Update metadata in memory
+            self.uploaded_speakers[voice_name_lower] = {
+                "name": name,
+                "consent": consent,
+                "file_path": str(file_path),
+                "created_at": timestamp,
+                "mime_type": mime_type,
+                "original_filename": audio_file.filename,
+                "file_size": file_size,
+            }
+
+            # Update supported speakers
+            self.supported_speakers.add(voice_name_lower)
+
+            # Persist metadata
+            self._save_uploaded_speakers()
+
+            # Atomically move the temp file to its final location
+            os.replace(temp_file_path, file_path)
+        except Exception as e:
+            # Clean up temp file and roll back in-memory state on failure
+            try:
+                if isinstance(temp_file_path, Path):
+                    if temp_file_path.exists():
+                        temp_file_path.unlink()
+                else:
+                    if os.path.exists(temp_file_path):
+                        os.remove(temp_file_path)
+            except Exception:
+                # Best-effort cleanup; ignore secondary errors
+                pass
+
+            # Roll back any partially updated metadata
+            if hasattr(self, "uploaded_speakers"):
+                self.uploaded_speakers.pop(voice_name_lower, None)
+            if hasattr(self, "supported_speakers"):
+                try:
+                    self.supported_speakers.discard(voice_name_lower)
+                except AttributeError:
+                    # In case supported_speakers is not a set-like object
+                    try:
+                        self.supported_speakers.remove(voice_name_lower)
+                    except Exception:
+                        pass
+
+            raise ValueError(f"Failed to upload voice: {e}")
+
+        logger.info(f"Uploaded new voice '{name}' with consent ID '{consent}'")
+
+        return {
+            "name": name,
+            "consent": consent,
+            "file_path": str(file_path),
+            "created_at": timestamp,
+            "mime_type": mime_type,
+            "file_size": file_size,

Copilot · 2026-02-05T04:49:08Z

+    def _save_uploaded_speakers(self) -> None:
+        """Save uploaded speakers to metadata file."""
+        try:
+            metadata = {"uploaded_speakers": self.uploaded_speakers}
+            with open(self.metadata_file, 'w') as f:
+                json.dump(metadata, f, indent=2)
+        except Exception as e:
+            logger.error(f"Could not save uploaded speakers metadata: {e}")


The metadata.json file could grow unbounded as users upload more voices. There's no mechanism to limit the number of uploaded voices or to delete old voices. Consider implementing:

A maximum number of uploaded voices per instance

An API endpoint to delete uploaded voices

A cleanup mechanism for old/unused voices

Copilot · 2026-02-05T04:49:08Z

+    def _save_uploaded_speakers(self) -> None:
+        """Save uploaded speakers to metadata file."""
+        try:
+            metadata = {"uploaded_speakers": self.uploaded_speakers}
+            with open(self.metadata_file, 'w') as f:
+                json.dump(metadata, f, indent=2)
+        except Exception as e:
+            logger.error(f"Could not save uploaded speakers metadata: {e}")


The metadata file is not protected by any locking mechanism. In a multi-process or multi-threaded environment, concurrent uploads could lead to race conditions where:

Two processes read the same metadata

Both add their voice

One overwrites the other's changes when saving

Consider using file locking (e.g., fcntl on Unix, msvcrt on Windows) or a database for thread-safe metadata storage.
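
A sketch of the locking idea on Unix is shown below, combined with re-reading the on-disk state before writing so that fields added by another writer are merged rather than overwritten. The function name `update_metadata`, the separate `.lock` file, and the sample entries are assumptions for illustration; `fcntl` is not available on Windows.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def update_metadata(metadata_file: Path, voice_key: str, entry: dict) -> dict:
    """Merge one voice entry into metadata.json under an exclusive lock."""
    # A separate lock file is used because os.replace swaps out the metadata file itself.
    lock_path = metadata_file.with_suffix(".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            # Re-read the on-disk state so updates from other writers are kept.
            if metadata_file.exists():
                data = json.loads(metadata_file.read_text())
            else:
                data = {"uploaded_speakers": {}}
            speakers = data.setdefault("uploaded_speakers", {})
            speakers[voice_key] = {**speakers.get(voice_key, {}), **entry}
            # Write to a temp file in the same directory, then swap it in atomically.
            fd, tmp_name = tempfile.mkstemp(dir=metadata_file.parent, suffix=".tmp")
            with os.fdopen(fd, "w") as tmp:
                json.dump(data, tmp, indent=2)
            os.replace(tmp_name, metadata_file)
            return data
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as workdir:
    meta = Path(workdir) / "metadata.json"
    update_metadata(meta, "voice_a", {"name": "voice_a", "cache_status": "ready"})
    merged = update_metadata(meta, "voice_a", {"file_size": 1024})
    update_metadata(meta, "voice_b", {"name": "voice_b"})
    on_disk = json.loads(meta.read_text())
```

In the demo, the second update to `voice_a` keeps the earlier `cache_status` field, which is the behavior a plain overwrite from an in-memory dict would lose.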

Copilot · 2026-02-05T04:49:11Z

@@ -1,7 +1,11 @@
 import asyncio
+import json
+import os


The 'os' module is imported but never used in the code. This import should be removed to keep the codebase clean.

Suggested change

-import os

Copilot · 2026-02-05T04:49:11Z

+  "voice": {
+    "name": "custom_voice_1",
+    "consent": "user_consent_id",
+    "file_path": "/tmp/voice_samples/custom_voice_1_user_consent_id_1738660000.wav",


The documentation exposes the internal file path '/tmp/voice_samples/' in the response example. This is a potential information disclosure issue as it reveals the server's internal directory structure. Consider either:

Not returning the file_path in the API response

Sanitizing the path to not reveal absolute server paths

Returning a relative or opaque identifier instead

Suggested change

-    "file_path": "/tmp/voice_samples/custom_voice_1_user_consent_id_1738660000.wav",
+    "file_path": "custom_voice_1_user_consent_id_1738660000.wav",

Copilot · 2026-02-05T04:49:11Z

+
+        # Generate filename
+        timestamp = int(time.time())
+        file_ext = audio_file.filename.split('.')[-1] if '.' in audio_file.filename else "wav"


The file extension extraction logic is fragile. If the filename has no extension or multiple dots (e.g., 'my.voice.sample.wav'), splitting by '.' and taking the last element works, but if there's no dot in the filename, the entire filename becomes the extension. This should be handled more robustly, perhaps by using Path(audio_file.filename).suffix or providing a default extension if none is found.

Suggested change

-        file_ext = audio_file.filename.split('.')[-1] if '.' in audio_file.filename else "wav"
+        raw_filename = audio_file.filename or ""
+        suffix = Path(raw_filename).suffix.lstrip(".")
+        file_ext = suffix if suffix else "wav"
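
To make the difference concrete, the sketch below wraps the suggested logic in a hypothetical helper, `pick_extension`, and runs it on a few edge cases. With the original `split('.')` expression, a name ending in a dot yields an empty extension, a dotfile such as `.hidden` yields `hidden`, and a missing filename (`None`) raises a `TypeError`.

```python
from pathlib import Path


def pick_extension(filename, default="wav"):
    """Return the extension without its dot, or the default if there is none."""
    suffix = Path(filename or "").suffix.lstrip(".")
    return suffix if suffix else default


results = {
    name: pick_extension(name)
    for name in ("my.voice.sample.wav", "recording", ".hidden", "clip.MP3")
}
missing_name = pick_extension(None)
```

Note that the extension is returned as given (for example `MP3`), so any case normalization or allowlist of audio formats would still need to be applied separately.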

Copilot · 2026-02-05T04:49:11Z

+    consent: str = Form(...),
+    name: str = Form(...),


The consent parameter is stored but never validated or used for any authorization checks. If consent is meant to represent user consent for voice cloning, there should be validation logic to verify:

The consent ID format/validity

Whether the consent is still active

Logging/audit trail for consent usage

Without proper consent validation, this could lead to compliance issues with privacy regulations.

Copilot · 2026-02-05T04:49:11Z

+
+#### POST /v1/audio/voices
+
+Upload a new voice sample for voice cloning in Base task TTS requests.


The documentation states that uploaded voices can be used "for voice cloning in Base task TTS requests", but the implementation doesn't enforce that uploaded voices are only used with Base task. An uploaded voice can be used with any task type due to the auto-set logic at lines 320-325, which could lead to unexpected behavior. Consider either:

Clarifying in the documentation that uploaded voices work with any task type

Restricting uploaded voices to Base task only in the code

Making the auto-set behavior conditional on task_type being "Base"

Suggested change

-Upload a new voice sample for voice cloning in Base task TTS requests.
+Upload a new voice sample that can be used for voice cloning in subsequent TTS requests with any supported task type.

zhaotyer · 2026-02-05T07:08:35Z

> please also update the docs as well in apiserver

already add docs in apiserver, copy from #1206

linyueqian · 2026-02-05T22:56:12Z

A few issues from the Copilot review still look unaddressed after the latest commit:

Security (must fix before merge):

Path traversal: name and consent are used directly in the filename (f"{name}_{consent}_{timestamp}.{file_ext}"). A crafted name like ../../etc/cron.d/evil escapes the upload directory. Sanitize to alphanumeric/underscore/hyphen only, or validate the resolved path stays within uploaded_speakers_dir.
File path disclosure: The API response returns the full server path (/tmp/voice_samples/...). Drop file_path from the response or return an opaque identifier instead.

Logic bugs (must fix):

Base task validation bypass: The change if task_type == "Base" and request.voice is None means a request with a built-in speaker name (e.g. voice=vivian) + task_type=Base + no ref_audio now passes validation. The auto-fill only kicks in for uploaded voices, so this will break downstream. Should check request.voice.lower() in self.uploaded_speakers specifically.
Silent auto-set failure: If an uploaded voice's audio file is missing/deleted, _get_uploaded_audio_data returns None silently, and the Base task proceeds without ref_audio. Should return an error instead.

Minor (nice to have):

Move import base64 to top of file
Use Path(filename).suffix instead of split('.')[-1] for extension extraction
The consent/name emptiness checks in api_server.py are dead code since Form(...) already enforces required
Docs response examples still show /tmp/voice_samples/ paths

Also heads up: PR #1227 adds speaker_embedding support and touches the same validation/param-building code. Coordinating so #1227 rebases after this lands (commented there already).

zhaotyer · 2026-02-06T08:30:38Z

You are right, I will fix it.

Port the voice upload API (POST /v1/audio/voices) from upstream vllm-project#1201 into the HT branch, adapted to coexist with HT's existing streaming and audio extraction changes.

- Add upload_voice(), _load/_save_uploaded_speakers() to serving_speech
- Add POST /v1/audio/voices endpoint to api_server
- Modify GET /v1/audio/voices to include uploaded voice details
- Auto-set ref_audio for uploaded voices in Base task
- Add docs/serving/speech_api.md documentation

Note: Known upstream review issues (path traversal, metadata locking, validation bypass for built-in voices) are carried as-is for parity and will be addressed in a follow-up.

Upstream-PR: vllm-project#1201
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fixes security and logic issues flagged in upstream PR vllm-project#1201 review:

Security:
- Sanitize name/consent to alphanumeric/underscore/hyphen only
- Validate resolved path stays within upload directory
- Remove file_path from API responses (information disclosure)

Logic bugs:
- Base task validation now correctly requires ref_audio unless voice is specifically an uploaded voice (not just any voice name)
- _get_uploaded_audio_data raises ValueError instead of returning None when audio file is missing, preventing silent failures

Robustness:
- Atomic metadata writes via tempfile + os.replace
- File locking (fcntl.flock) on metadata.json reads and writes
- Use Path().suffix for file extension extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zhaotyer · 2026-02-09T06:54:36Z

I have addressed these points and added a speaker embedding caching feature.
@linyueqian

hsliuustc0106 · 2026-02-09T07:23:43Z

Please fix the pre-commit checks and DCO.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.


Copilot · 2026-02-09T07:44:03Z

+**Response Example:**
+
+```json
+{
+  "success": true,
+  "voice": {
+    "name": "custom_voice_1",
+    "consent": "user_consent_id",
+    "file_path": "/tmp/voice_samples/custom_voice_1_user_consent_id_1738660000.wav",
+    "created_at": 1738660000,
+    "mime_type": "audio/wav",
+    "file_size": 1024000
+  }


The documented response example for POST /v1/audio/voices includes file_path, but the implementation intentionally does not return server file paths. Update the example to match the actual response schema (and also document the updated GET /v1/audio/voices response shape, which now includes uploaded_voices).

Copilot · 2026-02-09T07:44:03Z

+@router.post(
+    "/v1/audio/voices",
+    responses={
+        HTTPStatus.OK.value: {"model": dict},
+        HTTPStatus.BAD_REQUEST.value: {"model": ErrorResponse},
+        HTTPStatus.INTERNAL_SERVER_ERROR.value: {"model": ErrorResponse},
+    },
+)
+async def upload_voice(
+    raw_request: Request,
+    audio_sample: UploadFile = File(...),
+    consent: str = Form(...),
+    name: str = Form(...),
+):
+    """Upload a new voice sample for voice cloning.
+
+    Uploads an audio file that can be used as a reference for voice cloning
+    in Base task TTS requests. The voice can then be referenced by name
+    in subsequent TTS requests.
+
+    Args:
+        audio_sample: Audio file (max 10MB)
+        consent: Consent recording ID
+        name: Name for the new voice
+        raw_request: Raw FastAPI request
+
+    Returns:
+        JSON response with voice information
+    """
+    handler = Omnispeech(raw_request)
+    if handler is None:
+        return base(raw_request).create_error_response(message="The model does not support Speech API")
+
+    try:
+        # Validate required parameters
+        if not consent:
+            return base(raw_request).create_error_response(message="consent is required")
+        if not name:
+            return base(raw_request).create_error_response(message="name is required")
+
+        # Upload the voice
+        result = await handler.upload_voice(audio_sample, consent, name)
+
+        return JSONResponse(content={
+            "success": True,
+            "voice": result
+        })


There are existing unit tests for Speech serving, but this PR adds a new POST /v1/audio/voices endpoint and new behavior for Base requests using voice without ref_audio (uploaded voices). Add tests that cover: successful upload (metadata persisted), duplicate name handling, file size/type validation, and that a subsequent Base request with voice=<uploaded> works without ref_audio.

Copilot · 2026-02-09T07:44:03Z

+@router.post(
+    "/v1/audio/voices",
+    responses={
+        HTTPStatus.OK.value: {"model": dict},
+        HTTPStatus.BAD_REQUEST.value: {"model": ErrorResponse},
+        HTTPStatus.INTERNAL_SERVER_ERROR.value: {"model": ErrorResponse},
+    },
+)
+async def upload_voice(
+    raw_request: Request,
+    audio_sample: UploadFile = File(...),
+    consent: str = Form(...),
+    name: str = Form(...),
+):
+    """Upload a new voice sample for voice cloning.
+
+    Uploads an audio file that can be used as a reference for voice cloning
+    in Base task TTS requests. The voice can then be referenced by name
+    in subsequent TTS requests.
+
+    Args:
+        audio_sample: Audio file (max 10MB)
+        consent: Consent recording ID
+        name: Name for the new voice
+        raw_request: Raw FastAPI request
+
+    Returns:
+        JSON response with voice information
+    """
+    handler = Omnispeech(raw_request)
+    if handler is None:
+        return base(raw_request).create_error_response(message="The model does not support Speech API")
+
+    try:
+        # Validate required parameters
+        if not consent:
+            return base(raw_request).create_error_response(message="consent is required")
+        if not name:
+            return base(raw_request).create_error_response(message="name is required")
+
+        # Upload the voice
+        result = await handler.upload_voice(audio_sample, consent, name)
+
+        return JSONResponse(content={
+            "success": True,
+            "voice": result
+        })
+


POST /v1/audio/voices persists uploaded files under SPEECH_VOICE_SAMPLES with no retention/cleanup or quota beyond per-file size. Over time this can fill disk (especially with many uploads). Consider adding a configurable max number/total size of uploaded voices per server instance, and/or a delete endpoint / periodic cleanup policy.
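
A per-instance quota could be checked before each upload is written, roughly as sketched below. The function name `check_upload_quota`, the default limits, and the choice to skip `.json` and `.lock` files when counting are all assumptions, not part of the PR.

```python
import tempfile
from pathlib import Path


def check_upload_quota(upload_dir: Path, incoming_bytes: int,
                       max_files: int = 100,
                       max_total_bytes: int = 1024 ** 3) -> None:
    """Raise ValueError if accepting one more upload would exceed a quota."""
    # Count stored samples only; metadata and lock files are ignored.
    sizes = [
        p.stat().st_size
        for p in upload_dir.iterdir()
        if p.is_file() and p.suffix not in {".json", ".lock"}
    ]
    if len(sizes) + 1 > max_files:
        raise ValueError(f"Upload limit reached ({max_files} voices)")
    if sum(sizes) + incoming_bytes > max_total_bytes:
        raise ValueError("Uploading this file would exceed the total storage quota")


with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    (root / "voice_a.wav").write_bytes(b"x" * 10)
    (root / "metadata.json").write_text("{}")
    check_upload_quota(root, 5, max_files=2, max_total_bytes=20)  # within limits
    try:
        check_upload_quota(root, 50, max_files=2, max_total_bytes=20)
        over_quota = False
    except ValueError:
        over_quota = True
```

A check like this only bounds growth; removing old voices would still need a delete endpoint or a cleanup policy as suggested above.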

Copilot · 2026-02-09T07:44:03Z

+        except Exception as e:
+            logger.error(f"Failed to save cache for speaker {speaker}: {e}")
+            # Update status to failed
+            self.update_metadata_cache_info(speaker, Path(""), "failed")
+            return False


In the exception path of save_voice_cache, update_metadata_cache_info is called with Path(""), which stringifies to "." and gets persisted as cache_file. This leaves misleading metadata and can point to a directory. Prefer clearing cache_file/cache_generated_at (e.g., write null/empty) when status is "failed", or update only the status field on failures.
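
The `"."` behavior is easy to confirm, and one possible shape for the fix is to build the persisted fields from the status so that a failure never records a path. The helper `cache_fields` below is hypothetical; the field names `cache_status`, `cache_file`, and `cache_generated_at` are the ones mentioned in this review.

```python
import time
from pathlib import Path
from typing import Optional

# An empty Path stringifies to the current directory, not to an empty string.
empty_path_as_str = str(Path(""))


def cache_fields(status: str, cache_file: Optional[Path] = None) -> dict:
    """Build the metadata fields to persist for a cache status update."""
    if status == "failed" or cache_file is None:
        # Record the failure without pointing at any file.
        return {"cache_status": status, "cache_file": None, "cache_generated_at": None}
    return {
        "cache_status": status,
        "cache_file": str(cache_file),
        "cache_generated_at": int(time.time()),
    }


failed = cache_fields("failed")
ready = cache_fields("ready", Path("voice_a.pt"))
```

When serialized with `json.dump`, the `None` values become `null`, which readers can distinguish from a real path.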

Copilot · 2026-02-09T07:44:04Z

+
+            # Check cache status and file existence
+            if cache_file_path and cache_file_path.exists() and cache_status == "ready":
+                logger.info(f"Using cached voice clone prompt for speaker: {speaker}")
+
+                # Load cached VoiceClonePromptItem
+                cached_items = torch.load(cache_file_path, map_location="cpu", weights_only=False)


load_cached_voice_prompt loads cache_file from metadata.json and passes it directly into torch.load(..., weights_only=False). If metadata.json is tampered with, this can point outside the intended directory and torch.load can execute arbitrary code via pickle. Validate that cache_file_path resolves within SPEECH_VOICE_SAMPLES (and ideally only allow .pt files you created), and consider switching to a safer serialization format (e.g., store plain tensors in a JSON + safetensors/npz) so untrusted paths can’t trigger code execution.

Suggested change

-            # Check cache status and file existence
-            if cache_file_path and cache_file_path.exists() and cache_status == "ready":
-                logger.info(f"Using cached voice clone prompt for speaker: {speaker}")
-
-                # Load cached VoiceClonePromptItem
-                cached_items = torch.load(cache_file_path, map_location="cpu", weights_only=False)
+            # Validate cache file path before loading with torch.load
+            resolved_cache_path: Optional[Path] = None
+            if cache_file_path is not None:
+                try:
+                    resolved_cache_path = cache_file_path.resolve()
+                except Exception as exc:
+                    logger.warning(
+                        f"Failed to resolve cache file path for speaker {speaker}: {exc}"
+                    )
+                    return None
+
+                # If a base directory for speech voice samples is defined, ensure the
+                # resolved cache path is within that directory to prevent directory traversal.
+                base_dir = globals().get("SPEECH_VOICE_SAMPLES")
+                if base_dir is not None:
+                    try:
+                        base_dir_path = Path(base_dir).resolve()
+                        resolved_cache_path.relative_to(base_dir_path)
+                    except Exception:
+                        logger.warning(
+                            f"Cache file path for speaker {speaker} is outside of the allowed "
+                            f"directory; refusing to load."
+                        )
+                        return None
+
+                # Optionally restrict to known-safe extension (e.g., .pt)
+                if resolved_cache_path.suffix != ".pt":
+                    logger.warning(
+                        f"Unexpected cache file extension for speaker {speaker}: "
+                        f"{resolved_cache_path.suffix!r}; refusing to load."
+                    )
+                    return None
+
+            # Check cache status and file existence
+            if resolved_cache_path and resolved_cache_path.exists() and cache_status == "ready":
+                logger.info(f"Using cached voice clone prompt for speaker: {speaker}")
+
+                # Load cached VoiceClonePromptItem
+                cached_items = torch.load(resolved_cache_path, map_location="cpu", weights_only=False)

Copilot · 2026-02-09T07:44:04Z

+    def _save_uploaded_speakers(self) -> None:
+        """Save uploaded speakers to metadata file."""
+        try:
+            metadata = {"uploaded_speakers": self.uploaded_speakers}
+            with open(self.metadata_file, 'w') as f:
+                json.dump(metadata, f, indent=2)
+        except Exception as e:


_save_uploaded_speakers overwrites metadata.json with the in-memory self.uploaded_speakers dict. Since cache generation updates metadata.json from VoiceCacheManager independently, this write can silently clobber cache_status/cache_file/cache_generated_at (and concurrent uploads can lose updates/corrupt JSON). Use an atomic write + file lock, and merge/reload the on-disk metadata before writing so cache updates aren’t lost.
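A minimal sketch of the lock, reload, merge, and atomic-replace pattern described above. `update_metadata` is a hypothetical helper rather than anything in the PR, the `uploaded_speakers` key follows the diff, and `fcntl` assumes a POSIX host.

```python
import fcntl  # POSIX only; an assumption about the deployment platform
import json
import os
import tempfile
from pathlib import Path


def update_metadata(metadata_file: Path, speaker: str, fields: dict) -> dict:
    """Merge `fields` into one speaker's entry without dropping other writers' keys."""
    lock_path = metadata_file.with_suffix(".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize read-modify-write cycles
        try:
            # Reload from disk so updates made by another component are preserved.
            try:
                on_disk = json.loads(metadata_file.read_text())
            except (FileNotFoundError, json.JSONDecodeError):
                on_disk = {}
            speakers = on_disk.setdefault("uploaded_speakers", {})
            speakers.setdefault(speaker, {}).update(fields)

            # Write to a temp file in the same directory, then rename atomically.
            fd, tmp_path = tempfile.mkstemp(dir=metadata_file.parent, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as f:
                    json.dump(on_disk, f, indent=2)
                os.replace(tmp_path, metadata_file)
            except BaseException:
                os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return on_disk
```

With this shape, an upload handler that writes `name` and a cache generator that later writes `cache_status` both end up in the same entry, because each writer merges into the on-disk state instead of overwriting it from memory.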

Copilot · 2026-02-09T07:44:04Z

+
+            # If voice is an uploaded speaker and no ref_audio provided, auto-set it
+            if request.voice.lower() in self.uploaded_speakers and request.ref_audio is None:
+                audio_data = self._get_uploaded_audio_data(request.voice)
+                if audio_data:
+                    params["ref_audio"] = [audio_data]
+                    params["x_vector_only_mode"] = [True]
+                    logger.info(f"Auto-set ref_audio for uploaded voice: {request.voice}")
+                else:
+                    raise ValueError(f"Audio file for uploaded voice '{request.voice}' is missing or corrupted")
+


For uploaded voices, _build_tts_params base64-encodes the audio file on every request when ref_audio isn’t provided, even though generate_voice_clone can use the on-disk cache after the first run. This adds avoidable CPU/memory overhead (up to 10MB read + encode per request). Consider checking cache_status == "ready" (and cache file existence) in metadata and skipping the ref_audio injection when the cache is already available.

Suggested change

-            # If voice is an uploaded speaker and no ref_audio provided, auto-set it
-            if request.voice.lower() in self.uploaded_speakers and request.ref_audio is None:
-                audio_data = self._get_uploaded_audio_data(request.voice)
-                if audio_data:
-                    params["ref_audio"] = [audio_data]
-                    params["x_vector_only_mode"] = [True]
-                    logger.info(f"Auto-set ref_audio for uploaded voice: {request.voice}")
-                else:
-                    raise ValueError(f"Audio file for uploaded voice '{request.voice}' is missing or corrupted")
+            # If voice is an uploaded speaker and no ref_audio provided, auto-set it
+            if request.voice.lower() in self.uploaded_speakers and request.ref_audio is None:
+                speaker_key = request.voice.lower()
+                metadata = self.uploaded_speakers.get(speaker_key)
+                # If a cache is already ready on disk for this uploaded voice, rely on it
+                cache_status = None
+                cache_path = None
+                if isinstance(metadata, dict):
+                    cache_status = metadata.get("cache_status")
+                    cache_path = metadata.get("cache_path")
+                if cache_status == "ready" and cache_path and os.path.exists(cache_path):
+                    # Cache is ready; no need to inject ref_audio (generate_voice_clone will use cache)
+                    logger.info(
+                        "Using existing cache for uploaded voice '%s'; skipping ref_audio injection",
+                        request.voice,
+                    )
+                else:
+                    audio_data = self._get_uploaded_audio_data(request.voice)
+                    if audio_data:
+                        params["ref_audio"] = [audio_data]
+                        params["x_vector_only_mode"] = [True]
+                        logger.info(f"Auto-set ref_audio for uploaded voice: {request.voice}")
+                    else:
+                        raise ValueError(
+                            f"Audio file for uploaded voice '{request.voice}' is missing or corrupted"
+                        )

zhaotyer · 2026-02-10T06:57:43Z

@linyueqian

Gaohan123 · 2026-02-10T08:40:01Z

PR #1206 merged. Please rebase

hsliuustc0106 · 2026-02-23T00:57:23Z

Please resolve the conflicts.

hsliuustc0106 · 2026-02-23T14:19:38Z

🤖 Code Review: PR #1201 — Voice Upload API 🔴

Verdict: Request Changes

Summary: Adds REST endpoints to upload, list, and delete custom voice samples for voice cloning. Extends OmniOpenAIServingSpeech with uploaded_speakers management and auto-sets ref_audio + x_vector_only_mode when an uploaded voice is used. Architecture is clean with good separation between API layer (serving_speech.py) and storage/caching (VoiceSampleManager in qwen3_tts.py). This is the best-tested of the 4 PRs. However, there are path traversal vulnerabilities, state persistence issues on restart, and race conditions on concurrent uploads that must be fixed.

Key Concerns:

⚠️ Path traversal vulnerability: the 'name' parameter is used to construct file paths (/tmp/voice_samples/{name}{consent}{timestamp}.wav). If name contains '../' or other path components, this could write outside the intended directory. Must sanitize input.
⚠️ In-memory + file state inconsistency: uploaded_speakers is an in-memory dict, but metadata is persisted to JSON via VoiceSampleManager. On server restart, the in-memory dict is empty but files remain on disk. Verify load_uploaded_speakers_from_metadata() is called during init.
⚠️ Race condition on concurrent uploads: multiple simultaneous uploads could race on writing the metadata JSON file. No file locking or atomic write.
⚠️ Storage in /tmp/voice_samples/ is world-readable on most systems and cleared on reboot. Should be configurable for production use.

Detailed Feedback

📄 `vllm_omni/entrypoints/openai/serving_speech.py`

🔴 CRITICAL (security)

Path traversal vulnerability: the 'name' parameter from user input is used directly to construct file paths (/tmp/voice_samples/{name}{consent}{timestamp}.wav). A name containing '../' could write files outside the intended directory. This is exploitable via the upload API endpoint.

import re

def _sanitize_voice_name(name: str) -> str:
    """Sanitize voice name to prevent path traversal."""
    # Only allow alphanumeric, underscore, hyphen
    sanitized = re.sub(r'[^a-zA-Z0-9_-]', '', name)
    if not sanitized:
        raise ValueError('Voice name must contain at least one alphanumeric character')
    return sanitized
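Note that the helper above strips disallowed characters, so a name such as `../../../etc/evil` would be accepted as `etcevil` rather than rejected. A stricter variant that rejects such names outright might look like the sketch below; `validate_voice_name` and the 64-character limit are assumptions for illustration, not part of the PR.

```python
import re

# Letters, digits, underscore and hyphen only; the length cap is an assumed limit.
_VOICE_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_voice_name(name: str) -> str:
    """Return `name` unchanged if it is safe to embed in a filename, else raise."""
    if not _VOICE_NAME_RE.fullmatch(name):
        raise ValueError(
            "Voice name may only contain letters, digits, '_' and '-' (1-64 characters)"
        )
    return name
```

Rejecting instead of rewriting keeps the stored name identical to the one the client sent, which avoids two different inputs collapsing to the same voice name.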

🟠 HIGH (architecture)

In-memory uploaded_speakers dict is not populated from persisted metadata on startup. If the server restarts, _build_tts_params() checks self.uploaded_speakers (in-memory) which will be empty, so previously uploaded voices are silently lost. Must call load_uploaded_speakers_from_metadata() during init.

# In OmniOpenAIServingSpeech.__init__():
def __init__(self, *args, **kwargs):
    # ... existing init ...
    self.uploaded_speakers = {}
    self.voice_manager = VoiceSampleManager(storage_dir=self.voice_storage_dir)
    # Restore persisted voices on startup so they survive a restart
    self.uploaded_speakers = self.voice_manager.load_uploaded_speakers_from_metadata()
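What such a startup load could look like, as a standalone sketch: `load_uploaded_speakers` is a hypothetical function, and the `uploaded_speakers` key is taken from the `_save_uploaded_speakers` diff in this PR. It treats a missing or corrupt file as "no voices" so that startup does not fail.

```python
import json
from pathlib import Path


def load_uploaded_speakers(metadata_file: Path) -> dict:
    """Return the persisted speaker table, or an empty dict if none is usable."""
    try:
        data = json.loads(metadata_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    if not isinstance(data, dict):
        return {}
    speakers = data.get("uploaded_speakers", {})
    return speakers if isinstance(speakers, dict) else {}
```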

🟡 MEDIUM (security)

No authentication on upload/delete endpoints. Any client can upload or delete voices. Acceptable for initial implementation but should be documented as a known limitation with a TODO for auth.

🟡 MEDIUM (security)

MIME type validation checks the client-reported Content-Type which can be spoofed. Consider also checking file magic bytes to verify the uploaded file is actually audio.

# After saving the file, verify magic bytes:
import os

with open(filepath, 'rb') as f:
    header = f.read(12)
# WAV files start with 'RIFF' and carry 'WAVE' at bytes 8-11
if header[:4] != b'RIFF' or header[8:12] != b'WAVE':
    os.remove(filepath)
    raise ValueError('Uploaded file is not a valid WAV file')
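Since the endpoint is described as accepting various audio formats, a WAV-only check may be too narrow. A sketch of a small signature sniffer follows; `sniff_audio_format` is a hypothetical helper, and the set of formats covered here is an assumption rather than the endpoint's actual allow-list.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:3] == b"ID3":
        # MP3 with an ID3v2 tag; untagged MP3 starts with a frame sync instead.
        return "mp3"
    return None
```

A signature match only shows that the file starts like an audio container; decoding it with an audio library is still needed to confirm it is playable.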

📄 `vllm_omni/model_executor/models/qwen3_tts.py`

🟠 HIGH (architecture)

Race condition: multiple simultaneous uploads could race on writing the metadata JSON file. No file locking or atomic write pattern. Concurrent writes could corrupt the JSON.

import json
import os
import tempfile

def _save_metadata(self, metadata: dict):
    """Atomically write metadata JSON."""
    # Write to temp file first, then atomic rename
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(self.metadata_path),
        suffix='.tmp'
    )
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(metadata, f, indent=2)
        os.replace(tmp_path, self.metadata_path)
    except BaseException:
        # Remove the partial temp file, then re-raise the original error
        os.unlink(tmp_path)
        raise

🟡 MEDIUM (architecture)

Voice storage hardcoded to /tmp/voice_samples/. This directory is world-readable on most systems and cleared on reboot. For production, the storage directory should be configurable via server args or environment variable.

# Make configurable via server args or an environment variable:
import os

DEFAULT_VOICE_STORAGE = os.environ.get(
    'VLLM_VOICE_STORAGE_DIR',
    '/tmp/voice_samples'
)

📄 `tests/test_voice_upload.py`

🟢 LOW (quality) — Line 511

Leftover debug code: '# import pdb;pdb.set_trace()' should be removed.

🟡 MEDIUM (tests)

Good test breadth (upload success, file too large, invalid MIME, duplicate name, delete, list, build_tts_params integration) but missing security edge cases: path traversal in name field, concurrent upload race condition, restart persistence.

# `client` and `wav_fixture` are assumed to be provided by the test module.
def test_upload_path_traversal():
    """Verify names with path traversal are rejected."""
    response = client.post(
        '/v1/audio/voices',
        data={'name': '../../../etc/evil'},
        files={'file': wav_fixture},
    )
    assert response.status_code == 400

def test_upload_name_special_chars():
    """Verify names with special chars are rejected."""
    response = client.post(
        '/v1/audio/voices',
        data={'name': 'voice;rm -rf /'},
        files={'file': wav_fixture},
    )
    assert response.status_code == 400

📄 `docs/serving/speech_api.md`

🟢 LOW (docs) — Line 108

Missing closing ``` for the 'use upload voice' curl example. The bash code block is not closed before the '## Supported Models' section.

🟢 LOW (docs)

Examples show port 8091 but this may not match the default config. Verify consistency.

Suggestions

Sanitize the name parameter: reject names containing '/', '..', or non-alphanumeric characters (except '_' and '-').
Make the voice storage directory configurable via server args rather than hardcoding /tmp/voice_samples/.
Add atomic writes for the metadata JSON (write to temp file, then os.replace).
Load persisted voices on server startup to survive restarts.
Remove the debug comment on line 511.
Fix the unclosed code block in the docs.

Part of Issue #938 — Qwen3-TTS Production Ready

Review generated by vLLM-Omni PR Review Agent

hsliuustc0106 · 2026-02-24T08:02:59Z

@vllm-omni-reviewer

hsliuustc0106 · 2026-02-27T12:11:51Z

Please fix the pre-commit failures.

linyueqian · 2026-02-27T16:09:19Z

Please rebase on latest main. Also a couple issues:

upload_voice() calls await self.metadata_manager.create_speaker(...), but create_speaker is sync (returns bool). This will raise TypeError at runtime. delete_voice correctly calls it without await.
VoiceCacheManager() in qwen3_tts.py creates its own MetadataManager instead of sharing the one from serving_speech.py. Having two managers for the same metadata.json is fragile; pass the shared instance.
The docs still hardcode /tmp/voice_samples/, but storage is now configurable via SPEECH_VOICE_SAMPLES.
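The first point can be reproduced in isolation. The sketch below uses a stand-in `MetadataManager` class (not the PR's implementation) to show that awaiting the bool returned by a synchronous method raises TypeError, while a plain call works.

```python
import asyncio


class MetadataManager:
    """Stand-in with a synchronous create_speaker, as described in the review."""

    def create_speaker(self, name: str) -> bool:
        return True


async def upload_voice_buggy(manager: MetadataManager) -> bool:
    # create_speaker returns a bool, which is not awaitable.
    return await manager.create_speaker("alice")


async def upload_voice_fixed(manager: MetadataManager) -> bool:
    # Plain call, matching how delete_voice is described as calling the manager.
    return manager.create_speaker("alice")


def demo() -> tuple:
    manager = MetadataManager()
    try:
        asyncio.run(upload_voice_buggy(manager))
        buggy = "no error"
    except TypeError as exc:
        buggy = type(exc).__name__
    return buggy, asyncio.run(upload_voice_fixed(manager))
```

If `create_speaker` did blocking disk I/O in a real handler, wrapping the call in `asyncio.to_thread` would be an alternative to a plain synchronous call.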

linyueqian · 2026-03-04T21:04:18Z

Rebased onto main, resolved conflicts, and fixed pre-commit issues. Also fixed the await on sync create_speaker and removed file_path from the docs response example.

Tested locally with the Base model - upload, list, TTS with uploaded voice, duplicate rejection, and delete all work correctly.

@hsliuustc0106 this should be ready to merge.

linyueqian

Tested locally, LGTM.

linyueqian · 2026-03-04T21:11:00Z

Uploaded voices always use x_vector_only_mode=True, which skips ICL and gives lower quality. Adding an optional ref_text field to the upload endpoint would let stored voices use ICL mode instead. The cache infrastructure already supports it. Worth considering as a follow-up.
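A sketch of how that follow-up could select the mode, assuming a stored transcript is passed as a `ref_text` parameter in the same list-wrapped form as `ref_audio` in the diff above. `build_clone_params` is a hypothetical helper, and the exact parameter shapes are assumptions.

```python
from typing import Optional


def build_clone_params(audio_data: str, ref_text: Optional[str] = None) -> dict:
    """Pick ICL mode when a transcript is stored, x-vector-only mode otherwise."""
    params = {"ref_audio": [audio_data]}
    if ref_text:
        # A transcript is available, so the reference can be used for ICL.
        params["ref_text"] = [ref_text]
        params["x_vector_only_mode"] = [False]
    else:
        # No transcript: fall back to the speaker-embedding-only path.
        params["x_vector_only_mode"] = [True]
    return params
```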

ukemamaster · 2026-03-06T10:51:53Z

Hi @zhaotyer, I appreciate that you addressed this issue, which can occur in production environments.

Can you please clarify the following:

Does the POST /v1/audio/voices endpoint accept only a WAV file, or can a .pt file containing a pre-computed voice_clone_prompt also be passed? Such a file would be created as follows:

voice_clone_prompt = base_model.create_voice_clone_prompt(
       ref_audio=(wavs[0], sr),
       ref_text=text
)
torch.save(voice_clone_prompt, path)

Do voices need to be added again on every vLLM server startup, or are previously added voices loaded from disk automatically?

Signed-off-by: linyueqian <linyueqian@outlook.com> Co-authored-by: linyueqian <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com> Signed-off-by: lishunyang <lishunyang12@163.com>

JuanPZuluaga · 2026-03-13T14:37:03Z

+
+    # Get uploaded speakers details
+    uploaded_speakers = []
+    if hasattr(handler, "uploaded_speakers"):
+        for voice_name, info in handler.uploaded_speakers.items():
+            uploaded_speakers.append(
+                {
+                    "name": info.get("name", voice_name),
+                    "consent": info.get("consent", ""),
+                    "created_at": info.get("created_at", 0),
+                    "file_size": info.get("file_size", 0),
+                    "mime_type": info.get("mime_type", ""),
+                }
+            )
+
+    return JSONResponse(content={"voices": speakers, "uploaded_voices": uploaded_speakers})
+
+
+@router.post(
+    "/v1/audio/voices",
+    responses={


@Sy0307 @linyueqian hi, I have a quick question. Did this router get deleted?

I think it may have been deleted by accident. Let me check whether I can revert it.

Yeah, #1201 introduced it but PR #1719 removed it by accident, probably in the rebase. :(

Sorry, that was my mistake. I removed it by accident during a rebase. Apologies for that :(

It's already back in main :-) @Sy0307

Yeah, I got it. Thanks for the report and the fix. I missed some emails last week XD

ukemamaster · 2026-03-31T11:42:50Z

@zhaotyer Once a custom voice is added to the Base model, how do I use it for TTS? Is there Python client code for this that supports streaming?
I tried openai_speech_client.py but I get this error:
{"error":{"message":"Base task requires 'ref_audio' or 'speaker_embedding' for voice cloning","type":"BadRequestError","param":null,"code":400}}

Signed-off-by: linyueqian <linyueqian@outlook.com> Co-authored-by: linyueqian <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>

zhaotyer requested a review from hsliuustc0106 as a code owner February 4, 2026 11:38

chatgpt-codex-connector Bot reviewed Feb 4, 2026

hsliuustc0106 requested a review from Copilot February 5, 2026 04:37

Copilot started reviewing on behalf of hsliuustc0106 February 5, 2026 04:38

Copilot AI reviewed Feb 5, 2026

This was referenced Feb 5, 2026

[RFC]: Qwen3-TTS Production Ready - February Milestone #938

Open

[Frontend] Speaker embedding support for speech and voices APIs #1227

Merged

marksverdhai mentioned this pull request Feb 6, 2026

feat(tts): integrate voice upload API (upstream #1201) heiervang-technologies/ht-vllm-omni#12

Closed

5 tasks

hsliuustc0106 requested a review from Copilot February 9, 2026 07:23

tzhouam self-requested a review February 9, 2026 07:30

Copilot started reviewing on behalf of hsliuustc0106 February 9, 2026 07:32

Copilot AI reviewed Feb 9, 2026

zhaotyer force-pushed the add_custom_voice branch 4 times, most recently from e5140f1 to b71ab32 Compare February 10, 2026 02:43

zhaotyer force-pushed the add_custom_voice branch from b71ab32 to d1ce0d8 Compare February 10, 2026 10:38

Gaohan123 mentioned this pull request Feb 27, 2026

[Bug]: Qwen3 TTS Voice Clone Mode Fails to Run #1553

Closed

1 task

linyueqian mentioned this pull request Mar 4, 2026

[RFC]: voice_clone for Qwen3-TTS Base model #1603

Closed

1 task

linyueqian force-pushed the add_custom_voice branch 2 times, most recently from a5e6bba to e161519 Compare March 4, 2026 20:53

linyueqian added ready label to trigger buildkite CI labels Mar 4, 2026

linyueqian modified the milestone: v0.18.0 Mar 4, 2026

linyueqian approved these changes Mar 4, 2026

linyueqian added 4 commits March 8, 2026 21:49

Rebase voice upload feature onto main and fix pre-commit

2c6d99b

Signed-off-by: linyueqian <linyueqian@outlook.com>

Fix await on sync create_speaker method

eb163bb

Signed-off-by: linyueqian <linyueqian@outlook.com>

Remove file_path from docs response example, add VoiceCacheManager co…

0a76363

…mment Signed-off-by: linyueqian <linyueqian@outlook.com>

Fix CI test failures in test_serving_speech

52c3aa9

Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian force-pushed the add_custom_voice branch from 10cfa28 to 52c3aa9 Compare March 9, 2026 01:53

Merge branch 'main' into add_custom_voice

eee09b8

hsliuustc0106 merged commit 41964cb into vllm-project:main Mar 9, 2026
6 of 7 checks passed

linyueqian mentioned this pull request Mar 10, 2026

[Feature]: Add a separate endpoint for create_voice_clone_prompt for qwen3-TTS model #1760

Open

1 task

JuanPZuluaga reviewed Mar 13, 2026

linyueqian mentioned this pull request Mar 13, 2026

[Bugfix] Restore voice upload API and profiler endpoints reverted by #1719 #1879

Merged

3 tasks

-        self.uploaded_speakers_dir = Path("/tmp/voice_samples")
+        base_dir_env = os.getenv("VLLM_OMNI_VOICE_SAMPLES_DIR")
+        if base_dir_env:
+            self.uploaded_speakers_dir = Path(base_dir_env)
+        else:
+            # Use a portable, user-specific cache directory by default
+            xdg_cache_home = os.getenv("XDG_CACHE_HOME")
+            if xdg_cache_home:
+                cache_base = Path(xdg_cache_home)
+            else:
+                cache_base = Path.home() / ".cache"
+            self.uploaded_speakers_dir = cache_base / "vllm_omni" / "voice_samples"

	"file_path": "/tmp/voice_samples/custom_voice_1_user_consent_id_1738660000.wav",
	"file_path": "custom_voice_1_user_consent_id_1738660000.wav",

-        file_ext = audio_file.filename.split('.')[-1] if '.' in audio_file.filename else "wav"
+        raw_filename = audio_file.filename or ""
+        suffix = Path(raw_filename).suffix.lstrip(".")
+        file_ext = suffix if suffix else "wav"


		#### POST /v1/audio/voices

		Upload a new voice sample for voice cloning in Base task TTS requests.

	Upload a new voice sample for voice cloning in Base task TTS requests.
	Upload a new voice sample that can be used for voice cloning in subsequent TTS requests with any supported task type.

-            # Check cache status and file existence
-            if cache_file_path and cache_file_path.exists() and cache_status == "ready":
-                logger.info(f"Using cached voice clone prompt for speaker: {speaker}")
-                # Load cached VoiceClonePromptItem
-                cached_items = torch.load(cache_file_path, map_location="cpu", weights_only=False)
+            # Validate cache file path before loading with torch.load
+            resolved_cache_path: Optional[Path] = None
+            if cache_file_path is not None:
+                try:
+                    resolved_cache_path = cache_file_path.resolve()
+                except Exception as exc:
+                    logger.warning(
+                        f"Failed to resolve cache file path for speaker {speaker}: {exc}"
+                    )
+                    return None
+                # If a base directory for speech voice samples is defined, ensure the
+                # resolved cache path is within that directory to prevent directory traversal.
+                base_dir = globals().get("SPEECH_VOICE_SAMPLES")
+                if base_dir is not None:
+                    try:
+                        base_dir_path = Path(base_dir).resolve()
+                        resolved_cache_path.relative_to(base_dir_path)
+                    except Exception:
+                        logger.warning(
+                            f"Cache file path for speaker {speaker} is outside of the allowed "
+                            f"directory; refusing to load."
+                        )
+                        return None
+                # Optionally restrict to known-safe extension (e.g., .pt)
+                if resolved_cache_path.suffix != ".pt":
+                    logger.warning(
+                        f"Unexpected cache file extension for speaker {speaker}: "
+                        f"{resolved_cache_path.suffix!r}; refusing to load."
+                    )
+                    return None
+            # Check cache status and file existence
+            if resolved_cache_path and resolved_cache_path.exists() and cache_status == "ready":
+                logger.info(f"Using cached voice clone prompt for speaker: {speaker}")
+                # Load cached VoiceClonePromptItem
+                cached_items = torch.load(resolved_cache_path, map_location="cpu", weights_only=False)
