Merged
94 changes: 94 additions & 0 deletions docs/serving/speech_api.md
@@ -125,6 +125,81 @@ Lists available voices for the loaded model.
"voices": ["aiden", "dylan", "eric", "ono_anna", "ryan", "serena", "sohee", "uncle_fu", "vivian"]
}
```
```
POST /v1/audio/voices
Content-Type: multipart/form-data
```

Upload a new voice sample for voice cloning in Base task TTS requests.

**Form Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `audio_sample` | file | Yes | Audio file (max 10MB, supported formats: wav, mp3, flac, ogg, aac, webm, mp4) |
| `consent` | string | Yes | Consent recording ID |
| `name` | string | Yes | Name for the new voice |

**Response Example:**

```json
{
"success": true,
"voice": {
"name": "custom_voice_1",
"consent": "user_consent_id",
"created_at": 1738660000,
"mime_type": "audio/wav",
"file_size": 1024000
}
}
```

Comment on lines +143 to +154

Copilot AI Feb 9, 2026

The documented response example for `POST /v1/audio/voices` includes `file_path`, but the implementation intentionally does not return server file paths. Update the example to match the actual response schema (and also document the updated `GET /v1/audio/voices` response shape, which now includes `uploaded_voices`).

**Usage Example:**

```bash
curl -X POST http://localhost:8091/v1/audio/voices \
-F "audio_sample=@/path/to/voice_sample.wav" \
-F "consent=user_consent_id" \
-F "name=custom_voice_1"
```
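
The limits in the parameter table above can be checked locally before uploading. The following is a minimal sketch, not part of the server: `validate_voice_sample` and `MAX_BYTES` are hypothetical names, "10MB" is assumed to mean 10 * 1024 * 1024 bytes, and the check uses the file extension only, while the server may validate the actual content type.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload check; the limits mirror the parameter table above.
# Assumption: "10MB" means 10 * 1024 * 1024 bytes (the docs do not specify).
MAX_BYTES=$((10 * 1024 * 1024))

validate_voice_sample() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "error: file not found: $file" >&2
    return 1
  fi

  # Lower-case the extension so "SAMPLE.WAV" is accepted as well.
  local ext="${file##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    wav|mp3|flac|ogg|aac|webm|mp4) ;;
    *)
      echo "error: unsupported extension: .$ext" >&2
      return 1
      ;;
  esac

  # wc -c gives the size in bytes on both GNU and BSD systems.
  local size
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "error: file is $size bytes, limit is $MAX_BYTES" >&2
    return 1
  fi
  return 0
}

# Usage (upload only if the local check passes):
# validate_voice_sample /path/to/voice_sample.wav && \
#   curl -X POST http://localhost:8091/v1/audio/voices \
#     -F "audio_sample=@/path/to/voice_sample.wav" \
#     -F "consent=user_consent_id" \
#     -F "name=custom_voice_1"
```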


```
DELETE /v1/audio/voices/{name}
```

Delete an uploaded voice sample.

**Path Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `name` | string | Yes | Name of the voice to delete |

**Response Example:**

```json
{
"success": true,
"message": "Voice 'custom_voice_1' deleted successfully"
}
```

**Error Response (404 Not Found):**

```json
{
"success": false,
"error": "Voice 'unknown_voice' not found"
}
```

**Usage Example:**

```bash
curl -X DELETE http://localhost:8091/v1/audio/voices/custom_voice_1
```
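
Because the endpoint reports a missing voice with a 404, a script may want to branch on the HTTP status rather than parse the body. The sketch below is illustrative only: `BASE_URL`, `classify_delete_status`, and `delete_voice` are hypothetical names, and it assumes a successful delete returns some 2xx status, which the documentation above does not state explicitly.

```bash
#!/usr/bin/env bash
# Sketch only: BASE_URL and both function names are illustrative.
BASE_URL="${BASE_URL:-http://localhost:8091}"

# Map an HTTP status code to a short outcome string.
# Assumption: any 2xx status means the voice was deleted.
classify_delete_status() {
  case "$1" in
    2??) echo "deleted" ;;
    404) echo "not_found" ;;
    *)   echo "failed" ;;
  esac
}

delete_voice() {
  local name="$1" body status outcome
  body="$(mktemp)"
  # -s hides the progress meter, -o writes the JSON body to a temp file,
  # -w prints only the HTTP status code on stdout.
  # Note: the name is not URL-encoded here, so it should be URL-safe.
  status="$(curl -s -o "$body" -w '%{http_code}' \
    -X DELETE "$BASE_URL/v1/audio/voices/$name")"
  outcome="$(classify_delete_status "$status")"
  echo "$outcome: $(cat "$body")"
  rm -f "$body"
  [ "$outcome" = "deleted" ]
}

# Usage:
# delete_voice custom_voice_1 || echo "voice was not deleted" >&2
```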

## Examples

@@ -185,6 +260,25 @@ curl -X POST http://localhost:8091/v1/audio/speech \
}' --output cloned.wav
```

Upload a voice:
```bash
curl -X POST http://localhost:8091/v1/audio/voices \
-F "audio_sample=@/path/to/voice_sample.wav" \
-F "consent=user_consent_id" \
-F "name=custom_voice_1"
```

Use the uploaded voice:
```bash
curl -X POST http://localhost:8091/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "Hello, this is a cloned voice",
"task_type": "Base",
"voice": "custom_voice_1"
}' --output cloned.wav
```

Review comment from a collaborator:

Missing closing ``` for this code block. The bash example runs into `## Supported Models` without being closed.
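
When the voice name or input text comes from a variable, the JSON body for the speech request has to be built with some care about quoting. The following is a minimal sketch under stated assumptions: `json_escape` and `build_speech_payload` are hypothetical helpers, the field names are taken from the example above, and only backslashes and double quotes are escaped (newlines and other control characters are not handled).

```bash
#!/usr/bin/env bash
# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Control characters such as newlines are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for a Base-task request that uses an uploaded voice.
build_speech_payload() {
  local text="$1" voice="$2"
  printf '{"input": "%s", "task_type": "Base", "voice": "%s"}' \
    "$(json_escape "$text")" "$(json_escape "$voice")"
}

# Usage:
# curl -X POST http://localhost:8091/v1/audio/speech \
#   -H "Content-Type: application/json" \
#   -d "$(build_speech_payload "Hello, this is a cloned voice" "custom_voice_1")" \
#   --output cloned.wav
```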

## Supported Models

| Model | Task Type | Description |

63 changes: 51 additions & 12 deletions examples/online_serving/qwen3_tts/README.md
@@ -184,29 +184,68 @@ sudo apt install ffmpeg

## API Reference

### Voices Endpoint

#### GET /v1/audio/voices

List all available voices/speakers from the loaded model, including both built-in model voices and uploaded custom voices.

**Response Example:**
```json
{
"voices": ["vivian", "ryan", "custom_voice_1"],
"uploaded_voices": [
{
"name": "custom_voice_1",
"consent": "user_consent_id",
"created_at": 1738660000,
"file_size": 1024000,
"mime_type": "audio/wav"
}
]
}
```

#### POST /v1/audio/voices

Upload a new voice sample for voice cloning in Base task TTS requests.

Copilot AI Feb 5, 2026

The documentation states that uploaded voices can be used "for voice cloning in Base task TTS requests", but the implementation doesn't enforce that uploaded voices are only used with the Base task. An uploaded voice can be used with any task type due to the auto-set logic at lines 320-325, which could lead to unexpected behavior. Consider one of the following:

  1. Clarifying in the documentation that uploaded voices work with any task type
  2. Restricting uploaded voices to Base task only in the code
  3. Making the auto-set behavior conditional on `task_type` being "Base"

Suggested change:
Before: Upload a new voice sample for voice cloning in Base task TTS requests.
After: Upload a new voice sample that can be used for voice cloning in subsequent TTS requests with any supported task type.

**Form Parameters:**
- `audio_sample` (required): Audio file (max 10MB, supported formats: wav, mp3, flac, ogg, aac, webm, mp4)
- `consent` (required): Consent recording ID
- `name` (required): Name for the new voice

**Response Example:**
```json
{
"success": true,
"voice": {
"name": "custom_voice_1",
"consent": "user_consent_id",
"created_at": 1738660000,
"mime_type": "audio/wav",
"file_size": 1024000
}
}
```

**Usage Example:**
```bash
curl -X POST http://localhost:8000/v1/audio/voices \
-F "audio_sample=@/path/to/voice_sample.wav" \
-F "consent=user_consent_id" \
-F "name=custom_voice_1"
```

### Endpoint

```
POST /v1/audio/speech
Content-Type: application/json
```

This endpoint follows the [OpenAI Audio Speech API](https://platform.openai.com/docs/api-reference/audio/createSpeech) format with additional Qwen3-TTS parameters.

### Request Body

```json