feat(tts): add voice upload API for Qwen3-TTS #1201
Changes from all commits
2c6d99b
eb163bb
0a76363
52c3aa9
eee09b8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -125,6 +125,81 @@ Lists available voices for the loaded model. | |
| "voices": ["aiden", "dylan", "eric", "ono_anna", "ryan", "serena", "sohee", "uncle_fu", "vivian"] | ||
| } | ||
| ``` | ||
| ``` | ||
| POST /v1/audio/voices | ||
| Content-Type: multipart/form-data | ||
| ``` | ||
|
|
||
| Upload a new voice sample for voice cloning in Base task TTS requests. | ||
|
|
||
| **Form Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| |-----------|------|----------|-------------| | ||
| | `audio_sample` | file | Yes | Audio file (max 10MB, supported formats: wav, mp3, flac, ogg, aac, webm, mp4) | | ||
| | `consent` | string | Yes | Consent recording ID | | ||
| | `name` | string | Yes | Name for the new voice | | ||
|
|
||
| **Response Example:** | ||
|
|
||
| ```json | ||
| { | ||
| "success": true, | ||
| "voice": { | ||
| "name": "custom_voice_1", | ||
| "consent": "user_consent_id", | ||
| "created_at": 1738660000, | ||
| "mime_type": "audio/wav", | ||
| "file_size": 1024000 | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| **Usage Example:** | ||
|
|
||
| ```bash | ||
| curl -X POST http://localhost:8091/v1/audio/voices \ | ||
| -F "audio_sample=@/path/to/voice_sample.wav" \ | ||
| -F "consent=user_consent_id" \ | ||
| -F "name=custom_voice_1" | ||
| ``` | ||
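The limits in the form-parameter table (10MB, a fixed list of formats) can be checked on the client before uploading. The following is only a sketch: `check_voice_sample`, `MAX_BYTES`, and `ALLOWED_EXTS` are hypothetical names that are not part of this project, the check looks at the file extension rather than the actual content, and it assumes "10MB" means 10 MiB. The server may validate differently.

```bash
# Hypothetical helper (not part of the project): pre-check a voice sample
# against the limits documented for POST /v1/audio/voices.
MAX_BYTES=$((10 * 1024 * 1024))   # assumption: "10MB" is read as 10 MiB
ALLOWED_EXTS="wav mp3 flac ogg aac webm mp4"

check_voice_sample() {
  local path="$1"
  [ -f "$path" ] || { echo "not a file: $path" >&2; return 1; }

  # Lower-case the extension before comparing it with the documented list.
  local ext allowed found=1
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  for allowed in $ALLOWED_EXTS; do
    [ "$ext" = "$allowed" ] && found=0
  done
  [ "$found" -eq 0 ] || { echo "unsupported format: $ext" >&2; return 1; }

  # wc -c works on both GNU and BSD userlands; strip any padding spaces.
  local size
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```

A possible use is to gate the upload on the check, for example `check_voice_sample /path/to/voice_sample.wav && curl -X POST http://localhost:8091/v1/audio/voices ...` with the same `-F` fields as above.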
|
|
||
|
|
||
| ``` | ||
| DELETE /v1/audio/voices/{name} | ||
| ``` | ||
|
|
||
| Delete an uploaded voice sample. | ||
|
|
||
| **Path Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| |-----------|------|----------|-------------| | ||
| | `name` | string | Yes | Name of the voice to delete | | ||
|
|
||
| **Response Example:** | ||
|
|
||
| ```json | ||
| { | ||
| "success": true, | ||
| "message": "Voice 'custom_voice_1' deleted successfully" | ||
| } | ||
| ``` | ||
|
|
||
| **Error Response (404 Not Found):** | ||
|
|
||
| ```json | ||
| { | ||
| "success": false, | ||
| "error": "Voice 'unknown_voice' not found" | ||
| } | ||
| ``` | ||
|
|
||
| **Usage Example:** | ||
|
|
||
| ```bash | ||
| curl -X DELETE http://localhost:8091/v1/audio/voices/custom_voice_1 | ||
| ``` | ||
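In a script, it can be useful to tell the success case apart from the 404 case without parsing the JSON body. The sketch below is an illustration under assumptions: `delete_voice`, `interpret_delete_status`, and `BASE_URL` are hypothetical names, and a successful delete is assumed to return HTTP 200, which this document does not state explicitly.

```bash
# Hypothetical wrapper (not part of the project): delete a voice and map the
# HTTP status to a shell exit code. BASE_URL assumes a local server.
BASE_URL="${BASE_URL:-http://localhost:8091}"

# Translate an HTTP status code into a short message and an exit code.
interpret_delete_status() {
  case "$1" in
    200) echo "deleted"; return 0 ;;          # assumption: success is 200
    404) echo "voice not found"; return 2 ;;  # documented error response
    *)   echo "unexpected status: $1"; return 1 ;;
  esac
}

delete_voice() {
  local name="$1" status
  # -s hides progress, -o discards the body, -w prints only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -X DELETE "$BASE_URL/v1/audio/voices/$name")
  interpret_delete_status "$status"
}
```

For example, `delete_voice custom_voice_1` would print one of the messages above and set the exit code accordingly, so a caller can branch on `$?`.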
|
|
||
| ## Examples | ||
|
|
||
|
|
@@ -185,6 +260,25 @@ curl -X POST http://localhost:8091/v1/audio/speech \ | |
| }' --output cloned.wav | ||
| ``` | ||
|
|
||
| Upload a voice: | ||
| ```bash | ||
| curl -X POST http://localhost:8091/v1/audio/voices \ | ||
| -F "audio_sample=@/path/to/voice_sample.wav" \ | ||
| -F "consent=user_consent_id" \ | ||
| -F "name=custom_voice_1" | ||
| ``` | ||
|
|
||
| Use the uploaded voice: | ||
| ```bash | ||
| curl -X POST http://localhost:8091/v1/audio/speech \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{ | ||
| "input": "Hello, this is a cloned voice", | ||
| "task_type": "Base", | ||
| "voice": "custom_voice_1" | ||
| }' --output cloned.wav | ||
|
Collaborator
Missing closing |
||
| ``` | ||
|
|
||
| ## Supported Models | ||
|
|
||
| | Model | Task Type | Description | | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -184,29 +184,68 @@ sudo apt install ffmpeg | |||||
|
|
||||||
| ## API Reference | ||||||
|
|
||||||
| ### Endpoint | ||||||
|
|
||||||
| ``` | ||||||
| POST /v1/audio/speech | ||||||
| Content-Type: application/json | ||||||
| ``` | ||||||
| ### Voices Endpoint | ||||||
|
|
||||||
| This endpoint follows the [OpenAI Audio Speech API](https://platform.openai.com/docs/api-reference/audio/createSpeech) format with additional Qwen3-TTS parameters. | ||||||
| #### GET /v1/audio/voices | ||||||
|
|
||||||
| ### Voices Endpoint | ||||||
| List all available voices/speakers from the loaded model, including both built-in model voices and uploaded custom voices. | ||||||
|
|
||||||
| **Response Example:** | ||||||
| ```json | ||||||
| { | ||||||
| "voices": ["vivian", "ryan", "custom_voice_1"], | ||||||
| "uploaded_voices": [ | ||||||
| { | ||||||
| "name": "custom_voice_1", | ||||||
| "consent": "user_consent_id", | ||||||
| "created_at": 1738660000, | ||||||
| "file_size": 1024000, | ||||||
| "mime_type": "audio/wav" | ||||||
| } | ||||||
| ] | ||||||
| } | ||||||
| ``` | ||||||
| GET /v1/audio/voices | ||||||
| ``` | ||||||
|
|
||||||
| Lists available voices for the loaded model: | ||||||
| #### POST /v1/audio/voices | ||||||
|
|
||||||
| Upload a new voice sample for voice cloning in Base task TTS requests. | ||||||
|
||||||
| Upload a new voice sample for voice cloning in Base task TTS requests. | |
| Upload a new voice sample that can be used for voice cloning in subsequent TTS requests with any supported task type. |
The documented response example for `POST /v1/audio/voices` includes `file_path`, but the implementation intentionally does not return server file paths. Update the example to match the actual response schema (and also document the updated `GET /v1/audio/voices` response shape, which now includes `uploaded_voices`).