Merged
51 commits
1d11533
feat: add minimal split-stage VoxCPM integration
Celeste-jq Mar 18, 2026
013a220
docs: add minimal offline VoxCPM example
Celeste-jq Mar 18, 2026
a70f4ee
support stream_generate
lyj-jjj Mar 23, 2026
8b4a638
merge endtoend.py&endtoend_streaming.py
lyj-jjj Mar 23, 2026
03f4c5d
abstract common methods
lyj-jjj Mar 24, 2026
779afac
support stream
lyj-jjj Apr 1, 2026
5c10527
Merge upstream main into lyj pure_voxcpm baseline
Celeste-jq Apr 1, 2026
8ab0c3a
Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…
Celeste-jq Apr 2, 2026
d14585d
refactor: simplify voxcpm streaming path and align with upstream
Celeste-jq Apr 2, 2026
61f2841
feat: add voxcpm batch tts and clone examples
Celeste-jq Apr 3, 2026
c71733b
add voxcpm example helper files
IsleOfDawnlight Apr 3, 2026
ca3834a
add voxcpm model
IsleOfDawnlight Apr 3, 2026
6adba2e
Merge remote-tracking branch 'celeste/voxcpm_streaming_0180' into pur…
Celeste-jq Apr 3, 2026
de5644c
add voxcpm model
IsleOfDawnlight Apr 3, 2026
0e96d52
align voxcpm content with voxcpm_streaming_0180
Celeste-jq Apr 3, 2026
164ccbd
fix: avoid torchcodec dependency for voxcpm clone
Celeste-jq Apr 3, 2026
94ebf07
sync voxcpm files with latest voxcpm_streaming_0180
Celeste-jq Apr 3, 2026
a80bd7a
sync voxcpm files with latest voxcpm_streaming_0180
Celeste-jq Apr 3, 2026
3e4e2b2
Merge remote-tracking branch 'upstream/main' into pure_voxcpm
Celeste-jq Apr 7, 2026
a735478
style: apply ruff formatting for voxcpm files
Celeste-jq Apr 7, 2026
3fd7bb9
align voxcpm and bridge files with Celeste upstream branches
Celeste-jq Apr 7, 2026
ae1c107
style: apply ruff formatting for voxcpm files
Celeste-jq Apr 8, 2026
8408448
fix: address VoxCPM PR review feedback
Celeste-jq Apr 8, 2026
fa1f921
style: fix nightly index pre-commit import order
Celeste-jq Apr 8, 2026
5f97bec
Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…
Celeste-jq Apr 8, 2026
5d07e61
refactor: reduce shared VoxCPM review drift
Celeste-jq Apr 8, 2026
d4d61e9
sync voxcpm files with Celeste pr_voxcpm
Celeste-jq Apr 9, 2026
0b1f88f
remove voxcpm example test script
Celeste-jq Apr 9, 2026
e695164
style: apply pre-commit fixes
Celeste-jq Apr 9, 2026
537b8e5
sync voxcpm files with Celeste pr_voxcpm
Celeste-jq Apr 10, 2026
dbe54d2
sync voxcpm files with Celeste pr_voxcpm
Celeste-jq Apr 10, 2026
da6cc01
sync voxcpm files with Celeste pr_voxcpm
Celeste-jq Apr 10, 2026
68705a6
Merge upstream/main into pure_voxcpm without rewriting history
Celeste-jq Apr 13, 2026
de2d779
align tree to upstream main and reapply VoxCPM
Celeste-jq Apr 13, 2026
9d505f6
sync missing VoxCPM files from pr_voxcpm
Celeste-jq Apr 13, 2026
cfb981d
sync latest pr_voxcpm model-local updates
Celeste-jq Apr 13, 2026
4a93286
fix pre-commit issues in voxcpm files
Celeste-jq Apr 13, 2026
f5b951f
test: add VoxCPM API and e2e coverage
Celeste-jq Apr 14, 2026
5868f29
Merge remote-tracking branch 'upstream/main' into pure_voxcpm
Celeste-jq Apr 14, 2026
1f2dc5d
fix: avoid restarting voxcpm async stream
Celeste-jq Apr 14, 2026
3ae9cb1
fix: consolidate voxcpm async stream updates
Celeste-jq Apr 14, 2026
4dcf000
test: align voxcpm UTs with current interfaces
Celeste-jq Apr 14, 2026
2a1e779
style: fix voxcpm pre-commit issues
Celeste-jq Apr 14, 2026
5bcd68d
chore: remove voxcpm debug logging
Celeste-jq Apr 14, 2026
98a45fd
Merge remote-tracking branch 'upstream/main' into pure_voxcpm
Celeste-jq Apr 14, 2026
a2bc6f4
ci: add VoxCPM E2E pre-merge test to test-ready.yml
linyueqian Apr 14, 2026
f6a27cf
test: guard cleanup_dist_env_and_memory on NPU
Celeste-jq Apr 15, 2026
c4bf6ae
test: scope NPU cleanup guard to VoxCPM e2e
Celeste-jq Apr 15, 2026
1f69d9e
style: fix voxcpm e2e import ordering
Celeste-jq Apr 15, 2026
68db41e
test: prepare VoxCPM e2e model dir and hf config
Celeste-jq Apr 15, 2026
25258b2
refactor: move VoxCPM model prep helpers out of tests
Celeste-jq Apr 15, 2026
25 changes: 25 additions & 0 deletions .buildkite/test-ready.yml
@@ -295,6 +295,31 @@ steps:
    volumes:
      - "/fsx/hf_cache:/fsx/hf_cache"

  - label: "VoxCPM E2E Test"
    timeout_in_minutes: 20
    depends_on: upload-ready-pipeline
    commands:
      - |
        timeout 20m bash -c '
        pip install voxcpm
        export VLLM_LOGGING_LEVEL=DEBUG
        export VLLM_WORKER_MULTIPROC_METHOD=spawn
        pytest -s -v tests/e2e/offline_inference/test_voxcpm.py -m "core_model" --run-level "core_model"
        '
    agents:
      queue: "gpu_1_queue"
    plugins:
      - docker#v5.2.0:
          image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
          always-pull: true
          propagate-environment: true
          shm-size: "8gb"
          environment:
            - "HF_HOME=/fsx/hf_cache"
            - "HF_TOKEN"
          volumes:
            - "/fsx/hf_cache:/fsx/hf_cache"

  - label: "VoxCPM2 Native AR E2E Test"
    timeout_in_minutes: 20
    depends_on: upload-ready-pipeline
119 changes: 119 additions & 0 deletions benchmarks/voxcpm/README.md
@@ -0,0 +1,119 @@
# VoxCPM Benchmark

This directory contains:

- an online serving benchmark that exercises the OpenAI-compatible `/v1/audio/speech` API
- an offline benchmark for `Omni` / `AsyncOmni`
- a full offline smoke-matrix orchestration

All benchmark paths report:

- TTFP: time to first PCM packet
- E2E latency
- RTF: real-time factor (`e2e / audio_duration`)
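
The three metrics relate as sketched below. This is an illustrative helper, not code from the benchmark scripts; the name `report_metrics` and its signature are placeholders.

```python
def report_metrics(t_start: float, t_first_packet: float,
                   t_end: float, audio_duration_s: float) -> dict:
    """Compute the three metrics reported by every benchmark path."""
    ttfp = t_first_packet - t_start   # TTFP: time to first PCM packet
    e2e = t_end - t_start             # end-to-end latency
    rtf = e2e / audio_duration_s      # real-time factor
    return {"ttfp_s": ttfp, "e2e_s": e2e, "rtf": rtf}

# Example: a 4 s clip synthesized in 2 s overall gives RTF 0.5
print(report_metrics(0.0, 0.3, 2.0, 4.0))
```
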

## Offline Benchmark

Single offline benchmark run:

```bash
python benchmarks/voxcpm/vllm_omni/bench_tts_offline.py \
--model /path/to/voxcpm-model \
--stage-configs-path vllm_omni/model_executor/stage_configs/voxcpm.yaml \
--text "This is a split-stage VoxCPM synthesis example running on vLLM Omni." \
--warmup-runs 1 \
--output-dir benchmarks/voxcpm/results/offline_single
```

Streaming offline benchmark:

```bash
python benchmarks/voxcpm/vllm_omni/bench_tts_offline.py \
--model /path/to/voxcpm-model \
--stage-configs-path vllm_omni/model_executor/stage_configs/voxcpm_async_chunk.yaml \
--text "This is a split-stage VoxCPM streaming example running on vLLM Omni." \
--warmup-runs 1 \
--output-dir benchmarks/voxcpm/results/offline_streaming
```

Full fixed offline matrix, equivalent to the old `examples/offline_inference/voxcpm/test.py`:

```bash
python benchmarks/voxcpm/vllm_omni/run_offline_matrix.py \
--model /path/to/voxcpm-model \
--ref-audio /path/to/reference.wav \
--ref-text "The exact transcript spoken in reference.wav." \
--output-root benchmarks/voxcpm/results/offline_matrix
```

The full matrix covers both routes:

- streaming: `voxcpm_async_chunk.yaml`
- sync: `voxcpm.yaml`

And these six scenarios under each route:

- warmup + single TTS
- warmup + single voice cloning
- warmup + batch TTS
- warmup + batch voice cloning
- cold single TTS
- cold single voice cloning

`bench_tts_offline.py` itself no longer writes `summary.json` / `results.json`; it prints TTFP / RTF inline and saves generated WAV files only. The matrix runner keeps only per-case `run.log`.

## Start the Server

Async-chunk:

```bash
vllm serve /path/to/voxcpm-model \
--stage-configs-path vllm_omni/model_executor/stage_configs/voxcpm_async_chunk.yaml \
--trust-remote-code \
--enforce-eager \
--omni \
--port 8091
```

Non-streaming:

```bash
vllm serve /path/to/voxcpm-model \
--stage-configs-path vllm_omni/model_executor/stage_configs/voxcpm.yaml \
--trust-remote-code \
--enforce-eager \
--omni \
--port 8091
```
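
Before launching the client benchmark, it can help to block until the server is ready. A minimal polling sketch, assuming the server exposes vLLM's standard `/health` endpoint; `wait_for_server` is an illustrative helper, not part of this directory:

```python
import time
import urllib.request

def wait_for_server(host: str = "127.0.0.1", port: int = 8091,
                    timeout_s: int = 120) -> bool:
    """Poll GET /health until it returns 200 or the deadline passes."""
    deadline = time.time() + timeout_s
    url = f"http://{host}:{port}/health"
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            time.sleep(1)  # server not up yet; retry until the deadline
    return False
```
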

## Run the Benchmark

```bash
python benchmarks/voxcpm/vllm_omni/bench_tts_serve.py \
--host 127.0.0.1 \
--port 8091 \
--num-prompts 20 \
--max-concurrency 1 \
--result-dir /tmp/voxcpm_bench
```

Voice cloning benchmark:

```bash
python benchmarks/voxcpm/vllm_omni/bench_tts_serve.py \
--host 127.0.0.1 \
--port 8091 \
--num-prompts 10 \
--max-concurrency 1 \
--ref-audio https://example.com/reference.wav \
--ref-text "The exact transcript spoken in the reference audio." \
--result-dir /tmp/voxcpm_clone_bench
```

## Notes

- The benchmark uses `stream=true` and `response_format=pcm` so TTFP is measured from the first audio packet.
- `RTF < 1.0` means the server generates audio faster than real time.
- For `voxcpm_async_chunk.yaml`, keep concurrency at `1`. This matches native VoxCPM streaming more closely.
- Do not benchmark concurrent online streaming on `voxcpm_async_chunk.yaml`; use `voxcpm.yaml` for multi-request throughput runs.
- For the offline matrix mode, `--ref-audio` and `--ref-text` are required because clone cases are part of the fixed coverage set.
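
For reference, the measurement loop in `bench_tts_serve.py` can be approximated with a small standalone client. This is a sketch, not the actual script: the `model` field and the 16 kHz / 16-bit mono assumptions inside `pcm_duration` are placeholders to adjust for your deployment.

```python
import json
import time
import urllib.request

def pcm_duration(num_bytes: int, sample_rate: int = 16000,
                 sample_width: int = 2, channels: int = 1) -> float:
    """Seconds of raw PCM audio; the rate/width defaults are assumptions."""
    return num_bytes / (sample_rate * sample_width * channels)

def stream_speech(text: str, host: str = "127.0.0.1", port: int = 8091,
                  out_path: str = "out.pcm") -> dict:
    """POST to /v1/audio/speech with stream + pcm and time the first chunk."""
    body = json.dumps({
        "model": "voxcpm",            # placeholder; use your served model name
        "input": text,
        "response_format": "pcm",
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    t0 = time.perf_counter()
    ttfp = None
    total = 0
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        while chunk := resp.read(4096):
            if ttfp is None:
                ttfp = time.perf_counter() - t0  # TTFP: first PCM packet
            total += len(chunk)
            f.write(chunk)
    e2e = time.perf_counter() - t0
    rtf = e2e / pcm_duration(total) if total else float("inf")
    return {"ttfp_s": ttfp, "e2e_s": e2e, "rtf": rtf}
```

As in the notes above, an `rtf` below 1.0 means the server synthesized audio faster than real time.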