[BugFix][VoxCPM2]: split multichar Chinese tokens to match training tokenization#2832
Conversation
Tested this on H20 (h20-server-1, GPU 1) on the PR branch (commit 64cf5ce = your fix rebased onto current main). I reverted my own prior tokenizer wrapper and any other local hacks before testing. The server log confirms the fix code is live, but Whisper still reports garbled Chinese.
For comparison, the broken pre-fix output:
One hypothesis worth ruling out: the split map is built from
Server start command used:
Happy to share the WAVs if useful. cc @Sy0307
The fix looks correct. The lazy initialization of the split map is clean, and the performance data shows no regression. What's missing is an automated regression test for the tokenization behavior — manual ASR verification is good but not sufficient to prevent regressions. A unit test asserting that tokenized input "你好" produces the expected single-char token IDs would catch future regressions.
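Such a regression test could be sketched as below. The helper `split_multichar_chinese` and the toy `SPLIT_MAP` are stand-ins for illustration; a real test would import the PR's helper and build the map from the actual VoxCPM2 tokenizer vocab.

```python
# Sketch of the regression test the review asks for. Names are assumed,
# not the PR's exact identifiers.

def split_multichar_chinese(token_ids, split_map):
    """Replace each multi-char Chinese token ID with its single-char IDs."""
    out = []
    for tid in token_ids:
        out.extend(split_map.get(tid, [tid]))
    return out

# IDs from the PR description: "你好" id=23523 -> "你" id=59496, "好" id=59495.
SPLIT_MAP = {23523: [59496, 59495]}

def test_nihao_splits_to_single_char_ids():
    assert split_multichar_chinese([23523], SPLIT_MAP) == [59496, 59495]

def test_split_is_idempotent():
    # Running the split twice must give the same result as running it once.
    once = split_multichar_chinese([1, 23523, 2], SPLIT_MAP)
    assert split_multichar_chinese(once, SPLIT_MAP) == once
```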
VoxCPM2 was trained with mask_multichar_chinese_tokens which splits
multi-character Chinese tokens (e.g. "你好" id=23523) into single-char
IDs ("你" id=59496, "好" id=59495). The HuggingFace openbmb/VoxCPM2
model repo ships a plain LlamaTokenizerFast without this splitting,
causing garbled Chinese audio output via the /v1/audio/speech API.
Add _split_multichar_chinese() in preprocess() to fix up token IDs
before they reach the model. The split map is lazily built from the
tokenizer vocab on first request. The operation is idempotent so it
works correctly regardless of whether the tokenizer already does
char-level splitting.
Signed-off-by: Sy03 <1370724210@qq.com>
64cf5ce to 7df2dc9
I continued using the gradio_demo.py program and found that when not using voice cloning, the generated streaming and non-streaming output audio had no issues, except that on the webpage the streaming output audio played twice in a row. However, when using voice cloning, the content of both streaming and non-streaming output changed. These are the results of my tests: 6.mp4, 5.mp4. The branch used is fix-voxcpm2-chinese-tokenizer. Below is the operation log I used:

(voxcpm-omni) root@AS-4124GS-TNR:/home/www# git clone https://github.com/vllm-project/vllm-omni.git
(voxcpm-omni) root@AS-4124GS-TNR:/home/www# cd vllm-omni
(voxcpm-omni) root@AS-4124GS-TNR:/home/www/vllm-omni# git checkout fix-voxcpm2-chinese-tokenizer
(voxcpm-omni) root@AS-4124GS-TNR:/home/www/vllm-omni# git branch --show-current
(voxcpm-omni) root@AS-4124GS-TNR:/home/www/vllm-omni# pip install -e .
(voxcpm-omni) root@AS-4124GS-TNR:/home/www/vllm-omni# pip show vllm-omni
(voxcpm-omni) root@AS-4124GS-TNR:/home/www/vllm-omni#

These are the terminal logs from the recent test.
Hi @Sy0307, I am also facing issues with the voice-cloning flow, but only in English.
lishunyang12 left a comment
Review: LGTM
Clean, well-scoped fix for the multichar Chinese token mismatch. The approach of splitting at the serving layer and fail-fast validating at the model layer is sound.
Correctness
- `is_cjk_char` covers the main CJK Unicode blocks. It misses Extensions E/F/G/I (U+2B820..U+323AF), but those are vanishingly rare in practice — fine to add later if needed.
- `build_cjk_split_map` correctly strips the sentencepiece `▁` prefix, validates that all constituent chars have non-UNK IDs, and caches the result.
- `split_multichar_chinese` is a clean O(n) pass, idempotent as documented.
- The switch from `{"prompt": text}` to `{"prompt_token_ids": ids}` correctly preserves BOS handling (the preprocess already strips the leading BOS).
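For concreteness, here is a hedged sketch of the kind of range check the review refers to; the PR's actual `is_cjk_char` may differ. The ranges are the standard Unicode CJK blocks, with Extensions E/F/G/I deliberately left out to mirror the review's note.

```python
# Illustrative CJK membership check; not the PR's exact implementation.
_CJK_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # Extension A
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2B73F),  # Extension C
    (0x2B740, 0x2B81F),  # Extension D
    # Extensions E/F/G/I (U+2B820..) are not covered, per the review note.
]

def is_cjk_char(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _CJK_RANGES)
```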
Thread safety
`_voxcpm2_encode` lazy-inits `_voxcpm2_tokenizer` in the async event loop with no await between the None-check and the assignment, so there is no race. Good.
Performance
Lazy one-time map build + O(n) per-request scan on a small number of text tokens — negligible overhead, consistent with the benchmark numbers in the PR.
Minor suggestions (non-blocking)
- Duplicate tokenizer load: `_voxcpm2_encode` calls `AutoTokenizer.from_pretrained(model_name)`, which loads the tokenizer a second time in the serving process. If there's a way to reuse the engine's tokenizer (e.g. via `self.engine_client`), that would save memory and startup time. Not critical since it's a one-time cost.
- `_get_multichar_zh_split()` in the preprocess hot path: the lazy build is fine, but `any(tid in split_map for tid in token_ids)` runs on every prefill. Since the serving layer is now responsible for splitting, this check should never fire in normal operation. Consider gating it behind a debug/assert mode if profiling ever shows it matters (unlikely with current token counts).
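One way the debug gating suggested above could look, as a sketch; the env var `VOXCPM2_STRICT_SPLIT_CHECK` and function names are hypothetical, not part of the PR.

```python
import os

# Hypothetical opt-in flag: only pay the O(n) membership scan when debugging.
_STRICT = os.environ.get("VOXCPM2_STRICT_SPLIT_CHECK", "0") == "1"

def maybe_check_split(token_ids, split_map):
    # In normal operation the serving layer has already split multichar
    # Chinese tokens, so this should never fire.
    if _STRICT and any(tid in split_map for tid in token_ids):
        raise ValueError("multichar Chinese token ID reached the model layer")
```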
Tested logic looks correct. Approving.
I redeployed the latest merged project code to the server for testing, deleted the model, cleared the cache, and re-downloaded it. During testing I found that audio output is normal when not using voice cloning, but when using voice cloning, whether in Chinese or English and whether streaming output is enabled or not, there are issues — all the audio is noisy and garbled. When not using voice cloning but enabling streaming output, the generated audio data overlaps, producing two identical outputs. Below is an example video from my test; hopefully it helps: output.mp4
I will handle the bug in voice-clone mode ASAP; thanks for your report. @gesla2024
…okenization (vllm-project#2832) Signed-off-by: Sy03 <1370724210@qq.com>
Purpose
Fix garbled Chinese audio output from VoxCPM2 via the `/v1/audio/speech` API.

Root cause: VoxCPM2 was trained with `mask_multichar_chinese_tokens`, which splits multi-character Chinese tokens (e.g. "你好" id=23523) into single-character IDs ("你" id=59496, "好" id=59495). The HuggingFace `openbmb/VoxCPM2` model repo ships a plain `LlamaTokenizerFast` without this splitting, so the model receives token IDs it was never trained on, producing garbled Chinese output.

Related: #2758 (comment)
Test Plan
Tested on NVIDIA H20 with latest main (50ae1de), without the custom `tokenization_voxcpm2.py` that was previously masking the bug:

vllm-omni serve openbmb/VoxCPM2 --stage-configs-path vllm_omni/model_executor/stage_configs/voxcpm2.yaml --omni --trust-remote-code
curl /v1/audio/speech -d '{"input": "你好,这是一个测试程序。", ...}'

Test Result
Correctness (whisper-base ASR):
Performance (A/B on H20, origin/main, torch.compile + CUDA Graph enabled):
Zero performance impact — the split map is lazily built once and the per-request lookup runs only during prefill on a few dozen tokens.
cc @linyueqian @gesla2024