
[Fix][Fish Speech] Remove redundant get_vocab() in control token encoding #2842

Merged
linyueqian merged 1 commit into vllm-project:main from Sy0307:fix/fish-speech-get-vocab-perf on Apr 16, 2026

Conversation

@Sy0307 (Contributor) commented Apr 16, 2026

Purpose

_encode_control_token() in prompt_utils.py called tokenizer.get_vocab() on every invocation, rebuilding the full 155K-entry vocabulary dictionary each time (~68ms on an H20 GPU). Since the function is called 6 times per prompt (for <|im_start|>, <|im_end|>, and <|voice|>), this adds ~408ms of pure Python overhead to every Fish Speech S2 Pro TTS request.

Replace it with tokenizer.convert_tokens_to_ids(), which performs the same single-token lookup in <1ms.
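
For illustration, a minimal sketch of the before/after lookup (the `_encode_control_token` signature shown here is an assumption, not the exact code in `prompt_utils.py`):

```python
# Sketch only; the real helper in prompt_utils.py may take different arguments.

# Before: tokenizer.get_vocab() rebuilds the full ~155K-entry dict on every
# call, so a single control-token lookup costs ~68 ms on an H20 host.
def _encode_control_token_old(tokenizer, token: str) -> int:
    vocab = tokenizer.get_vocab()  # O(vocab-size) dict construction per call
    return vocab[token]

# After: convert_tokens_to_ids() resolves the same single token directly
# against the tokenizer's internal mapping, in well under 1 ms.
def _encode_control_token(tokenizer, token: str) -> int:
    return tokenizer.convert_tokens_to_ids(token)
```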

Test Plan

  • A/B benchmark: run the baseline (origin/main) and this fix on the same H20 GPU with identical config (enforce_eager=false, CUDA graph enabled), same model, same text
  • Verify audio output is valid WAV with correct content
  • Ruff lint/format pass

Test Result

Setup: enforce_eager=false (torch.compile + CUDA graph), text = "The quick brown fox jumps over the lazy dog." (~3s audio)

| Configuration | Median Total | Median RTF | Improvement |
| --- | --- | --- | --- |
| Baseline (origin/main) | 1.205s | 0.399 | |
| This PR | 0.908s | 0.292 | -25% |

Long text (~14s audio):

| Configuration | Median Total | Median RTF | Improvement |
| --- | --- | --- | --- |
| Baseline (origin/main) | 3.830s | 0.269 | |
| This PR | 3.649s | 0.248 | -8% |

Root cause profiling: build_prompt dropped from ~400ms to ~1ms per request.
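
For context, a rough stand-alone way to reproduce the per-lookup cost difference outside vLLM (the model id below is a placeholder; substitute the tokenizer Fish Speech S2 Pro actually loads):

```python
# Micro-benchmark sketch; the model id is a placeholder and timings vary by host.
import timeit

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("fishaudio/fish-speech-1.5")  # placeholder id
token = "<|im_start|>"

# Average over a few calls: dict rebuild vs. direct single-token lookup.
get_vocab_ms = timeit.timeit(lambda: tok.get_vocab()[token], number=20) / 20 * 1e3
convert_ms = timeit.timeit(lambda: tok.convert_tokens_to_ids(token), number=20) / 20 * 1e3

print(f"get_vocab() lookup:           {get_vocab_ms:.2f} ms/call")
print(f"convert_tokens_to_ids lookup: {convert_ms:.3f} ms/call")
```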

cc @linyueqian @zwhzzz0821

Commit: [Fix][Fish Speech] Remove redundant get_vocab() in control token encoding

tokenizer.get_vocab() rebuilds the full 155K-entry vocab dict on every
call (~68ms on H20).  _encode_control_token() called it 6 times per
prompt, adding ~408ms of pure Python overhead to every Fish Speech TTS
request.

Replace with convert_tokens_to_ids() which does the same lookup in <1ms.

Signed-off-by: Sy03 <1370724210@qq.com>
@Sy0307 requested a review from hsliuustc0106 as a code owner April 16, 2026 07:51
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@hsliuustc0106 (Collaborator)

Blocking Issues

None.


VERDICT: COMMENT

Clean performance optimization. The A/B benchmarking evidence in the PR description is solid. LGTM.

(Note: This change is already covered by existing tests since it's an internal optimization with no API changes.)

@linyueqian (Collaborator) left a comment:

lgtm

@linyueqian added the ready label Apr 16, 2026
@linyueqian enabled auto-merge (squash) April 16, 2026 12:08
@linyueqian merged commit 322620f into vllm-project:main Apr 16, 2026
8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready (label to trigger buildkite CI)
