[Perf] Improve Fish Speech S2 Pro inference performance #1859

Merged: linyueqian merged 2 commits into vllm-project:main from Sy0307:dev/fish_perf on Mar 21, 2026
Sy0307 (Contributor) commented Mar 12, 2026

Purpose

This PR improves Fish Speech S2 Pro inference performance and builds on #1798. It is waiting for #1798 to merge.

The current diff is shown against main, so it also contains the full Fish Speech S2 Pro integration stack. However, the main intent of this PR is the performance work on top of that support.

The main optimizations are:

  • batch DAC decode in Stage-1 instead of decoding each request separately
  • keep last_slow_ar_hidden GPU-resident across decode steps to avoid the per-step GPU -> CPU -> GPU round-trip
  • replace connector busy-spin polling with threading.Condition() wakeups plus bounded backoff
  • store async chunk frames as tensors and pack them with tensor ops instead of repeated .cpu().tolist() / torch.tensor(list) reconstruction
  • keep the guarded Fast AR compiled path for the single-request fast path
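
As an illustration of the third item, here is a minimal sketch of a consumer that sleeps on a threading.Condition with bounded backoff instead of busy-spinning. The class name `ChunkMailbox`, its methods, and the wait bounds are hypothetical and chosen for this example; they are not the connector code in this PR.

```python
import threading
import time
from collections import deque


class ChunkMailbox:
    """Hypothetical stand-in for a connector queue; not the vllm-omni class."""

    def __init__(self, min_wait=0.001, max_wait=0.05):
        self._items = deque()
        self._cond = threading.Condition()
        self._min_wait = min_wait  # assumed bounds, chosen for illustration
        self._max_wait = max_wait

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake one waiting consumer immediately

    def get(self, timeout=1.0):
        """Block until an item arrives or `timeout` seconds pass (then None)."""
        deadline = time.monotonic() + timeout
        wait = self._min_wait
        with self._cond:
            while not self._items:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                # Sleep on the condition instead of spinning. The wait grows
                # geometrically but never exceeds max_wait.
                self._cond.wait(min(wait, remaining))
                wait = min(wait * 2, self._max_wait)
            return self._items.popleft()
```

A producer calling `put()` wakes the consumer right away through `notify()`; the backoff only caps how long the consumer sleeps between checks if a wakeup is missed.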

In practice, these changes reduce:

  • repeated small DAC decode launches under concurrency
  • per-step host/device copies in the Slow AR loop
  • connector-side CPU polling overhead
  • Python-heavy chunk packing overhead in the streaming path

This improves both:

  • single-request latency / RTF
  • loaded throughput under concurrency

Test Result

Measured on an RTX 5090 with fishaudio/s2-pro.

The comparison below uses the same config before and after the perf changes:

  • stage0/stage1 max_batch_size=8
  • max_inflight=8

Single request

| Metric | Before perf changes | After perf changes | Delta |
| --- | --- | --- | --- |
| RTF | 0.446 | 0.392 | -12.0% |
| TTFP | 0.121s | 0.114s | -5.8% |
| Request throughput | 0.381 req/s | 0.432 req/s | +13.6% |

Concurrency = 8

| Metric | Before perf changes | After perf changes | Delta |
| --- | --- | --- | --- |
| RTF | 1.113 | 0.760 | -31.7% |
| TTFP | 0.342s | 0.287s | -16.2% |
| Request throughput | 1.204 req/s | 1.757 req/s | +46.0% |
| Audio throughput | 7.099 s/s | 10.363 s/s | +46.0% |

Concurrency = 16

| Metric | Before perf changes | After perf changes | Delta |
| --- | --- | --- | --- |
| RTF | 1.629 | 1.159 | -28.9% |
| TTFP | 3.547s | 2.567s | -27.6% |
| Request throughput | 1.251 req/s | 1.754 req/s | +40.2% |
| Audio throughput | 7.376 s/s | 10.343 s/s | +40.2% |
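
The Delta column above is a relative change, (after - before) / before. The short sketch below recomputes a few entries from the rounded values in the tables; the helper name `pct_change` is introduced here for illustration. Because the inputs are already rounded, a recomputed delta can differ from the reported one by about 0.1 to 0.2 percentage points.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the Concurrency = 8 and 16 tables.
rows = {
    "c=8 RTF": (1.113, 0.760),
    "c=8 request throughput (req/s)": (1.204, 1.757),
    "c=16 RTF": (1.629, 1.159),
    "c=16 TTFP (s)": (3.547, 2.567),
}

for name, (before, after) in rows.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")
```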

Validation:

  • transfer adapter unit tests passed
  • Fish Speech end-to-end benchmark completed for c=1, c=8, and c=16

cc @linyueqian

Sy0307 requested a review from hsliuustc0106 as a code owner on March 12, 2026 20:00

linyueqian (Collaborator) commented

@hsliuustc0106 i will merge #1798 first

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd6be15da3

Comment on lines +867 to +871
    if self._is_fish_speech:
        if not request.input or not request.input.strip():
            raise ValueError("Input text cannot be empty")
        ref_audio_data = None
        if request.ref_audio is not None:

[P1] Run Fish Speech request validation before generation

Fish Speech requests skip _validate_tts_request because they return from the if self._is_fish_speech branch before the elif self._is_tts validation path runs. Since OpenAICreateSpeechRequest.max_new_tokens is not range-limited in the schema, values like -1 or very large numbers can pass through and then be written into sampling params, which can cause runtime errors (negative token budget) or excessive generation budgets. Applying the existing validator (or equivalent bounds checks) to Fish Speech requests would prevent this.
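
A minimal sketch of the kind of bounds check suggested here, assuming a hypothetical helper `validate_speech_request` and an assumed upper limit of 4096 tokens. It does not reproduce the existing _validate_tts_request, whose exact rules are not shown in this thread.

```python
from typing import Optional

# Assumed upper bound for illustration; the real limit is a project decision.
MAX_NEW_TOKENS_LIMIT = 4096


def validate_speech_request(text: Optional[str], max_new_tokens: Optional[int]) -> None:
    """Hypothetical bounds check; not the existing _validate_tts_request."""
    if not text or not text.strip():
        raise ValueError("Input text cannot be empty")
    if max_new_tokens is None:
        return  # caller omitted the field; defaults apply downstream
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise ValueError("max_new_tokens must be an integer")
    if not 1 <= max_new_tokens <= MAX_NEW_TOKENS_LIMIT:
        raise ValueError(f"max_new_tokens must be in [1, {MAX_NEW_TOKENS_LIMIT}]")
```

Running a check like this before the Fish Speech branch builds its prompt would reject values such as -1 before they reach the sampling params.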

Comment on lines +912 to +916
    if self._is_fish_speech and request.max_new_tokens is not None and sampling_params_list:
        import copy

        sampling_params_list = copy.deepcopy(sampling_params_list)
        sampling_params_list[0].max_tokens = request.max_new_tokens

[P2] Apply Fish Speech default max_new_tokens to sampling params

The Fish Speech prompt builder sets a default max_new_tokens of 4096 in additional_information, but this block only updates stage-0 sampling_params_list[0].max_tokens when the caller explicitly provides request.max_new_tokens. For requests that omit the field, generation falls back to the stage config default (fish_speech_s2_pro.yaml uses max_tokens: 200), causing premature truncation relative to the advertised 4096-token default behavior.
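
One way to address this, sketched under assumptions: fall back to the 4096 default whenever the caller omits max_new_tokens, and always work on a copy of the shared params. `StageSamplingParams` and `resolve_sampling_params` are hypothetical stand-ins for illustration, not the real vLLM types or the fix applied in this PR.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_MAX_NEW_TOKENS = 4096  # default named in the review comment


@dataclass
class StageSamplingParams:
    """Hypothetical stand-in for the real sampling params object."""
    max_tokens: int


def resolve_sampling_params(
    shared: List[StageSamplingParams], requested: Optional[int]
) -> List[StageSamplingParams]:
    """Return per-request params with the stage-0 token budget resolved."""
    if not shared:
        return shared
    effective = requested if requested is not None else DEFAULT_MAX_NEW_TOKENS
    params = copy.deepcopy(shared)  # never mutate the shared stage config
    params[0].max_tokens = effective  # only stage 0 is overridden
    return params
```

With a stage config of max_tokens: 200, a request that omits the field would then get 4096 for stage 0 while the shared defaults stay untouched.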

hsliuustc0106 (Collaborator) commented

@vllm-omni-reviewer

hsliuustc0106 (Collaborator) commented

resolve conflicts please

Sy0307 force-pushed the dev/fish_perf branch 3 times, most recently from cbeec8d to e8f000e, on March 16, 2026 20:04

Sy0307 (Contributor, Author) commented Mar 16, 2026

> resolve conflicts please

Resolved.

Sy0307 (Contributor, Author) commented Mar 17, 2026

PTAL @linyueqian @hsliuustc0106

linyueqian (Collaborator) left a comment


LGTM! Nice perf wins across the board. One bug to fix:

fish_speech_dac_decoder.py — empty valid_codes_qf will crash

If all requests in a batch have invalid/empty codes, valid_codes_qf is empty and valid_codes_qf[0].device (line ~228) raises IndexError. Please add an early return guard:

if not valid_codes_qf:
    return audios, srs

before the feature_lengths = torch.tensor(...) block.
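
To make the failure mode concrete, here is a simplified sketch of batching variable-length codes with an early return when nothing is valid. It uses plain Python lists instead of tensors, and `pack_valid_codes` with its return shape is hypothetical, not the code in fish_speech_dac_decoder.py.

```python
from typing import List, Optional, Tuple

Codes = List[List[int]]  # one row per codebook, one column per frame


def pack_valid_codes(
    batch: List[Optional[Codes]], pad_id: int = 0
) -> Tuple[List[int], List[Codes], List[int]]:
    """Return (request indices, padded codes, frame lengths) for valid requests."""
    valid = [(i, c) for i, c in enumerate(batch) if c and c[0]]
    if not valid:
        # Nothing to decode; max() below would raise on an empty list.
        return [], [], []
    lengths = [len(c[0]) for _, c in valid]
    max_len = max(lengths)
    # Right-pad every codebook row so all requests share one frame length.
    padded = [[row + [pad_id] * (max_len - len(row)) for row in c] for _, c in valid]
    return [i for i, _ in valid], padded, lengths
```

The recorded lengths let a batched decode trim each output back to its true size, and the returned indices map results back to the original request order.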

Sy0307 (Contributor, Author) commented Mar 18, 2026

> If all requests in a batch have invalid/empty codes, valid_codes_qf is empty and valid_codes_qf[0].device (line ~228) raises IndexError. Please add an early return guard:
>
>     if not valid_codes_qf:
>         return audios, srs
>
> before the feature_lengths = torch.tensor(...) block.

        if not valid_codes_qf:
            return OmniOutput(
                text_hidden_states=None,
                multimodal_outputs={
                    "model_outputs": [empty] * num_req,
                    "sr": [sr_tensor] * num_req,
                },
            )

We already have this guard in the code. Is this enough?

linyueqian added this to the v0.18.0 milestone on Mar 18, 2026

linyueqian (Collaborator) commented

> > If all requests in a batch have invalid/empty codes, valid_codes_qf is empty and valid_codes_qf[0].device (line ~228) raises IndexError. Please add an early return guard:
> >
> >     if not valid_codes_qf:
> >         return audios, srs
> >
> > before the feature_lengths = torch.tensor(...) block.
>
>     if not valid_codes_qf:
>         return OmniOutput(
>             text_hidden_states=None,
>             multimodal_outputs={
>                 "model_outputs": [empty] * num_req,
>                 "sr": [sr_tensor] * num_req,
>             },
>         )
>
> We already have this guard in the code. Is this enough?

i think so

linyueqian (Collaborator) commented

resolve conflicts and should be good to go.

Signed-off-by: sy0307 <sy0307@users.noreply.github.com>
linyueqian added the ready (label to trigger buildkite CI) label on Mar 20, 2026

linyueqian (Collaborator) commented

fix pre-commit please

Signed-off-by: Sy03 <1370724210@qq.com>
Sy0307 (Contributor, Author) commented Mar 20, 2026

> fix pre-commit please

Fixed.

linyueqian merged commit 072647e into vllm-project:main on Mar 21, 2026
7 of 8 checks passed
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…#1859)

Signed-off-by: sy0307 <sy0307@users.noreply.github.com>
Signed-off-by: Sy03 <1370724210@qq.com>
Co-authored-by: sy0307 <sy0307@users.noreply.github.com>