Add Fish Speech S2 Pro support with online serving and voice cloning by linyueqian · Pull Request #1798 · vllm-project/vllm-omni

linyueqian · 2026-03-11T01:02:38Z

Summary

Add Fish Speech S2 Pro (fishaudio/s2-pro) model support with dual-AR architecture (4B Slow AR + Fast AR + DAC decoder)
Implement online serving via /v1/audio/speech endpoint with streaming support
Add voice cloning: DAC-encode reference audio to semantic tokens, prepend as system message conditioning

Changes

Model files (`vllm_omni/model_executor/models/fish_speech/`)

fish_speech_slow_ar.py — Slow AR decoder with RoPE fix and sqrt normalization
fish_speech_fast_ar.py — Fast AR decoder with interleaved RoPE
fish_speech_dac_decoder.py — DAC codec decoder (44.1kHz output)
dac_encoder.py — CPU-based DAC encoder for voice cloning reference audio
configuration_fish_speech.py — Model config (fish_qwen3_omni)
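
For readers unfamiliar with the term, "interleaved" (GPT-J style) RoPE rotates adjacent channel pairs, while the half-split (NeoX style) layout pairs channel i with channel i + head_dim/2. The following is a minimal pure-Python sketch of that difference, not the code in this PR; the function names and the base of 10000 are assumptions for illustration.

```python
import math


def rope_angles(position: int, head_dim: int, base: float = 10000.0) -> list[float]:
    """One rotation angle per channel pair (head_dim is assumed to be even)."""
    half = head_dim // 2
    return [position * base ** (-2.0 * i / head_dim) for i in range(half)]


def rope_interleaved(x: list[float], position: int) -> list[float]:
    """GPT-J style: rotate adjacent channels (x[2i], x[2i+1]) together."""
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x))):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * math.cos(theta) - b * math.sin(theta)
        out[2 * i + 1] = a * math.sin(theta) + b * math.cos(theta)
    return out


def rope_half_split(x: list[float], position: int) -> list[float]:
    """NeoX style: rotate channel i together with channel i + head_dim/2."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x))):
        a, b = x[i], x[i + half]
        out[i] = a * math.cos(theta) - b * math.sin(theta)
        out[i + half] = a * math.sin(theta) + b * math.cos(theta)
    return out
```

Both layouts apply the same rotations and differ only in which channels form a pair, so weights trained with one layout generally give wrong attention scores if loaded with the other.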

Online serving

Fish Speech prompt builder in serving_speech.py with voice cloning support
Voice cloning: encode ref audio → semantic tokens → system message prefix
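
A rough sketch of the "system message prefix" idea, assuming the reference audio has already been encoded to integer semantic codes. The `build_messages` helper, the message layout, and the `<|semantic:N|>` token spelling are hypothetical and only illustrate the flow; the actual prompt builder in serving_speech.py may differ.

```python
from typing import Optional


def build_messages(
    text: str,
    ref_text: Optional[str] = None,
    ref_codes: Optional[list[int]] = None,
) -> list[dict[str, str]]:
    """Build chat messages, optionally prefixed with voice-cloning conditioning.

    Hypothetical helper: the token format below is an assumption.
    """
    messages: list[dict[str, str]] = []
    if ref_codes is not None:
        if ref_text is None:
            raise ValueError("ref_text is required when reference codes are given")
        # Reference transcript followed by the reference audio's semantic tokens.
        ref_tokens = "".join(f"<|semantic:{code}|>" for code in ref_codes)
        messages.append({"role": "system", "content": f"{ref_text}{ref_tokens}"})
    # The text to synthesize always comes last.
    messages.append({"role": "user", "content": text})
    return messages
```

Without reference codes the result is a single user message, which corresponds to the basic TTS path.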

Stage config & input processors

fish_speech_s2_pro.yaml — Two-stage pipeline with async chunk streaming
fish_speech.py — Stage input processor for Slow AR → DAC decoder

Examples

examples/offline_inference/fish_speech/end2end.py — Offline inference
examples/online_serving/fish_speech/run_server.sh — Server launch script
examples/online_serving/fish_speech/speech_client.py — API client with voice cloning

Test plan

Offline inference produces intelligible speech
Online serving POST /v1/audio/speech returns valid WAV (44.1kHz, 3.76s)
Streaming mode returns PCM chunks (254KB)
Voice cloning with ref_audio + ref_text produces cloned voice output (2.83s)
Multi-request batching
Long-form text synthesis

Usage

# Start server
vllm-omni serve fishaudio/s2-pro \
    --stage-configs-path vllm_omni/model_executor/stage_configs/fish_speech_s2_pro.yaml \
    --omni --trust-remote-code --enforce-eager

# Basic TTS
curl -X POST http://localhost:8091/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{"model":"fishaudio/s2-pro","input":"Hello world","voice":"default"}' \
    --output output.wav

# Voice cloning
python examples/online_serving/fish_speech/speech_client.py \
    --text "Hello world" --ref-audio ref.wav --ref-text "Reference transcript"
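
The basic TTS request can also be sent from Python with the standard library alone. This is a minimal sketch that mirrors the curl call above (same URL, port, and JSON fields); the helper names `build_speech_request` and `synthesize` are made up for illustration and are not part of the example client.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    base_url: str = "http://localhost:8091",
    model: str = "fishaudio/s2-pro",
    voice: str = "default",
) -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "output.wav", timeout: float = 120.0) -> int:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the server running, `synthesize("Hello world")` should write `output.wav` and return the number of bytes received.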

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 444cdb7efb

chatgpt-codex-connector · 2026-03-11T01:10:09Z

+        semantic_code = semantic_token_id.reshape(bsz) - semantic_begin
+
+        all_codes = torch.empty(bsz, num_cb, dtype=torch.long, device=device)
+        all_codes[:, 0] = semantic_code


Clamp non-semantic token ids before emitting codec codes

When the sampler emits <|im_end|> (which is explicitly allowed by the logits mask), semantic_token_id - semantic_begin_id becomes negative here and is written into codebook-0. That frame is then forwarded downstream as codec input, but DAC decoding expects non-negative code indices, so end-of-sequence steps can produce invalid codes and corrupted/failing decode at request tail. Please map out-of-range semantic ids to a safe pad value before filling all_codes.
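
The suggested mapping can be sketched in plain Python. This shows only the logic, not the tensor code from the PR; the function name, the example ids, and the codebook size in the usage note are assumptions. A later commit in this PR describes mapping non-semantic tokens to 0, which is the default pad value used here.

```python
def semantic_ids_to_codes(
    token_ids: list[int],
    semantic_begin: int,
    codebook_size: int,
    pad_code: int = 0,
) -> list[int]:
    """Convert sampled token ids to codebook-0 indices.

    Ids outside the semantic range (for example an end-of-sequence token)
    are replaced by pad_code instead of producing a negative or too-large index.
    """
    codes = []
    for token_id in token_ids:
        code = token_id - semantic_begin
        if 0 <= code < codebook_size:
            codes.append(code)
        else:
            codes.append(pad_code)
    return codes
```

For example, with a hypothetical `semantic_begin=1000` and `codebook_size=4096`, an id of 7 (below the semantic range) maps to the pad code rather than to -993.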

chatgpt-codex-connector · 2026-03-11T01:10:09Z

+            with torch.cuda.amp.autocast(dtype=torch.float32):
+                wav, audio_lengths = self._codec.decode(codes_bqf, feature_lengths)


Drop unsupported float32 autocast in DAC decode path

This decode block enters CUDA autocast with dtype=torch.float32, but CUDA AMP only supports reduced-precision autocast dtypes (fp16/bf16). In CUDA deployments this can raise at runtime when decoding audio, turning normal synthesis requests into failures. If full precision is desired, remove autocast (or disable it) instead of requesting fp32 autocast.

chatgpt-codex-connector · 2026-03-11T01:10:09Z

+        additional_information: dict[str, Any] = {
+            "text": [request.input],
+            "max_new_tokens": [request.max_new_tokens or 4096],
+        }


Enforce Fish Speech max_new_tokens in actual sampling

max_new_tokens is stored in additional_information here, but the Fish Speech generation path still uses stage default sampling (max_tokens: 200) and no Fish model code reads this field, so caller-specified output length is silently ignored. This makes API behavior inconsistent (requests that ask for shorter/longer generations do not take effect).
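
One way to make the caller's value take effect is to copy the stage's default sampling parameters and override `max_tokens` per request. The sketch below assumes the parameters are a plain dict; `apply_request_max_tokens` is a hypothetical helper, not a vllm-omni function.

```python
from typing import Any, Optional


def apply_request_max_tokens(
    stage_sampling: dict[str, Any],
    max_new_tokens: Optional[int],
) -> dict[str, Any]:
    """Return per-request sampling params with max_tokens overridden if requested."""
    params = dict(stage_sampling)  # copy so the stage defaults are not mutated
    if max_new_tokens is not None:
        if max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be positive")
        params["max_tokens"] = max_new_tokens
    return params
```

When the caller does not set `max_new_tokens`, the stage default (200 in the review comment above) is left unchanged.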

lishunyang12

Left a couple of comments, mainly around the duplicated DAC codec construction and the resampling quality.

hsliuustc0106

Summary

Adds Fish Speech S2 Pro model support with:

Dual-AR architecture (4B Slow AR + Fast AR + DAC decoder)
Online serving via /v1/audio/speech with streaming
Voice cloning via DAC-encoded reference audio
Comprehensive docs and examples

Validated

✅ DCO signed
✅ All CI checks passed
✅ Offline inference, online serving, streaming mode, and voice cloning tested per PR description
✅ Docs updated (supported_models.md, speech_api.md, examples)
✅ Stage config with async chunk streaming

Scope

24 files with clean model structure:

Model files: slow_ar, fast_ar, dac_decoder, dac_encoder, configuration
Online serving: prompt builder with voice cloning support
Examples: offline inference, online serving, gradio demo
Tests mentioned but not in diff (assume tested locally)

Comprehensive new model integration.

hsliuustc0106 · 2026-03-12T15:49:01Z

any inference speed result?

linyueqian · 2026-03-12T15:59:39Z

any inference speed result?

RTF is about 0.52 and TTFP is 131 ms. @Sy0307 is working on several optimizations in a subsequent PR.
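
For reference, RTF (real-time factor) is wall-clock synthesis time divided by the duration of the generated audio, and TTFP is taken here to mean the time until the first audio packet arrives. A minimal sketch of measuring both from a stream of PCM chunks follows; it assumes 16-bit mono PCM at 44.1 kHz, and `measure_stream` is a hypothetical helper rather than the benchmark used for the numbers above.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    chunks: Iterable[bytes],
    sample_rate: int = 44_100,
    bytes_per_sample: int = 2,  # assumption: 16-bit mono PCM
    clock: Callable[[], float] = time.perf_counter,
) -> dict[str, Optional[float]]:
    """Consume a PCM stream and report time-to-first-packet and real-time factor."""
    start = clock()
    ttfp: Optional[float] = None
    total_bytes = 0
    for chunk in chunks:
        if ttfp is None and chunk:
            ttfp = clock() - start  # first non-empty chunk
        total_bytes += len(chunk)
    elapsed = clock() - start
    audio_seconds = total_bytes / (sample_rate * bytes_per_sample)
    rtf = elapsed / audio_seconds if audio_seconds > 0 else None
    return {"ttfp_s": ttfp, "rtf": rtf, "audio_s": audio_seconds}
```

An RTF below 1.0 means audio is produced faster than it plays back.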

Implements the Dual-AR TTS pipeline for Fish Speech S2 Pro with two stages:
- Stage 0: Slow AR (Qwen3-based text model) generates semantic tokens with Fast AR codebook predictor for residual codes
- Stage 1: DAC decoder converts codec indices to 44.1kHz audio waveform

Key implementation details:
- Interleaved (GPT-J) RoPE style matching Fish Speech training
- Codebook embedding normalization by sqrt(num_codebooks + 1)
- DAC hop length of 2048 (512 decoder * 4 quantizer upsample)
- Async chunk streaming with left-context overlap for smooth audio
- Semantic token masking (only semantic range + im_end allowed)

Signed-off-by: linyueqian <linyueqian@outlook.com>
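
The hop length stated in this commit message fixes the relation between codec frames and audio duration. A small sketch of that arithmetic, using only the numbers given above (44.1kHz output, 512 * 4 = 2048 samples per frame); the constant and function names are illustrative.

```python
import math

DAC_SAMPLE_RATE = 44_100          # Hz, DAC decoder output rate
DECODER_UPSAMPLE = 512            # decoder upsampling factor
QUANTIZER_UPSAMPLE = 4            # quantizer upsampling factor
HOP_LENGTH = DECODER_UPSAMPLE * QUANTIZER_UPSAMPLE  # samples per codec frame


def frames_to_seconds(num_frames: int) -> float:
    """Audio duration produced by num_frames codec frames."""
    return num_frames * HOP_LENGTH / DAC_SAMPLE_RATE


def seconds_to_frames(seconds: float) -> int:
    """Smallest number of codec frames that covers the given duration."""
    return math.ceil(seconds * DAC_SAMPLE_RATE / HOP_LENGTH)
```

At these settings one second of audio corresponds to roughly 21.5 frames, so each Slow AR step accounts for about 46 ms of output.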

- Add fish_speech_slow_ar to TTS model stages in serving_speech.py
- Build Fish Speech prompts with chat template and <|voice|> token
- Support voice cloning via ref_audio + ref_text (DAC-encodes reference audio to semantic tokens on CPU and prepends as system message)
- Add DAC encoder utility for reference audio encoding
- Add server launch script and client example

Signed-off-by: linyueqian <linyueqian@outlook.com>

…kens
- Clamp semantic token IDs to valid codebook range in Fast AR; im_end or other non-semantic tokens now map to 0 instead of going negative
- Replace unsupported float32 autocast with autocast(enabled=False) in DAC decoder to avoid CUDA AMP runtime errors
- Override Stage-0 max_tokens from caller-specified max_new_tokens so Fish Speech API requests respect output length parameter

Signed-off-by: linyueqian <linyueqian@outlook.com>

- Interactive web UI with text input and voice cloning support
- Streaming (progressive PCM) and non-streaming modes
- Voice cloning via audio upload/URL + transcript
- Combined server + demo launch script (run_gradio_demo.sh)

Signed-off-by: linyueqian <linyueqian@outlook.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>

…llm-project#1798)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: KexiongYu <yukexiong1@huawei.com>

univa-HARRY · 2026-03-16T08:05:44Z

Does this not work on the vllm/vllm-openai:v0.16.0-based image?
A language_model_only Pydantic error occurs.

Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/app/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1234, in _stage_worker_async_entry
    asyncio.run(_stage_worker_async(model, stage_payload, in_q, out_q, batch_timeout, stage_init_timeout))
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/app/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1368, in _stage_worker_async
    vllm_config = omni_engine_args.create_engine_config(usage_context=usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1410, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vllm-omni/vllm_omni/engine/arg_utils.py", line 304, in create_model_config
    omni_config = OmniModelConfig(
                  ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 141, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for OmniModelConfig
language_model_only
  Unexpected keyword argument [type=unexpected_keyword_argument, input_value=False, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.9/v/unexpected_keyword_argument
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/app/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1234, in _stage_worker_async_entry
    asyncio.run(_stage_worker_async(model, stage_payload, in_q, out_q, batch_timeout, stage_init_timeout))
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/app/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1368, in _stage_worker_async
    vllm_config = omni_engine_args.create_engine_config(usage_context=usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1410, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vllm-omni/vllm_omni/engine/arg_utils.py", line 304, in create_model_config
    omni_config = OmniModelConfig(
                  ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 141, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for OmniModelConfig
language_model_only
  Unexpected keyword argument [type=unexpected_keyword_argument, input_value=False, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.9/v/unexpected_keyword_argument
(APIServer pid=7) Traceback (most recent call last):
(APIServer pid=7)   File "/usr/local/bin/vllm-omni", line 10, in <module>
(APIServer pid=7)     sys.exit(main())
(APIServer pid=7)              ^^^^^^
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/cli/main.py", line 53, in main
(APIServer pid=7)     args.dispatch_function(args)
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/cli/serve.py", line 80, in cmd
(APIServer pid=7)     uvloop.run(omni_run_server(args))
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=7)     return __asyncio.run(
(APIServer pid=7)            ^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=7)     return runner.run(main)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=7)     return self._loop.run_until_complete(task)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=7)     return await main
(APIServer pid=7)            ^^^^^^^^^^
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/openai/api_server.py", line 232, in omni_run_server
(APIServer pid=7)     await omni_run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/openai/api_server.py", line 250, in omni_run_server_worker
(APIServer pid=7)     async with build_async_omni(
(APIServer pid=7)                ^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=7)     return await anext(self.gen)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/openai/api_server.py", line 354, in build_async_omni
(APIServer pid=7)     async with build_async_omni_from_stage_config(
(APIServer pid=7)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=7)     return await anext(self.gen)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/openai/api_server.py", line 397, in build_async_omni_from_stage_config
(APIServer pid=7)     async_omni = AsyncOmni(model=args.model, **kwargs)
(APIServer pid=7)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 132, in __init__
(APIServer pid=7)     super().__init__(model, **kwargs)
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/omni.py", line 196, in __init__
(APIServer pid=7)     self._initialize_stages(model, kwargs)
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/omni.py", line 375, in _initialize_stages
(APIServer pid=7)     self._wait_for_stages_ready(timeout=init_timeout)
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 256, in _wait_for_stages_ready
(APIServer pid=7)     super()._wait_for_stages_ready(timeout)
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/omni.py", line 548, in _wait_for_stages_ready
(APIServer pid=7)     if result := stage.try_collect():
(APIServer pid=7)                  ^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/app/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 714, in try_collect
(APIServer pid=7)     raise RuntimeError(f"OmniStage Worker process died unexpectedly with exit code {self._proc.exitcode}")
(APIServer pid=7) RuntimeError: OmniStage Worker process died unexpectedly with exit code 1

mru4913 · 2026-03-17T09:19:51Z

What's the release date? (for Docker deployment)

linyueqian · 2026-03-17T16:07:36Z

Does this not work on the vllm/vllm-openai:v0.16.0-based image? A language_model_only Pydantic error occurs.

You need to use 0.17.0.
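
To catch this kind of mismatch before launching the server, a pre-flight version check could look like the sketch below. It assumes the reply refers to the installed `vllm` package version; the helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes such as `rc2`, so a release candidate of 0.17.0 counts as meeting the minimum.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release part, e.g. '0.17.0rc2' -> (0, 17, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad short versions such as '0.17' to three components.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str = "0.17.0") -> bool:
    """True if the installed release is at least the required one."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_installed_vllm(minimum: str = "0.17.0") -> bool:
    """Check the installed vllm distribution; False if it is missing or too old."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

With the v0.16.0-based image from the report above, `meets_minimum("0.16.0")` is false, which matches the `language_model_only` validation failure.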

linyueqian · 2026-03-17T16:08:26Z

What's the release date? (for Docker deployment)

Please refer to https://docs.google.com/document/d/1OY_11S0FdOzY5txBdLrtPp4NleEhr7S3OE1YnpOWu9w/. The next formal release date is Mar 27.

…llm-project#1798)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>

Morris-Lucifer · 2026-03-20T13:54:14Z

Subject: Error BadRequestError: This model does not support generation when serving Fish Speech S2 Pro

Hi @linyueqian ,

I am testing the newly merged Fish Speech S2 Pro support using vllm-omni (version 0.17.0rc2.dev121+gb33d7637d). I encountered a 400 BadRequestError during online serving.

Server Start Command:

vllm serve /path/to/s2-pro \
  --served-model-name s2-pro \
  --stage-configs-path vllm_omni/model_executor/stage_configs/fish_speech_s2_pro.yaml \
  --omni --trust-remote-code --enforce-eager

Request Payload:
I am using the speech_client.py provided in the examples.

Error Log:

(APIServer pid=5191) INFO 03-20 21:36:47 [serving_speech.py:992] TTS speech request speech-xxx: text='...', model=s2-pro
(APIServer pid=5191) INFO: 172.22.16.1:58043 - "POST /v1/audio/speech HTTP/1.1" 400 Bad Request
{"error":{"message":"This model does not support generation","type":"BadRequestError","param":null,"code":400}}

I have already ensured model_type is fish_qwen3_omni, but the error persists.
Is there any specific requirement for the config.json or additional flags needed to enable the generation path for this dual-AR architecture?

Thanks!

linyueqian · 2026-03-20T14:00:47Z

Can you try to add is_comprehension=True in your config and see if that works?

BadDeveloper2022 · 2026-03-23T03:42:30Z

v0.18.0rc1 bug:
WARNING 03-23 11:37:05 [serving_speech.py:331] Failed to estimate TTS prompt length, using fallback 2048: 'FishSpeechConfig' object has no attribute 'talker_config'
INFO 03-23 11:37:05 [serving_speech.py:992] TTS speech request speech-aded8318469b416f: text='Hello ,Good Morning', model=Base
127.0.0.1:26695 - "POST /v1/audio/speech HTTP/1.1" 200 OK
ERROR 03-23 11:37:05 [serving_speech.py:749] Streaming speech generation failed for speech-aded8318469b416f: This model does not support generation
ERROR 03-23 11:37:05 [serving_speech.py:749] Traceback (most recent call last):
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/entrypoints/openai/serving_speech.py", line 695, in _generate_audio_chunks
ERROR 03-23 11:37:05 [serving_speech.py:749] async for res in generator:
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 193, in generate
ERROR 03-23 11:37:05 [serving_speech.py:749] await self.engine.add_request_async(
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/engine/async_omni_engine.py", line 952, in add_request_async
ERROR 03-23 11:37:05 [serving_speech.py:749] self.add_request(
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/engine/async_omni_engine.py", line 923, in add_request
ERROR 03-23 11:37:05 [serving_speech.py:749] msg = self._build_add_request_message(
ERROR 03-23 11:37:05 [serving_speech.py:749] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/engine/async_omni_engine.py", line 638, in _build_add_request_message
ERROR 03-23 11:37:05 [serving_speech.py:749] request = self.input_processor.process_inputs(
ERROR 03-23 11:37:05 [serving_speech.py:749] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/vllm/v1/engine/input_processor.py", line 201, in process_inputs
ERROR 03-23 11:37:05 [serving_speech.py:749] self._validate_params(params, supported_tasks)
ERROR 03-23 11:37:05 [serving_speech.py:749] File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/vllm/v1/engine/input_processor.py", line 94, in _validate_params
ERROR 03-23 11:37:05 [serving_speech.py:749] raise ValueError("This model does not support generation")
ERROR 03-23 11:37:05 [serving_speech.py:749] ValueError: This model does not support generation
Exception in ASGI application
+ Exception Group Traceback (most recent call last):
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/_utils.py", line 81, in collapse_excgroups
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/responses.py", line 270, in call
async with anyio.create_task_group() as task_group:
^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 783, in aexit
raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 416, in run_asgi
| result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
| return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/fastapi/applications.py", line 1160, in __call__
| await super().__call__(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/applications.py", line 107, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
| raise exc
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
| await self.app(scope, receive, _send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/middleware/cors.py", line 87, in __call__
| await self.app(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 177, in __call__
| raise exc
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 175, in __call__
| await self.app(scope, receive, send_wrapper)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
| await self.app(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/routing.py", line 716, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/routing.py", line 736, in app
| await route.handle(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/routing.py", line 290, in handle
| await self.app(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/fastapi/routing.py", line 119, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/fastapi/routing.py", line 106, in app
| await response(scope, receive, send)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/responses.py", line 269, in __call__
| with collapse_excgroups():
^^^^^^^^^^^^^^^^^^^^
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/contextlib.py", line 158, in __exit__
| self.gen.throw(value)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/_utils.py", line 87, in collapse_excgroups
| raise exc
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/responses.py", line 273, in wrap
| await func()
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/starlette/responses.py", line 253, in stream_response
| async for chunk in self.body_iterator:
| File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/entrypoints/openai/serving_speech.py", line 695, in _generate_audio_chunks
| async for res in generator:
| File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 193, in generate
| await self.engine.add_request_async(
| File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/engine/async_omni_engine.py", line 952, in add_request_async
| self.add_request(
| File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/engine/async_omni_engine.py", line 923, in add_request
| msg = self._build_add_request_message(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/ai/minicpm/0.18/vllm-omni/vllm_omni/engine/async_omni_engine.py", line 638, in _build_add_request_message
| request = self.input_processor.process_inputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/vllm/v1/engine/input_processor.py", line 201, in process_inputs
| self._validate_params(params, supported_tasks)
| File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/vllm/v1/engine/input_processor.py", line 94, in _validate_params
| raise ValueError("This model does not support generation")
| ValueError: This model does not support generation
+------------------------------------
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/envs/vllm-omni/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 416, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

linyueqian · 2026-03-24T04:54:04Z

This was fixed in #2058: the stage configs were missing `is_comprehension: true`, so supported_tasks did not include "generate". Please update to the latest main and try again.
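
To make the failure mode concrete, here is a minimal sketch of the kind of check the traceback ends in. It is not the actual vLLM or vllm-omni code: `supported_tasks_for`, `validate_params`, and the dictionary layout of the stage configs are hypothetical, and the rule that a stage flagged `is_comprehension` is what contributes "generate" is an assumption based only on the explanation above.

```python
def supported_tasks_for(stage_configs):
    """Hypothetical helper: derive supported tasks from stage-config dicts.

    Assumption (taken from the comment above, not from the source code):
    "generate" is only advertised when some stage sets is_comprehension.
    """
    tasks = set()
    if any(stage.get("is_comprehension", False) for stage in stage_configs):
        tasks.add("generate")
    return tasks


def validate_params(supported_tasks):
    """Illustrative stand-in that raises the same error as the traceback."""
    if "generate" not in supported_tasks:
        raise ValueError("This model does not support generation")


# Hypothetical stage configs: flag missing (before) and present (after).
before = [{"stage_id": 0}, {"stage_id": 1}]
after = [{"stage_id": 0, "is_comprehension": True}, {"stage_id": 1}]

for name, stage_configs in (("before", before), ("after", after)):
    try:
        validate_params(supported_tasks_for(stage_configs))
        print(name, "ok")
    except ValueError as exc:
        print(name, "failed:", exc)
```

With the flag missing, the request is rejected before any audio is generated, which would match the error surfacing from `process_inputs` rather than from the model itself.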

1615070057 · 2026-03-31T13:59:51Z

The cloning effect feels mediocre, sounding like a Westerner speaking Chinese. Are there specific requirements for the cloning audio? My test audio is 18 seconds in MP3 format.

linyueqian requested a review from hsliuustc0106 as a code owner March 11, 2026 01:02

chatgpt-codex-connector Bot reviewed Mar 11, 2026

linyueqian force-pushed the feature/fish-speech-s2-pro branch from ded4c6b to a49fca5 Compare March 11, 2026 01:56

linyueqian mentioned this pull request Mar 11, 2026

[RFC]: TTS Development Roadmap - March 2026 #1795

Open

lishunyang12 reviewed Mar 11, 2026

Comment thread vllm_omni/model_executor/models/fish_speech/fish_speech_dac_decoder.py Outdated

Comment thread vllm_omni/model_executor/models/fish_speech/dac_encoder.py

Gaohan123 added this to the v0.18.0 milestone Mar 12, 2026

hsliuustc0106 approved these changes Mar 12, 2026

linyueqian added 6 commits March 12, 2026 12:42

Add documentation for Fish Speech S2 Pro (offline + online serving)

51fc491

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

Remove unrelated qwen3_tts cuda_graph_decoder_wrapper change

c69aaff

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian force-pushed the feature/fish-speech-s2-pro branch from 792076b to c69aaff Compare March 12, 2026 17:00

Extract shared DAC codec builder and use torchaudio resample

283b9b9

Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian added the ready label to trigger buildkite CI label Mar 12, 2026

Sy0307 mentioned this pull request Mar 12, 2026

[Perf] Improve Fish Speech S2 Pro inference performance #1859

Merged

linyueqian merged commit 366b336 into vllm-project:main Mar 12, 2026
6 of 7 checks passed

linyueqian mentioned this pull request Mar 21, 2026

[Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection #2058

Merged

3 tasks

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Add Fish Speech S2 Pro support with online serving and voice cloning (v…

c605e02

…llm-project#1798) Signed-off-by: linyueqian <linyueqian@outlook.com>

		with torch.cuda.amp.autocast(dtype=torch.float32):
			wav, audio_lengths = self._codec.decode(codes_bqf, feature_lengths)

9 participants
