
[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU#2606

Merged
linyueqian merged 2 commits into vllm-project:main from Sy0307:fix/fish-speech-multiproc-ref-audio on Apr 9, 2026

@Sy0307 (Contributor) commented Apr 8, 2026

Purpose

Fix #2602

Fix a FileNotFoundError in Fish Speech S2 Pro voice cloning when running multi-GPU with distributed_executor_backend: "mp".

Root cause: The API server writes the reference audio to a temporary /tmp/fish_ref_*.npy file and passes that file path to workers via additional_information. When workers are spawned as separate processes (multiproc multi-GPU), they cannot access the API server's /tmp file because they may run in a different namespace, container, or node.

Fix: Pass reference audio data inline as a torch.Tensor through additional_information, which uses the serialization layer's efficient binary tensor_data path. No filesystem dependency between processes.
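The sketch below shows the producer side of the idea: the samples go into the payload itself instead of a file path, so whatever crosses the process boundary is self-sufficient. It rests on stated assumptions. `build_additional_information` and `send_to_worker` are hypothetical helpers. `array("f")` stands in for the `torch.Tensor` that the PR uses. `pickle` stands in for the engine's own serializer, which the PR describes as the `tensor_data` path.

```python
import pickle
from array import array


def build_additional_information(samples: list[float]) -> dict:
    # Hypothetical helper: embed the reference audio inline as float32
    # samples (array("f") stands in for torch.Tensor) instead of writing
    # /tmp/fish_ref_*.npy and passing its path.
    return {"ref_audio_wav": array("f", samples)}


def send_to_worker(info: dict) -> dict:
    # Stand-in for the process boundary: the payload becomes bytes and is
    # rebuilt on the other side, so the worker never touches the API
    # server's filesystem. pickle is an assumption; the real engine uses
    # its own serialization layer.
    return pickle.loads(pickle.dumps(info))


info = build_additional_information([0.0, 0.25, -0.5])
received = send_to_worker(info)
print(list(received["ref_audio_wav"]))  # [0.0, 0.25, -0.5]
```

The worker gets a copy of the data rather than a reference to a location. This is why the fix holds no matter where the worker process runs.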

Changes across 5 files (+14/-28):

  • serving_speech.py: Replace tempfile.NamedTemporaryFile npy write with inline torch.Tensor (ref_audio_path → ref_audio_wav)
  • fish_speech_slow_ar.py: Read ref_audio_wav tensor from info_dict instead of np.load(ref_audio_path) + os.remove()
  • end2end.py: Same pattern change for offline example
  • test_serving_speech.py: Update assertions from file path check to tensor type check
  • test_fish_speech_regressions.py: Replace ref_audio_path + np.load/os.remove mocks with ref_audio_wav: torch.tensor([0.0])
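The consumer-side change in fish_speech_slow_ar.py can be illustrated by contrasting the two patterns. This is a simplified, hypothetical sketch, not the file's actual code. Raw float32 bytes stand in for `np.load`, and `array("f")` stands in for the tensor. The point it shows is that the old reader fails with `FileNotFoundError` whenever the worker cannot see the path. The new reader has no file to open or clean up.

```python
import os
from array import array


def read_ref_audio_old(info_dict: dict) -> array:
    # Old pattern (removed by the PR), simplified: load samples from a file
    # the API server wrote, then delete it. Raw float32 bytes stand in for
    # np.load. Raises FileNotFoundError if this process cannot see the file.
    path = info_dict["ref_audio_path"]
    wav = array("f")
    with open(path, "rb") as f:
        wav.frombytes(f.read())
    os.remove(path)
    return wav


def read_ref_audio_new(info_dict: dict) -> array:
    # New pattern: the samples arrive inline in the info dict.
    return info_dict["ref_audio_wav"]


# Simulate a worker that cannot see the API server's /tmp file.
try:
    read_ref_audio_old({"ref_audio_path": "/nonexistent/fish_ref_demo.npy"})
except FileNotFoundError as exc:
    print("old path:", type(exc).__name__)

print("new path:", list(read_ref_audio_new({"ref_audio_wav": array("f", [0.5])})))
```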

Test Plan

  • pytest tests/entrypoints/openai_api/test_serving_speech.py -k fish -x
  • pytest tests/model_executor/models/test_fish_speech_regressions.py -x
  • Manual: launch Fish Speech S2 Pro with 2+ GPUs, send voice clone request with ref_audio + ref_text

Test Result

Unit tests updated to match new data path. Pending CI validation.

@Sy0307 requested a review from hsliuustc0106 as a code owner on April 8, 2026 19:13
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@linyueqian (Collaborator) left a comment:

Clean fix. The root cause (cross-process temp file inaccessibility with mp backend) is well understood and the approach -- passing inline torch.Tensor through the existing tensor_data serialization path -- is the right one.

A few notes:

Serialization cost is fine. A 30 s clip at 44.1 kHz mono float32 is 30 × 44,100 × 4 bytes ≈ 5.3 MB (about 5.05 MiB) through binary serialization, which is strictly better than the old disk write + read path. The existing _REF_AUDIO_MAX_DURATION cap bounds the worst case.
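The payload estimate is easy to check. The sketch below uses the 30 s example from the review. It does not assume what value _REF_AUDIO_MAX_DURATION actually has.

```python
# Payload size of one reference clip for the review's 30 s example:
# 44.1 kHz mono audio stored as float32 (4 bytes per sample).
duration_s = 30
sample_rate_hz = 44_100
bytes_per_sample = 4  # float32

num_samples = duration_s * sample_rate_hz       # 1,323,000 samples
payload_bytes = num_samples * bytes_per_sample  # 5,292,000 bytes

print(f"{payload_bytes / 1e6:.2f} MB")     # 5.29 MB (decimal)
print(f"{payload_bytes / 2**20:.2f} MiB")  # 5.05 MiB (binary)
```

The size grows linearly with duration and sample rate, so a duration cap bounds the per-request payload directly.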

Interaction with #2609 (voice cache). Our voice cache PR was built on the old ref_audio_path pattern. We will rebase #2609 onto this once it merges -- the cache-hit temp file cleanup code becomes unnecessary, which is actually a simplification.

Minor (non-blocking): The consumer side (fish_speech_slow_ar.py) does not validate tensor shape/dimensionality (e.g. must be 1-D, non-empty). This is pre-existing (the old np.load path was similarly unvalidated), so not a regression -- could be a follow-up.

LGTM.

@linyueqian (Collaborator) commented:

Please fix the DCO check (commits need a Signed-off-by line).

@Sy0307 force-pushed the fix/fish-speech-multiproc-ref-audio branch from 16105df to a7cbbc4 on April 9, 2026 03:58

@linyueqian (Collaborator) left a comment:

LGTM

@linyueqian added the ready (label to trigger Buildkite CI) label on Apr 9, 2026
@linyueqian enabled auto-merge (squash) on April 9, 2026 20:09
@linyueqian merged commit 694be6f into vllm-project:main on Apr 9, 2026
7 of 8 checks passed
Sy0307 added a commit to Sy0307/vllm-omni that referenced this pull request Apr 10, 2026
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026

Labels: ready (label to trigger Buildkite CI)

Participants: 2