[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU by Sy0307 · Pull Request #2606 · vllm-project/vllm-omni

Sy0307 · 2026-04-08T19:13:28Z

Purpose

Fix Fish Speech S2 Pro voice cloning FileNotFoundError when running on multi-GPU with distributed_executor_backend: "mp".

Root cause: The API server writes reference audio to a temporary /tmp/fish_ref_*.npy file and passes the file path to workers via additional_information. When workers are spawned as separate processes (multiproc multi-GPU), they cannot access the API server's /tmp file (different process namespace / node / container).

Fix: Pass the reference audio data inline as a torch.Tensor through additional_information, which uses the serialization layer's efficient binary tensor_data path. This removes the filesystem dependency between the API server and the worker processes.

Changes across 5 files (+14/-28):

serving_speech.py: Replace tempfile.NamedTemporaryFile npy write with inline torch.Tensor (ref_audio_path → ref_audio_wav)
fish_speech_slow_ar.py: Read ref_audio_wav tensor from info_dict instead of np.load(ref_audio_path) + os.remove()
end2end.py: Same pattern change for offline example
test_serving_speech.py: Update assertions from file path check to tensor type check
test_fish_speech_regressions.py: Replace ref_audio_path + np.load/os.remove mocks with ref_audio_wav: torch.tensor([0.0])
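
The data-path change described above can be sketched without any of the project's code. This is a minimal illustration, not the actual implementation: array('f') stands in for a 1-D float32 torch.Tensor, pickle stands in for the serialization layer, and the helper names build_additional_information, serialize, and worker_read_ref_audio are hypothetical. Only the ref_audio_wav key is taken from the PR.

```python
import pickle
from array import array


def build_additional_information(samples):
    """Producer side (hypothetical helper): embed the waveform itself
    in the request instead of a path to a /tmp file."""
    # array('f') is a stand-in for a 1-D float32 torch.Tensor.
    return {"ref_audio_wav": array("f", samples)}


def serialize(info):
    # Stand-in for the serialization layer that carries requests to workers.
    return pickle.dumps(info)


def worker_read_ref_audio(payload):
    """Consumer side (hypothetical helper): read the waveform straight
    from the payload. No file is opened, loaded, or removed."""
    info_dict = pickle.loads(payload)
    return info_dict["ref_audio_wav"]


payload = serialize(build_additional_information([0.0, 0.25, -0.5]))
wav = worker_read_ref_audio(payload)
print(list(wav))  # [0.0, 0.25, -0.5]
```

Because the samples travel inside the payload, the worker needs nothing from the sender's filesystem, which is the property the fix relies on.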

Test Plan

pytest tests/entrypoints/openai_api/test_serving_speech.py -k fish -x
pytest tests/model_executor/models/test_fish_speech_regressions.py -x
Manual: launch Fish Speech S2 Pro with 2+ GPUs, send voice clone request with ref_audio + ref_text

Test Result

Unit tests updated to match new data path. Pending CI validation.

linyueqian

Clean fix. The root cause (cross-process temp file inaccessibility with mp backend) is well understood and the approach -- passing inline torch.Tensor through the existing tensor_data serialization path -- is the right one.

A few notes:

Serialization cost is fine. A 30 s clip at 44.1 kHz mono float32 is ~5.3 MB (30 × 44,100 samples × 4 bytes) through binary serialization, which is strictly better than the old disk write + read path. The existing _REF_AUDIO_MAX_DURATION cap bounds the worst case.
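
The size estimate above can be checked with a few lines of arithmetic. The helper name pcm_payload_bytes is hypothetical, and the 30 s duration is the example figure from the comment, not necessarily the value of _REF_AUDIO_MAX_DURATION.

```python
def pcm_payload_bytes(duration_s, sample_rate_hz, channels=1, bytes_per_sample=4):
    """Raw size of an uncompressed PCM clip; float32 is 4 bytes per sample."""
    return int(duration_s * sample_rate_hz) * channels * bytes_per_sample


size = pcm_payload_bytes(30, 44_100)
print(size)                      # 5292000
print(round(size / 1e6, 2))      # 5.29 (decimal MB)
print(round(size / 2**20, 2))    # 5.05 (MiB)
```

The raw tensor payload for such a clip is therefore a little over 5 MB, excluding any framing overhead added by the serialization layer.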

Interaction with #2609 (voice cache). Our voice cache PR was built on the old ref_audio_path pattern. We will rebase #2609 onto this once it merges -- the cache-hit temp file cleanup code becomes unnecessary, which is actually a simplification.

Minor (non-blocking): The consumer side (fish_speech_slow_ar.py) does not validate tensor shape/dimensionality (e.g. must be 1-D, non-empty). This is pre-existing (the old np.load path was similarly unvalidated), so not a regression -- could be a follow-up.
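
A follow-up check along these lines might look like the sketch below. It is an assumption about what such validation could be, not code from this PR: validate_ref_audio_wav is a hypothetical helper, and it relies only on the ndim and shape attributes, which both torch.Tensor and numpy.ndarray expose. A SimpleNamespace stand-in is used so the sketch runs without torch installed.

```python
from types import SimpleNamespace


def validate_ref_audio_wav(wav):
    """Reject reference audio that is not a non-empty 1-D array-like."""
    ndim = getattr(wav, "ndim", None)
    shape = getattr(wav, "shape", None)
    if ndim is None or shape is None:
        raise TypeError("ref_audio_wav must expose ndim and shape")
    if ndim != 1:
        raise ValueError(f"ref_audio_wav must be 1-D, got {ndim}-D")
    if shape[0] == 0:
        raise ValueError("ref_audio_wav must be non-empty")
    return wav


# Stand-ins that mimic the attributes of a tensor.
ok = SimpleNamespace(ndim=1, shape=(3,))
stereo = SimpleNamespace(ndim=2, shape=(2, 3))
empty = SimpleNamespace(ndim=1, shape=(0,))

validate_ref_audio_wav(ok)  # passes silently
for bad in (stereo, empty):
    try:
        validate_ref_audio_wav(bad)
    except ValueError as exc:
        print(exc)
```

Failing early with a clear message on the consumer side would turn a malformed payload into a request error rather than an opaque failure deeper in the model code.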

LGTM.

linyueqian · 2026-04-08T21:09:26Z

fix dco pls

Signed-off-by: Sy03 <1370724210@qq.com>

linyueqian

LGTM

Sy0307 requested a review from hsliuustc0106 as a code owner April 8, 2026 19:13

Sy0307 mentioned this pull request Apr 8, 2026

[Bug]: Fish Speech S2 Pro: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/fish_ref_vwbsa8u2.npy' ONLY when server is launched on multiple GPUs. #2602

Closed

1 task

linyueqian approved these changes Apr 8, 2026

[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU

a7cbbc4

Signed-off-by: Sy03 <1370724210@qq.com>

Sy0307 force-pushed the fix/fish-speech-multiproc-ref-audio branch from 16105df to a7cbbc4 Compare April 9, 2026 03:58

Merge branch 'main' into fix/fish-speech-multiproc-ref-audio

bb5f351

linyueqian approved these changes Apr 9, 2026

linyueqian added the "ready" label (label to trigger buildkite CI) Apr 9, 2026

linyueqian enabled auto-merge (squash) April 9, 2026 20:09

linyueqian merged commit 694be6f into vllm-project:main Apr 9, 2026
7 of 8 checks passed

Sy0307 added a commit to Sy0307/vllm-omni that referenced this pull request Apr 10, 2026

[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU (v…

c0665fa

…llm-project#2606) Signed-off-by: Sy03 <1370724210@qq.com>

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026

[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU (v…

213b393

…llm-project#2606) Signed-off-by: Sy03 <1370724210@qq.com>

linyueqian merged 2 commits into vllm-project:main from Sy0307:fix/fish-speech-multiproc-ref-audio
