[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU #2606
linyueqian left a comment
Clean fix. The root cause (cross-process temp file inaccessibility with mp backend) is well understood and the approach -- passing inline torch.Tensor through the existing tensor_data serialization path -- is the right one.
A few notes:
Serialization cost is fine. A 30 s clip at 44.1 kHz mono float32 is ~5.3 MB (30 × 44,100 samples × 4 bytes) through binary serialization, which is strictly better than the old disk write + read path. The existing _REF_AUDIO_MAX_DURATION cap bounds the worst case.
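The size estimate above can be checked with plain arithmetic. This is a minimal sketch: the helper name `ref_audio_payload_bytes` is hypothetical, and it counts raw sample bytes only, so any framing or metadata overhead added by the serialization layer is not included.

```python
def ref_audio_payload_bytes(
    duration_s: float,
    sample_rate_hz: int,
    bytes_per_sample: int = 4,  # float32
    channels: int = 1,          # mono
) -> int:
    """Raw size of a PCM reference clip, ignoring serialization overhead."""
    num_samples = int(duration_s * sample_rate_hz)
    return num_samples * channels * bytes_per_sample


size = ref_audio_payload_bytes(duration_s=30, sample_rate_hz=44_100)
print(size)                                              # 5292000
print(f"{size / 1e6:.2f} MB / {size / 2**20:.2f} MiB")  # 5.29 MB / 5.05 MiB
```

Because the payload grows linearly with duration, a cap on clip length such as `_REF_AUDIO_MAX_DURATION` directly bounds it.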
Interaction with #2609 (voice cache). Our voice cache PR was built on the old ref_audio_path pattern. We will rebase #2609 onto this once it merges -- the cache-hit temp file cleanup code becomes unnecessary, which is actually a simplification.
Minor (non-blocking): The consumer side (fish_speech_slow_ar.py) does not validate tensor shape/dimensionality (e.g. must be 1-D, non-empty). This is pre-existing (the old np.load path was similarly unvalidated), so not a regression -- could be a follow-up.
LGTM.
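The follow-up suggested in the minor note could be a small guard on the consumer side. This is a hypothetical sketch: `validate_ref_audio_wav` and its `max_samples` parameter do not exist in the codebase, and the check only relies on the value exposing a tuple-like `.shape` (a `torch.Tensor`'s `torch.Size` is a tuple subclass). Deriving `max_samples` from `_REF_AUDIO_MAX_DURATION` and the sample rate is an assumption left to the caller.

```python
from types import SimpleNamespace
from typing import Any, Optional


def validate_ref_audio_wav(wav: Any, max_samples: Optional[int] = None) -> Any:
    """Hypothetical check for info_dict["ref_audio_wav"]: 1-D, non-empty, bounded.

    Only ``wav.shape`` is inspected, so torch tensors and numpy arrays both work.
    """
    shape = tuple(wav.shape)
    if len(shape) != 1:
        raise ValueError(f"ref_audio_wav must be 1-D, got shape {shape}")
    if shape[0] == 0:
        raise ValueError("ref_audio_wav must be non-empty")
    if max_samples is not None and shape[0] > max_samples:
        raise ValueError(
            f"ref_audio_wav has {shape[0]} samples, more than the {max_samples} allowed"
        )
    return wav


# Usage with lightweight stand-ins (a real call would pass the tensor itself).
validate_ref_audio_wav(SimpleNamespace(shape=(44_100,)))  # passes
try:
    validate_ref_audio_wav(SimpleNamespace(shape=(2, 44_100)))  # stereo / batched
except ValueError as err:
    print(err)  # ref_audio_wav must be 1-D, got shape (2, 44100)
```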
fix DCO pls
Signed-off-by: Sy03 <1370724210@qq.com>
Force-pushed 16105df to a7cbbc4
…llm-project#2606) Signed-off-by: Sy03 <1370724210@qq.com>
Purpose
Fix #2602
Fix Fish Speech S2 Pro voice cloning `FileNotFoundError` when running on multi-GPU with `distributed_executor_backend: "mp"`.

Root cause: The API server writes reference audio to a temporary `/tmp/fish_ref_*.npy` file and passes the file path to workers via `additional_information`. When workers are spawned as separate processes (multiproc multi-GPU), they cannot access the API server's `/tmp` file (different process namespace / node / container).

Fix: Pass reference audio data inline as a `torch.Tensor` through `additional_information`, which uses the serialization layer's efficient binary `tensor_data` path. No filesystem dependency between processes.

Changes across 5 files (+14/-28):
- `serving_speech.py`: Replace the `tempfile.NamedTemporaryFile` `.npy` write with an inline `torch.Tensor` (`ref_audio_path` → `ref_audio_wav`)
- `fish_speech_slow_ar.py`: Read the `ref_audio_wav` tensor from `info_dict` instead of `np.load(ref_audio_path)` + `os.remove()`
- `end2end.py`: Same pattern change for the offline example
- `test_serving_speech.py`: Update assertions from a file-path check to a tensor-type check
- `test_fish_speech_regressions.py`: Replace `ref_audio_path` + `np.load`/`os.remove` mocks with `ref_audio_wav: torch.tensor([0.0])`

Test Plan
- `pytest tests/entrypoints/openai_api/test_serving_speech.py -k fish -x`
- `pytest tests/model_executor/models/test_fish_speech_regressions.py -x`
- `ref_audio` + `ref_text`

Test Result
Unit tests updated to match new data path. Pending CI validation.
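The data-path change described under Purpose can be sketched as follows, assuming PyTorch is installed. `build_additional_information` and `read_ref_audio` are hypothetical stand-ins for the logic in `serving_speech.py` and `fish_speech_slow_ar.py`, and the `pickle` round trip only simulates the process boundary; the real engine serializes tensors through its own `tensor_data` path, which this sketch does not reproduce.

```python
import pickle

import torch


def build_additional_information(samples: list[float]) -> dict:
    """API-server side (hypothetical): ship the reference waveform inline.

    The old path np.save()'d the samples to /tmp/fish_ref_*.npy and sent only
    "ref_audio_path", which a worker in another process/node/container may
    not be able to open.
    """
    return {"ref_audio_wav": torch.tensor(samples, dtype=torch.float32)}


def read_ref_audio(info_dict: dict) -> torch.Tensor:
    """Worker side (hypothetical): use the tensor directly, no np.load/os.remove."""
    wav = info_dict["ref_audio_wav"]
    if not isinstance(wav, torch.Tensor):
        raise TypeError(f"expected torch.Tensor, got {type(wav).__name__}")
    return wav


# Simulate crossing the process boundary (stand-in for the real serializer).
info = build_additional_information([0.0, 0.25, -0.5])
received = pickle.loads(pickle.dumps(info))
wav = read_ref_audio(received)

# Mirrors the updated unit-test expectation: a tensor, not a file path.
assert isinstance(received["ref_audio_wav"], torch.Tensor)
assert "ref_audio_path" not in received
```

A tensor rather than a plain Python list is used because, per the description above, `torch.Tensor` values in `additional_information` travel through the binary `tensor_data` path.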