fix(fish_speech): use from_indices() instead of decode() for DAC decoder #2668
ianliuy wants to merge 3 commits into
|
DCO check failed ( |
The DAC codec's decode() method accepts only a continuous latent
tensor (z), but the decoder was passing discrete codebook indices
along with feature_lengths -- causing:
TypeError: DAC.decode() takes 2 positional arguments but 3 were given
Switch to from_indices() which correctly handles discrete codebook
indices by first dequantizing through the RVQ, then decoding to
waveform. Compute audio_lengths from feature_lengths * hop_length
since from_indices() returns a single tensor (not a tuple).
Update _FakeCodec in tests to match the new calling convention.
Fixes vllm-project#2643
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <yiyangliu@microsoft.com>
|
Amended the commit with |
Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
|
Hi @hsliuustc0106, gentle ping 🙏 This PR and two others are all approved by @lishunyang12 with green CI, waiting on your review as the remaining requested reviewer. All three are small fixes:
Happy to address any feedback; I just wanted to batch these together to keep them on your radar for whenever you have a moment. Thanks! |
lishunyang12
left a comment
LGTM. The fix is correct and well-documented in the PR description.
What was wrong:
- `DAC.decode(z)` expects a continuous latent tensor, but the code passed discrete codebook indices (`torch.long`). The correct entry point for indices is `from_indices()`.
- `decode()` / `from_indices()` both accept only 1 positional arg, so the extra `feature_lengths` caused the `TypeError`.
- The old code unpacked a `(wav, lengths)` tuple, but both methods return a single tensor.

Why the fix is correct:
- `from_indices(codes_bqf)` is the right DAC API for discrete codebook indices -> audio waveform.
- `audio_lengths = clamp(feature_lengths * hop_length, max=wav_batch.shape[-1])` is the standard way to recover sample-domain lengths from frame-domain lengths for a fixed-hop-length codec, with a defensive upper bound.
- The `_FakeCodec` test mock is correctly updated to match the new single-tensor return.
No concerns.
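The arity mismatch described in this review can be reproduced without the real codec. The following is a minimal sketch, assuming a stand-in class named `DAC` (hypothetical, not the actual codec class) whose methods take a single argument, plus a hypothetical helper `call_old_way` that mimics the pre-fix call shape:

```python
class DAC:
    """Hypothetical stand-in; only the method arity mirrors the review above."""

    def decode(self, z):
        # Real codec: continuous latent -> waveform. Here: echo the input.
        return z

    def from_indices(self, indices):
        # Real codec: dequantize indices, then decode. Here: echo the input.
        return indices


def call_old_way(codec, codes, feature_lengths):
    # Pre-fix call shape: one positional argument too many.
    return codec.decode(codes, feature_lengths)


try:
    call_old_way(DAC(), [[1, 2, 3]], [3])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Python counts `self` as the first positional argument, which is why the message reports "2 positional arguments but 3 were given" for a method declared with one parameter.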
Replacing with inline comments
|
Hi @hsliuustc0106 and @lishunyang12, just a polite follow-up on this small DAC decoder fix. The branch is mergeable, DCO/pre-commit/build checks are green, and the prior approval was dismissed after the force-push. Would you be willing to take another look when you have a chance? |
Summary
Fix `TypeError: DAC.decode() takes 2 positional arguments but 3 were given` when running fish_speech TTS online server.
Root Cause
`fish_speech_dac_decoder.py:298` calls `self._codec.decode(codes_bqf, feature_lengths)`, but:
- `DAC.decode(z)` expects a continuous latent tensor. The code passes discrete codebook indices (`torch.long`). The correct method is `from_indices(indices)`.
- `decode()` and `from_indices()` accept only 1 positional argument. The extra `feature_lengths` causes the TypeError.
- `decode()` and `from_indices()` return a single tensor, not a `(wav, lengths)` tuple.
Fix
- `from_indices()` internally does `quantizer.decode(indices)` -> `decoder(z)`, the correct indices -> audio path
- `audio_lengths` computed from `feature_lengths * hop_length` (mathematically exact for this architecture)
- `torch.clamp(max=...)` as defensive bound
Also updated `_FakeCodec` in tests to match the new API.
Fixes #2643
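The new calling convention and the length computation can be sketched as follows. This is an illustration only: it uses plain Python lists in place of tensors and `min()` in place of `torch.clamp(max=...)`, and the names `FakeCodec`, `decode_batch`, and the `HOP_LENGTH` value are assumptions made for the example, not taken from the PR's code.

```python
HOP_LENGTH = 4  # illustrative value only; the real codec defines its own hop length


class FakeCodec:
    """Hypothetical mock: from_indices() returns one padded batch, not a tuple."""

    def from_indices(self, codes_bqf):
        # codes_bqf is [batch][codebook][frame]; emit HOP_LENGTH samples per frame.
        return [[0.0] * (len(item[0]) * HOP_LENGTH) for item in codes_bqf]


def decode_batch(codec, codes_bqf, feature_lengths):
    wav_batch = codec.from_indices(codes_bqf)  # single return value
    max_samples = len(wav_batch[0])
    # Frame-domain lengths -> sample-domain lengths, clamped to the padded width.
    audio_lengths = [min(n * HOP_LENGTH, max_samples) for n in feature_lengths]
    return wav_batch, audio_lengths


# Two items padded to 3 frames; the second has only 2 valid frames.
codes = [[[1, 2, 3]], [[4, 5, 0]]]
wav_batch, audio_lengths = decode_batch(FakeCodec(), codes, [3, 2])

# Trim each padded waveform to its valid sample count.
trimmed = [wav[:n] for wav, n in zip(wav_batch, audio_lengths)]
```

The clamp only matters if a reported frame count exceeds the padded batch width; in the normal case the product of frame count and hop length is already within bounds.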