fix(fish_speech): use from_indices() instead of decode() for DAC decoder#2668

Open
ianliuy wants to merge 3 commits into
vllm-project:mainfrom
ianliuy:fix/issue-2643

Contributor

@ianliuy ianliuy commented Apr 10, 2026

Summary

Fix TypeError: DAC.decode() takes 2 positional arguments but 3 were given, raised when running the fish_speech TTS online server.

Root Cause

fish_speech_dac_decoder.py:298 calls self._codec.decode(codes_bqf, feature_lengths), but:

  1. Wrong method: DAC.decode(z) expects a continuous latent tensor. The code passes discrete codebook indices (torch.long). The correct method is from_indices(indices).
  2. Wrong arg count: Both decode() and from_indices() accept only 1 positional argument. The extra feature_lengths causes the TypeError.
  3. Wrong return unpacking: decode() and from_indices() return a single tensor, not a (wav, lengths) tuple.
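The arity error can be reproduced without the real codec. The sketch below uses a hypothetical stand-in class, StandInDAC, whose decode() and from_indices() signatures mirror only what the list above states (one positional argument, single return value). It is not the actual DAC implementation, and the input lists are made-up placeholders.

```python
class StandInDAC:
    """Hypothetical stand-in; mirrors only the call signatures described above."""

    def decode(self, z):
        # One positional argument besides self, single return value.
        return z

    def from_indices(self, indices):
        # Same shape of API: one argument in, one value out.
        return indices


def call_with_extra_arg(codec, codes, feature_lengths):
    """Repeat the failing call pattern and return the error message, if any."""
    try:
        codec.decode(codes, feature_lengths)  # two arguments passed, one accepted
    except TypeError as exc:
        return str(exc)
    return None


message = call_with_extra_arg(StandInDAC(), [[1, 2, 3]], [3])
print(message)
```

Python counts self as the first positional argument, which is why a one-parameter method reports "takes 2 positional arguments but 3 were given".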

Fix

# Before
wav_batch, audio_lengths = self._codec.decode(codes_bqf, feature_lengths)

# After
wav_batch = self._codec.from_indices(codes_bqf)
audio_lengths = torch.clamp(
    feature_lengths * self._hop_length,
    max=wav_batch.shape[-1],
)
  • from_indices() internally does quantizer.decode(indices) -> decoder(z), the correct indices -> audio path
  • audio_lengths computed from feature_lengths * hop_length (mathematically exact for this architecture)
  • torch.clamp(max=...) as defensive bound
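The length arithmetic in the bullets above can be checked in isolation. This is a dependency-free sketch: plain Python lists stand in for tensors, min() plays the role of torch.clamp(max=...), and the helper name and the hop length of 512 are assumptions chosen for illustration, not values taken from the PR.

```python
def audio_lengths_from_frames(feature_lengths, hop_length, max_samples):
    """Convert per-item frame counts to sample counts, capped at max_samples."""
    # Each frame expands to hop_length samples; the cap guards against a
    # frame count that would exceed the decoded waveform's last dimension.
    return [min(n * hop_length, max_samples) for n in feature_lengths]


# Assumed values: three items, hop length 512, decoded waveform of 5120 samples.
lengths = audio_lengths_from_frames([10, 7, 12], hop_length=512, max_samples=5120)
print(lengths)  # [5120, 3584, 5120]
```

The third item shows the defensive bound at work: 12 frames would imply 6144 samples, which is capped to the 5120 samples actually present.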

Also updated _FakeCodec in tests to match the new API.
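For readers without the test file at hand, a test double that follows the new calling convention might look roughly like this. FakeCodecSketch is a hypothetical illustration, not the repository's _FakeCodec; it uses nested lists instead of tensors and assumes codes_bqf is laid out as [batch][codebook][frame].

```python
class FakeCodecSketch:
    """Hypothetical test double; not the repository's _FakeCodec."""

    def __init__(self, hop_length):
        self.hop_length = hop_length
        self.calls = []

    def from_indices(self, indices):
        # Record the call so a test can assert on how the decoder used the codec.
        self.calls.append(indices)
        frames = len(indices[0][0])
        # Return one waveform batch shaped [batch][1][samples], never a tuple.
        return [[[0.0] * (frames * self.hop_length)] for _ in indices]


codec = FakeCodecSketch(hop_length=4)
codes_bqf = [[[1, 2, 3], [4, 5, 6]]]  # 1 item, 2 codebooks, 3 frames
wav_batch = codec.from_indices(codes_bqf)
print(len(wav_batch), len(wav_batch[0][0]))
```

A double like this fails loudly if the decoder regresses to passing a second argument or unpacking a tuple, which is the behaviour the fix removes.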

Fixes #2643

@ianliuy ianliuy requested a review from hsliuustc0106 as a code owner April 10, 2026 04:19
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@hsliuustc0106
Collaborator

DCO check failed (ACTION_REQUIRED). Please amend your commit with git commit -s --amend to add the Signed-off-by line.

The DAC codec's decode() method accepts only a continuous latent
tensor (z), but the decoder was passing discrete codebook indices
along with feature_lengths -- causing:
    TypeError: DAC.decode() takes 2 positional arguments but 3 were given

Switch to from_indices() which correctly handles discrete codebook
indices by first dequantizing through the RVQ, then decoding to
waveform. Compute audio_lengths from feature_lengths * hop_length
since from_indices() returns a single tensor (not a tuple).

Update _FakeCodec in tests to match the new calling convention.

Fixes vllm-project#2643

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <yiyangliu@microsoft.com>
Contributor Author

ianliuy commented Apr 10, 2026

Amended the commit with git commit -s --amend to add the Signed-off-by line. DCO check should pass now. cc - @hsliuustc0106

lishunyang12
lishunyang12 previously approved these changes Apr 11, 2026
Contributor Author

ianliuy commented Apr 12, 2026

The ReadTheDocs failure is unrelated to this PR: it timed out during pip install .[docs] (15min limit). Other PRs (#2377, #2517) are hitting the same RTD timeout. This PR only touches fish_speech_dac_decoder.py and its test file, with no docs changes.

Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
Contributor Author

ianliuy commented Apr 14, 2026

Hi @hsliuustc0106, gentle ping 🙏 This PR and two others are all approved by @lishunyang12 with green CI, waiting on your review as the remaining requested reviewer.

All three are small fixes:

Happy to address any feedback; I just wanted to batch these together to keep them on your radar whenever you have a moment. Thanks!

lishunyang12
lishunyang12 previously approved these changes Apr 16, 2026
Collaborator

@lishunyang12 lishunyang12 left a comment

LGTM. The fix is correct and well-documented in the PR description.

What was wrong:

  • DAC.decode(z) expects a continuous latent tensor, but the code passed discrete codebook indices (torch.long). The correct entry point for indices is from_indices().
  • decode() / from_indices() both accept only 1 positional arg, so the extra feature_lengths caused the TypeError.
  • The old code unpacked a (wav, lengths) tuple, but both methods return a single tensor.

Why the fix is correct:

  • from_indices(codes_bqf) is the right DAC API for discrete codebook indices -> audio waveform.
  • audio_lengths = clamp(feature_lengths * hop_length, max=wav_batch.shape[-1]) is the standard way to recover sample-domain lengths from frame-domain lengths for a fixed-hop-length codec, with a defensive upper bound.
  • The _FakeCodec test mock is correctly updated to match the new single-tensor return.

No concerns.

@lishunyang12 lishunyang12 dismissed stale reviews from themself April 16, 2026 14:56

Replacing with inline comments

Contributor Author

ianliuy commented Apr 30, 2026

Hi @hsliuustc0106 and @lishunyang12, just a polite follow-up on this small DAC decoder fix.

The branch is mergeable, DCO/pre-commit/build checks are green, and the prior approval was dismissed after the force-push. Would you be willing to take another look when you have a chance?

Development

Successfully merging this pull request may close these issues.

[Bug]: fish_speech online server bug

3 participants