[Bugfix] Fix left_context_size type mismatch in non-async Base Code2Wav path#2052
[Bugfix] Fix left_context_size type mismatch in non-async Base Code2Wav path#2052NickCao wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c8add1d79d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…av path Qwen3TTSCode2Wav.forward() compares ctx_frames against 0 (line 287: "if ctx_frames > 0"), but the non-async Base path passes left_context_size as a single-element list [ref_code_len] to survive serialize_additional_information(), which only supports tensor and list values (plain ints are dropped). The async chunk path bypasses serialization and passes a plain int directly. The list wrapper in talker2code2wav() is intentional — without it the serializer drops the key and ctx_frames silently falls back to 0, causing ref_code context to never be trimmed from the output audio. Fix the consumer (Qwen3TTSCode2Wav.forward) to unwrap the list when present, handling both the serialized list form (non-async) and the plain int form (async chunk path). The bug was introduced in PR vllm-project#1731 (761eff9) which added ref_code support to the non-async path but did not account for the type mismatch between serialized list and the int comparison downstream. It was masked by the token overflow crash (max_model_len=32768 < prompt tokens) which prevented the code from reaching the comparison. Fixes: vllm-project#2030 Signed-off-by: Nick Cao <ncao@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>
c8add1d to
f70ed06
Compare
|
LGTM. Confirm there is type mismatch indeed introduced by #1731 . Thanks for fix :) |
lishunyang12
left a comment
There was a problem hiding this comment.
Fix looks correct. A couple of suggestions.
| # plain ints); async chunk path sends a plain int. | ||
| # Handle both. | ||
| if isinstance(val, list): | ||
| val = val[0] if val else 0 |
There was a problem hiding this comment.
This silently discards extra elements if the list has len > 1. Not currently possible, but a defensive assert or warning would save debugging time if the producer changes.
| val = val[0] if val else 0 | |
| val = val[0] if len(val) == 1 else val[0] |
Actually, simpler — just index [0] and let it raise IndexError on empty rather than silently returning 0:
| val = val[0] if val else 0 | |
| val = val[0] |
| # Handle both. | ||
| if isinstance(val, list): | ||
| val = val[0] if val else 0 | ||
| left_context_size[i] = int(val) |
There was a problem hiding this comment.
Nit: the int() cast is good but consider doing the normalization in a small helper — fish_speech and any future consumer that passes scalars through additional_information will hit the same list-vs-scalar issue. A shared unwrap_scalar(val) in e.g. serialization.py would avoid duplicating this pattern.
| codec_codes = audio_codes.transpose(0, 1).cpu().reshape(-1).tolist() | ||
| # Wrap ref_code_len in a list: serialize_additional_information() | ||
| # only preserves tensor and list values; plain ints are dropped. | ||
| # The consumer (Qwen3TTSCode2Wav.forward) unwraps the list. |
There was a problem hiding this comment.
The real fix should be in serialize_additional_information — it should support scalar int/float values instead of silently dropping them. Wrapping in a list at the producer and unwrapping at the consumer is a workaround that every caller has to know about. Would you consider adding scalar support to AdditionalInformationEntry (e.g. a scalar_data field) in a follow-up?
|
Closing in favor of @Sy0307's pending fix of the root cause. |
Purpose
Disclaimer: this PR contains AI-generated code
Qwen3TTSCode2Wav.forward() compares ctx_frames against 0 (line 287: "if ctx_frames > 0"), but the non-async Base path passes left_context_size as a single-element list [ref_code_len] to survive serialize_additional_information(), which only supports tensor and list values (plain ints are dropped). The async chunk path bypasses serialization and passes a plain int directly.
The list wrapper in talker2code2wav() is intentional — without it the serializer drops the key and ctx_frames silently falls back to 0, causing ref_code context to never be trimmed from the output audio.
Fix the consumer (Qwen3TTSCode2Wav.forward) to unwrap the list when present, handling both the serialized list form (non-async) and the plain int form (async chunk path).
The bug was introduced in PR #1731 (761eff9) which added ref_code support to the non-async path but did not account for the type mismatch between serialized list and the int comparison downstream. It was masked by the token overflow crash (max_model_len=32768 < prompt tokens) which prevented the code from reaching the comparison.
Fixes: #203
Test Plan
After modifying
qwen3_tts_no_async_chunk.yamlto increase themax_model_len, run/workspace/.venv/bin/python -m pytest -sv tests/e2e/online_serving/test_qwen3_tts_base_expansion.py --run-level "advanced_model"Test Result
TBD
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)