Skip to content

[Bugfix] Fix left_context_size type mismatch in non-async Base Code2Wav path#2052

Closed
NickCao wants to merge 1 commit into
vllm-project:mainfrom
NickCao:fix/left-context-size-type
Closed

[Bugfix] Fix left_context_size type mismatch in non-async Base Code2Wav path#2052
NickCao wants to merge 1 commit into
vllm-project:mainfrom
NickCao:fix/left-context-size-type

Conversation

@NickCao
Copy link
Copy Markdown
Contributor

@NickCao NickCao commented Mar 20, 2026

Purpose

Disclaimer: this PR contains AI-generated code

Qwen3TTSCode2Wav.forward() compares ctx_frames against 0 (line 287: "if ctx_frames > 0"), but the non-async Base path passes left_context_size as a single-element list [ref_code_len] to survive serialize_additional_information(), which only supports tensor and list values (plain ints are dropped). The async chunk path bypasses serialization and passes a plain int directly.

The list wrapper in talker2code2wav() is intentional — without it the serializer drops the key and ctx_frames silently falls back to 0, causing ref_code context to never be trimmed from the output audio.

Fix the consumer (Qwen3TTSCode2Wav.forward) to unwrap the list when present, handling both the serialized list form (non-async) and the plain int form (async chunk path).

The bug was introduced in PR #1731 (761eff9) which added ref_code support to the non-async path but did not account for the type mismatch between serialized list and the int comparison downstream. It was masked by the token overflow crash (max_model_len=32768 < prompt tokens) which prevented the code from reaching the comparison.

Fixes: #203

Test Plan

After modifying qwen3_tts_no_async_chunk.yaml to increase the max_model_len, run

/workspace/.venv/bin/python -m pytest -sv tests/e2e/online_serving/test_qwen3_tts_base_expansion.py --run-level "advanced_model"

Test Result

TBD


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@NickCao NickCao requested a review from hsliuustc0106 as a code owner March 20, 2026 16:36
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8add1d79d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread vllm_omni/model_executor/stage_input_processors/qwen3_tts.py Outdated
…av path

Qwen3TTSCode2Wav.forward() compares ctx_frames against 0 (line 287:
"if ctx_frames > 0"), but the non-async Base path passes
left_context_size as a single-element list [ref_code_len] to survive
serialize_additional_information(), which only supports tensor and
list values (plain ints are dropped). The async chunk path bypasses
serialization and passes a plain int directly.

The list wrapper in talker2code2wav() is intentional — without it
the serializer drops the key and ctx_frames silently falls back to
0, causing ref_code context to never be trimmed from the output
audio.

Fix the consumer (Qwen3TTSCode2Wav.forward) to unwrap the list when
present, handling both the serialized list form (non-async) and the
plain int form (async chunk path).

The bug was introduced in PR vllm-project#1731 (761eff9) which added ref_code
support to the non-async path but did not account for the type
mismatch between serialized list and the int comparison downstream.
It was masked by the token overflow crash (max_model_len=32768 <
prompt tokens) which prevented the code from reaching the
comparison.

Fixes: vllm-project#2030

Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
@NickCao NickCao force-pushed the fix/left-context-size-type branch from c8add1d to f70ed06 Compare March 20, 2026 16:43
@NickCao NickCao changed the title [Bugfix] Fix left_context_size passed as list instead of int in non-async Base path [Bugfix] Fix left_context_size type mismatch in non-async Base Code2Wav path Mar 20, 2026
@linyueqian
Copy link
Copy Markdown
Collaborator

@Sy0307 @yenuo26 ptal.

@Sy0307
Copy link
Copy Markdown
Contributor

Sy0307 commented Mar 22, 2026

LGTM. Confirm there is type mismatch indeed introduced by #1731 . Thanks for fix :)

@linyueqian linyueqian added the ready label to trigger buildkite CI label Mar 22, 2026
Copy link
Copy Markdown
Collaborator

@lishunyang12 lishunyang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks correct. A couple of suggestions.

# plain ints); async chunk path sends a plain int.
# Handle both.
if isinstance(val, list):
val = val[0] if val else 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This silently discards extra elements if the list has len > 1. Not currently possible, but a defensive assert or warning would save debugging time if the producer changes.

Suggested change
val = val[0] if val else 0
val = val[0] if len(val) == 1 else val[0]

Actually, simpler — just index [0] and let it raise IndexError on empty rather than silently returning 0:

Suggested change
val = val[0] if val else 0
val = val[0]

# Handle both.
if isinstance(val, list):
val = val[0] if val else 0
left_context_size[i] = int(val)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the int() cast is good but consider doing the normalization in a small helper — fish_speech and any future consumer that passes scalars through additional_information will hit the same list-vs-scalar issue. A shared unwrap_scalar(val) in e.g. serialization.py would avoid duplicating this pattern.

codec_codes = audio_codes.transpose(0, 1).cpu().reshape(-1).tolist()
# Wrap ref_code_len in a list: serialize_additional_information()
# only preserves tensor and list values; plain ints are dropped.
# The consumer (Qwen3TTSCode2Wav.forward) unwraps the list.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real fix should be in serialize_additional_information — it should support scalar int/float values instead of silently dropping them. Wrapping in a list at the producer and unwrapping at the consumer is a workaround that every caller has to know about. Would you consider adding scalar support to AdditionalInformationEntry (e.g. a scalar_data field) in a follow-up?

@NickCao
Copy link
Copy Markdown
Contributor Author

NickCao commented Mar 23, 2026

Closing in favor of @Sy0307's pending fix of the root cause.

@NickCao NickCao closed this Mar 23, 2026
@Sy0307
Copy link
Copy Markdown
Contributor

Sy0307 commented Mar 25, 2026

Plz continue as #2104 has been merged. @NickCao

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[New Model]: ByteDance-Seed/BAGEL-7B-MoT

4 participants