[BugFix] Fix Qwen3-TTS Code2Wav compatibility with transformers >= 5.9.0 #3880

Merged
linyueqian merged 4 commits into vllm-project:main from Dan250124:hotfix/embedding_compatibility
May 26, 2026

Conversation

@Dan250124
Contributor

Summary

  • Fix compatibility between Qwen3-TTS Code2Wav decoder and transformers >= 5.9.0 by adapting to breaking API changes in create_causal_mask.

Problem

The transformers library introduced three breaking changes to masking_utils.create_causal_mask that affect vLLM-Omni's Qwen3-TTS tokenizer/decoder model:

| Date | Commit | Change |
| --- | --- | --- |
| 2026-02-11 | 6c6f70c (#43916) | Renamed input_embeds to inputs_embeds |
| 2026-03-04 | 421c7f6 (#44181) | Removed cache_position parameter |
| 2026-05-11 | 2d6815e (#45885) | Removed backward-compatible input_embeds alias |

Since vLLM's dependency constraints allow installing transformers 5.9.0, users who build vLLM-Omni from source (e.g., on CUDA 12.x where pre-built wheels require CUDA 13.0+) can hit this incompatibility. The error manifests during CUDA Graph warmup for the Code2Wav decoder:

Full error log
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722] Failed to enable CUDA Graph for Code2Wav decoder
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722] Traceback (most recent call last):
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/qwen3_tts_code2wav.py", line 709, in load_weights
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     self.decoder.enable_cudagraph(
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 881, in enable_cudagraph
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     self._cudagraph_wrapper.warmup(
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/cuda_graph_decoder_wrapper.py", line 334, in warmup
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     _ = self.decoder(dummy)
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]         ^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     return self._call_impl(*args, **kwargs)
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../torch/nn/modules/module.py", line 1790, in _call_impl
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     return forward_call(*args, **kwargs)
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 910, in forward
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     hidden = self.pre_transformer(inputs_embeds=hidden).last_hidden_state
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 576, in forward
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     "full_attention": create_causal_mask(**mask_kwargs),
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722] TypeError: create_causal_mask() got an unexpected keyword argument 'input_embeds'

The TypeError is caught and only logged as a warning, so startup continues: CUDA Graph capture is skipped and the Code2Wav decoder falls back to eager mode with a performance penalty.
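The warn-and-fall-back behaviour visible in the log can be pictured with a minimal sketch. This is not the vLLM-Omni code: try_enable_fast_path and broken_warmup are hypothetical names, and the sketch only assumes that the optional setup step is wrapped in a try/except that logs the traceback and carries on.

```python
import logging

logger = logging.getLogger("code2wav_sketch")  # hypothetical logger name


def try_enable_fast_path(enable_fn) -> bool:
    """Run an optional setup step; on failure, log a warning and keep going.

    Returns True if the fast path was enabled, False if the caller should
    stay on the slower default path (eager mode in the PR's case).
    """
    try:
        enable_fn()
    except Exception:
        # The traceback goes to the log, but nothing is re-raised, so the
        # only visible symptom is a warning and lower performance.
        logger.warning("Failed to enable fast path; using eager mode", exc_info=True)
        return False
    return True


def broken_warmup():
    # Hypothetical stand-in for a warmup call that hits the renamed keyword.
    raise TypeError("create_causal_mask() got an unexpected keyword argument 'input_embeds'")


fast_path_enabled = try_enable_fast_path(broken_warmup)
```

Because the exception is swallowed, a deployment in this state still serves requests, which is why the regression is easy to miss unless the startup warnings are checked.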

Fix

Use inspect.signature() to detect the parameter names accepted by create_causal_mask at runtime and pass the correct keyword arguments:

  • Old transformers (< 5.9.0): passes input_embeds and cache_position
  • New transformers (>= 5.9.0): passes inputs_embeds only (no cache_position)

This approach is fully backward-compatible, so no minimum transformers version bump is required.
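The runtime detection described above could look roughly like the following sketch. It is not the code merged in this PR: build_mask_kwargs, old_style_mask and new_style_mask are hypothetical names, and the two stand-in functions only mimic the parameter names listed in the table, with simplified signatures, so the example runs without transformers installed.

```python
import inspect


def build_mask_kwargs(mask_fn, config, embeds, attention_mask,
                      cache_position=None, past_key_values=None,
                      position_ids=None):
    """Build keyword arguments that match the parameters mask_fn accepts."""
    params = inspect.signature(mask_fn).parameters
    kwargs = {
        "config": config,
        "attention_mask": attention_mask,
        "past_key_values": past_key_values,
        "position_ids": position_ids,
    }
    # Prefer the new name; fall back to the old one when it is absent.
    embeds_key = "inputs_embeds" if "inputs_embeds" in params else "input_embeds"
    kwargs[embeds_key] = embeds
    # Only pass cache_position to signatures that still declare it.
    if "cache_position" in params:
        kwargs["cache_position"] = cache_position
    return kwargs


# Simplified stand-ins (assumptions, not the real transformers functions).
def old_style_mask(config, input_embeds, attention_mask, cache_position,
                   past_key_values=None, position_ids=None):
    return "old"


def new_style_mask(config, inputs_embeds, attention_mask,
                   past_key_values=None, position_ids=None):
    return "new"


old_kwargs = build_mask_kwargs(old_style_mask, config={}, embeds=[0.0],
                               attention_mask=None, cache_position=[0])
new_kwargs = build_mask_kwargs(new_style_mask, config={}, embeds=[0.0],
                               attention_mask=None, cache_position=[0])

# Both calls succeed because each receives only the keywords it declares.
old_result = old_style_mask(**old_kwargs)
new_result = new_style_mask(**new_kwargs)
```

Checking parameter names instead of comparing version strings also covers intermediate releases in which only one of the two changes has landed.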

Changes

  • vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py: Use runtime introspection of create_causal_mask signature to adapt keyword arguments across transformers versions.

Test Plan

  • Verify Code2Wav decoder loads and CUDA Graph captures successfully with transformers < 5.9.0
  • Verify Code2Wav decoder loads and CUDA Graph captures successfully with transformers >= 5.9.0
  • Run Qwen3-TTS end-to-end TTS inference on both transformers versions

@Dan250124 Dan250124 force-pushed the hotfix/embedding_compatibility branch from 55b9d36 to cfbe8ee Compare May 26, 2026 14:32
Adapt to breaking API changes in transformers masking_utils.create_causal_mask:
- input_embeds renamed to inputs_embeds
- cache_position parameter removed
Use runtime signature introspection to pass correct kwargs across versions.

Signed-off-by: Dan250124 <416947747@qq.com>
@Dan250124 Dan250124 force-pushed the hotfix/embedding_compatibility branch from d5bf426 to 1e49f74 Compare May 26, 2026 14:33
Move 'import inspect' to correct alphabetical position.

Signed-off-by: Dan250124 <416947747@qq.com>
@Dan250124 Dan250124 force-pushed the hotfix/embedding_compatibility branch from 38f7a40 to 5c43fb4 Compare May 26, 2026 14:38
@hsliuustc0106
Collaborator

lgtm, @linyueqian @Sy0307

@Sy0307
Collaborator

Sy0307 commented May 26, 2026

lgtm. Nice fix.

@linyueqian linyueqian added the "ready" label (label to trigger buildkite CI) May 26, 2026
Collaborator

@linyueqian linyueqian left a comment

Looks good to me.

@linyueqian linyueqian enabled auto-merge (squash) May 26, 2026 15:33
@linyueqian linyueqian merged commit 1f1c82e into vllm-project:main May 26, 2026
8 checks passed
zengchuang-hw pushed a commit to zengchuang-hw/vllm-omni that referenced this pull request Jun 1, 2026
86MaxCao pushed a commit to 86MaxCao/vllm-omni that referenced this pull request Jun 4, 2026

Labels

ready (label to trigger buildkite CI)

Development

Successfully merging this pull request may close these issues.

[Bug]: vllm-omni run 0.21.0rc1. Compatibility issue when running Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice with transformers 5.9.0.

4 participants