[BugFix] Fix Qwen3-TTS Code2Wav compatibility with transformers >= 5.9.0 by Dan250124 · Pull Request #3880 · vllm-project/vllm-omni

Dan250124 · 2026-05-26T14:28:52Z

Summary

Fix compatibility between Qwen3-TTS Code2Wav decoder and transformers >= 5.9.0 by adapting to breaking API changes in create_causal_mask.

Problem

The transformers library introduced three breaking changes to masking_utils.create_causal_mask that affect vLLM-Omni's Qwen3-TTS tokenizer/decoder model:

| Date | Commit | Change |
|------|--------|--------|
| 2026-02-11 | `6c6f70c` (#43916) | Renamed `input_embeds` to `inputs_embeds` |
| 2026-03-04 | `421c7f6` (#44181) | Removed `cache_position` parameter |
| 2026-05-11 | `2d6815e` (#45885) | Removed backward-compatible `input_embeds` alias |

Since vLLM's dependency constraints allow installing transformers 5.9.0, users who build vLLM-Omni from source (e.g., on CUDA 12.x where pre-built wheels require CUDA 13.0+) can hit this incompatibility. The error manifests during CUDA Graph warmup for the Code2Wav decoder:

Full error log

(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722] Failed to enable CUDA Graph for Code2Wav decoder
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722] Traceback (most recent call last):
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/qwen3_tts_code2wav.py", line 709, in load_weights
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     self.decoder.enable_cudagraph(
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 881, in enable_cudagraph
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     self._cudagraph_wrapper.warmup(
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/cuda_graph_decoder_wrapper.py", line 334, in warmup
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     _ = self.decoder(dummy)
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]         ^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     return self._call_impl(*args, **kwargs)
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../torch/nn/modules/module.py", line 1790, in _call_impl
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     return forward_call(*args, **kwargs)
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 910, in forward
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     hidden = self.pre_transformer(inputs_embeds=hidden).last_hidden_state
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]   File ".../vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 576, in forward
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]     "full_attention": create_causal_mask(**mask_kwargs),
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=9773) WARNING 05-26 08:42:50 [qwen3_tts_code2wav.py:722] TypeError: create_causal_mask() got an unexpected keyword argument 'input_embeds'

The failure is non-fatal and easy to miss: it is logged only as a warning, CUDA Graph capture is skipped, and the decoder falls back to eager mode with a performance penalty.
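The mechanism can be sketched without transformers or CUDA. Every name below (`new_style_mask`, `warmup`, `enable_graph_or_fall_back`) is a hypothetical stand-in, not a vLLM-Omni or transformers API. The sketch assumes, based on the warning in the log above, that the caller wraps warmup in a try/except and logs instead of re-raising.

```python
import logging

logger = logging.getLogger("code2wav_sketch")


def new_style_mask(inputs_embeds):
    """Hypothetical stand-in for a callee that only knows the new keyword."""
    return inputs_embeds


def warmup():
    """Hypothetical warmup step that still passes the old keyword name."""
    # Raises TypeError: unexpected keyword argument 'input_embeds'.
    return new_style_mask(input_embeds="dummy")


def enable_graph_or_fall_back():
    """Try the fast path; on any failure, log a warning and stay in eager mode."""
    try:
        warmup()
        return "cudagraph"
    except Exception:
        logger.warning("Failed to enable CUDA Graph for Code2Wav decoder", exc_info=True)
        return "eager"


mode = enable_graph_or_fall_back()
print(mode)  # eager
```

Because the exception is caught and only logged, the process keeps running, which is why the regression may show up as slower inference rather than a crash.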

Fix

Use inspect.signature() to detect the parameter names accepted by create_causal_mask at runtime and pass the correct keyword arguments:

- Old transformers (< 5.9.0): passes input_embeds and cache_position
- New transformers (>= 5.9.0): passes inputs_embeds only (no cache_position)

This approach is fully backward-compatible - no minimum transformers version bump required.
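A minimal sketch of the introspection idea, not the exact code in this PR. `old_style_mask` and `new_style_mask` are simplified, hypothetical stand-ins for the two create_causal_mask signatures (the real function takes more parameters), and `build_mask_kwargs` is an illustrative helper name.

```python
import inspect


def old_style_mask(input_embeds, attention_mask, cache_position):
    """Simplified stand-in for the older signature (hypothetical)."""
    return ("old", input_embeds, cache_position)


def new_style_mask(inputs_embeds, attention_mask):
    """Simplified stand-in for the newer signature (hypothetical)."""
    return ("new", inputs_embeds)


def build_mask_kwargs(mask_fn, embeds, attention_mask, cache_position):
    """Pick keyword names that `mask_fn` actually declares."""
    params = inspect.signature(mask_fn).parameters
    kwargs = {"attention_mask": attention_mask}
    # Prefer the new spelling; fall back to the old one.
    embeds_name = "inputs_embeds" if "inputs_embeds" in params else "input_embeds"
    kwargs[embeds_name] = embeds
    # Forward cache_position only when the callee still declares it.
    if "cache_position" in params:
        kwargs["cache_position"] = cache_position
    return kwargs


for fn in (old_style_mask, new_style_mask):
    mask_kwargs = build_mask_kwargs(fn, embeds="E", attention_mask=None, cache_position="P")
    print(fn.__name__, sorted(mask_kwargs), fn(**mask_kwargs))
```

Checking parameter names rather than comparing version strings keeps the call correct for intermediate releases too, where the rename and the removal landed at different times. One caveat of this sketch: a callee that accepts `**kwargs` would not list either name explicitly, so the helper would default to the old spelling.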

Changes

vllm_omni/model_executor/models/qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py: Use runtime introspection of create_causal_mask signature to adapt keyword arguments across transformers versions.

Test Plan

Verify Code2Wav decoder loads and CUDA Graph captures successfully with transformers < 5.9.0
Verify Code2Wav decoder loads and CUDA Graph captures successfully with transformers >= 5.9.0
Run Qwen3-TTS end-to-end TTS inference on both transformers versions


Adapt to breaking API changes in transformers masking_utils.create_causal_mask:
- input_embeds renamed to inputs_embeds
- cache_position parameter removed

Use runtime signature introspection to pass correct kwargs across versions.

Signed-off-by: Dan250124 <416947747@qq.com>


hsliuustc0106 · 2026-05-26T14:40:57Z

lgtm, @linyueqian @Sy0307

Sy0307 · 2026-05-26T15:10:43Z

lgtm. Nice fix.

linyueqian

Looks good to me.


Dan250124 requested review from Gaohan123, ZeldaHuang, hsliuustc0106, linyueqian, princepride, yuanheng-zhao and ywang96 as code owners May 26, 2026 14:28

Dan250124 force-pushed the hotfix/embedding_compatibility branch from 55b9d36 to cfbe8ee Compare May 26, 2026 14:32

Dan250124 force-pushed the hotfix/embedding_compatibility branch from d5bf426 to 1e49f74 Compare May 26, 2026 14:33

[Lint] Fix import ordering for ruff compliance

5c43fb4

Move 'import inspect' to correct alphabetical position. Signed-off-by: Dan250124 <416947747@qq.com>

Dan250124 force-pushed the hotfix/embedding_compatibility branch from 38f7a40 to 5c43fb4 Compare May 26, 2026 14:38

Merge branch 'main' into hotfix/embedding_compatibility

b501922

Merge branch 'main' into hotfix/embedding_compatibility

bd3e45e

linyueqian added the ready label (label to trigger buildkite CI) May 26, 2026

linyueqian approved these changes May 26, 2026


linyueqian enabled auto-merge (squash) May 26, 2026 15:33

linyueqian merged commit 1f1c82e into vllm-project:main May 26, 2026
8 checks passed

Yadan-Wei mentioned this pull request May 26, 2026

[Bugfix] Fix qwen3-tts create_causal_mask kwarg for transformers >=5.9.0 #3786

Closed

2 tasks

NickCao mentioned this pull request May 27, 2026

[Bug]: vllm-omni run 0.21.0rc1. Compatibility issue when running Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice with transformers 5.9.0. #3903

Closed

1 task

hsliuustc0106 linked an issue May 28, 2026 that may be closed by this pull request

[Bug]: vllm-omni run 0.21.0rc1. Compatibility issue when running Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice with transformers 5.9.0. #3903

Closed

1 task

zengchuang-hw pushed a commit to zengchuang-hw/vllm-omni that referenced this pull request Jun 1, 2026

[BugFix] Fix Qwen3-TTS Code2Wav compatibility with transformers >= 5.…

b64f707

…9.0 (vllm-project#3880) Signed-off-by: Dan250124 <416947747@qq.com>

86MaxCao pushed a commit to 86MaxCao/vllm-omni that referenced this pull request Jun 4, 2026

[BugFix] Fix Qwen3-TTS Code2Wav compatibility with transformers >= 5.…

4f92395

…9.0 (vllm-project#3880) Signed-off-by: Dan250124 <416947747@qq.com>

linyueqian merged 4 commits into vllm-project:main from Dan250124:hotfix/embedding_compatibility