[BugFix]: Fix Qwen3-TTS code2wav fails when enforce_eager: false by ChefWu551 · Pull Request #2868 · vllm-project/vllm-omni

ChefWu551 · 2026-04-17T03:33:57Z

Purpose

This PR mainly fixes the issue described in PR #2866.
This is also a review for PR #2328.

Test Plan

python /workspace/vllm-omni/benchmarks/qwen3-tts/vllm_omni/bench_tts_serve.py \
--host 127.0.0.1 --port 8899 \
--task-type Base \
--ref-audio /workspace/resource/clone_2.wav \
--ref-text "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you." \
--num-prompts 10 \
--config-name base_baseline \
--result-dir benchmarks/qwen3-tts/results/

Test Result

Warming up with 3 requests...
  Warmup done.
  Running 10 requests with concurrency=4...
  concurrency=4: 100%|██████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.56s/it]

==================================================
             Serving Benchmark Result             
==================================================
Successful requests:                    10        
Failed requests:                        0         
Maximum request concurrency:            4         
Benchmark duration (s):                 15.60     
Request throughput (req/s):             0.64      
--------------------------------------------------
                End-to-end Latency                
--------------------------------------------------
Mean E2EL (ms):                         5546.95   
Median E2EL (ms):                       5522.70   
P99 E2EL (ms):                          6739.25   
==================================================
                   Audio Result                   
==================================================
Total audio duration generated (s):     42.24     
Audio throughput (audio duration/s):    2.71      
--------------------------------------------------
               Time to First Packet               
--------------------------------------------------
Mean AUDIO_TTFP (ms):                   768.17    
Median AUDIO_TTFP (ms):                 727.03    
P99 AUDIO_TTFP (ms):                    1049.49   
--------------------------------------------------
                 Real Time Factor                 
--------------------------------------------------
Mean AUDIO_RTF:                         1.330     
Median AUDIO_RTF:                       1.436     
P99 AUDIO_RTF:                          1.457     
==================================================
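As a sanity check on the figures above, the aggregate audio throughput can be recomputed from the reported totals. The sketch below assumes RTF means per-request processing time divided by generated audio duration; the benchmark script's exact definition is not shown in this thread, and the helper names are illustrative only.

```python
def audio_throughput(total_audio_s: float, wall_clock_s: float) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    return total_audio_s / wall_clock_s


def real_time_factor(latency_s: float, audio_s: float) -> float:
    """Assumed definition: processing time divided by audio duration."""
    return latency_s / audio_s


# Figures copied from the benchmark output above.
total_audio_s = 42.24
benchmark_s = 15.60
num_requests = 10
mean_e2el_s = 5546.95 / 1000.0

throughput = audio_throughput(total_audio_s, benchmark_s)

# Ratio of means: this only approximates the reported mean of
# per-request ratios (Mean AUDIO_RTF), it does not reproduce it exactly.
mean_audio_s = total_audio_s / num_requests
approx_rtf = real_time_factor(mean_e2el_s, mean_audio_s)

print(f"audio throughput: {throughput:.2f}")
print(f"approximate per-request RTF: {approx_rtf:.2f}")
```

Under that assumption, a per-request RTF above 1 at concurrency 4 can coexist with an aggregate throughput above real time, because the concurrent requests share the same GPU.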

The accuracy

Test Conclusion: Accuracy is satisfactory, and the content of the two audio segments is consistent.

When enforce_eager is false
output_force_eager_false.wav
When enforce_eager is true
output_force_eagler_true.wav

Performance (not as good as shown in the chart)

Concurrency	Metric	`enforce_eager: false`	`enforce_eager: true`
1	Total duration (s)	19.36	19.07
1	Request throughput (req/s)	0.52	0.52
1	Audio throughput (audio duration/s)	2.18	2.21
1	Mean E2E latency (ms)	1935.29	1906.71
4	Total duration (s)	12.19	11.86
4	Request throughput (req/s)	0.82	0.84
4	Audio throughput (audio duration/s)	3.41	3.44
4	Mean E2E latency (ms)	4321.41	4251.28
10	Total duration (s)	10.34	10.19
10	Request throughput (req/s)	0.97	0.98
10	Audio throughput (audio duration/s)	4.01	4.24
10	Mean E2E latency (ms)	9225.24	9198.95

hsliuustc0106 · 2026-04-17T09:38:09Z

Fix looks correct. The tuple length check is defensive, which is good.

One question: when enforce_eager: false, what returns an OmniOutput tuple instead of an OmniOutput object? Is it torch.compile or graph mode? Adding a comment explaining the root cause would help future maintainers understand why this conversion is needed.

Also consider: could the check be stricter? For example, verify each tuple element type matches OmniOutput._field_types to catch mismatches earlier?
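A minimal sketch of the stricter check suggested here, assuming OmniOutput is a `typing.NamedTuple` (as the `_field_types` suggestion implies). `DemoOutput` and its fields are hypothetical stand-ins, since the real OmniOutput definition is not shown in this thread. Note that `_field_types` was removed from `NamedTuple` in Python 3.9, so the sketch reads the annotations through `typing.get_type_hints` instead.

```python
from typing import NamedTuple, get_type_hints


class DemoOutput(NamedTuple):
    # Hypothetical fields; the real OmniOutput fields may differ.
    audio: list
    sample_rate: int


def coerce_output(value):
    """Return a DemoOutput, converting a plain tuple after validating it."""
    if isinstance(value, DemoOutput):
        return value
    if not isinstance(value, tuple):
        raise TypeError(f"expected DemoOutput or tuple, got {type(value).__name__}")
    if len(value) != len(DemoOutput._fields):
        raise ValueError(
            f"expected {len(DemoOutput._fields)} elements, got {len(value)}"
        )
    hints = get_type_hints(DemoOutput)
    for name, item in zip(DemoOutput._fields, value):
        # Works for plain classes; generic annotations would need extra handling.
        if not isinstance(item, hints[name]):
            raise TypeError(
                f"field {name!r}: expected {hints[name].__name__}, "
                f"got {type(item).__name__}"
            )
    return DemoOutput(*value)


converted = coerce_output(([0.0, 0.1], 24000))
print(converted)
```

The per-field check fails early with the offending field name, at the cost of one `isinstance` call per element on every conversion.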

hsliuustc0106 · 2026-04-17T14:28:53Z

Why is the RTF so high (> 1)? Which hardware are you using?

Sy0307 · 2026-04-18T08:17:25Z

Please verify the quality of the generated audio examples.

ChefWu551 · 2026-04-19T03:03:52Z

Why is the RTF so high (> 1)? Which hardware are you using?

GPU: NVIDIA RTX 40 series

ChefWu551 · 2026-04-20T00:37:14Z

Please verify the quality of the generated audio examples.

Good advice! I have tested the quality, and there seems to be no difference.
Here is the use case:


curl -X POST http://127.0.0.1:8899/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/model/ModelScope/Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "input": "Once upon a time, in a small village, there lived a wise old owl. Every night, the owl would sit atop the tallest tree and share stories with the other animals. One stormy night, a lost little rabbit found its way to the tree. The owl, noticing the rabbit’s fear, invited it to listen to a tale of courage. As the storm raged on, the rabbit felt safe and warm. By morning, the rabbit had learned to face its fears and found the courage to return home. The owl’s stories had once again brought comfort and strength.",
    "task_type": "Base",
    "voice": "clone",
    "ref_audio": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav",
    "ref_text": "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you.",
    "response_format": "wav"
  }' --output output_1.wav

The result is:

When enforce_eager is false
output_force_eager_false.wav
When enforce_eager is true
output_force_eagler_true.wav
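For scripted comparisons, the curl request above can also be built in Python with only the standard library. This is a sketch, not part of the PR: the function names are illustrative, and the payload fields are copied from the curl example. Sending is kept in a separate function so the request can be inspected without a running server.

```python
import json
import urllib.request


def build_speech_request(base_url, model, text, ref_audio, ref_text):
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": model,
        "input": text,
        "task_type": "Base",
        "voice": "clone",
        "ref_audio": ref_audio,
        "ref_text": ref_text,
        "response_format": "wav",
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


request = build_speech_request(
    "http://127.0.0.1:8899",
    "/model/ModelScope/Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "Once upon a time, in a small village, there lived a wise old owl.",
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav",
    "Okay. Yeah. I resent you. I love you. I respect you. "
    "But you know what? You blew it! And thanks to you.",
)
print(request.get_method(), request.full_url)
```

Calling `synthesize_to_file(request, "output_1.wav")` once per eager setting, with the server restarted in between, would produce the two files to compare.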

Sy0307 · 2026-04-20T02:55:17Z

Good verification. Can you compare the performance with eager mode enabled and disabled? Also, please consider the merge order of this PR and #2910.

ChefWu551 · 2026-04-20T03:46:40Z

Good verification. Can you compare the performance with eager mode enabled and disabled? Also, please consider the merge order of this PR and #2910.

Sure, I am working on this.

linyueqian · 2026-04-22T03:06:27Z

Please fix the DCO check. Can you update this PR before this Friday? Thanks.

ChefWu551 · 2026-04-23T01:33:36Z

Please fix the DCO check. Can you update this PR before this Friday? Thanks.

Sure.

ChefWu551 · 2026-04-23T02:30:19Z

Good verification. Can you compare the performance with eager mode enabled and disabled? Also, please consider the merge order of this PR and #2910.

Server start command:

vllm serve /model/ModelScope/Qwen/Qwen3-TTS-12Hz-1.7B-Base \
--omni \
--allowed-local-media-path /workspace \
--port 8899

Before merging PR #2910:

Concurrency	Metric	`enforce_eager: false`	`enforce_eager: true`
1	Total duration (s)	19.36	19.07
1	Request throughput (req/s)	0.52	0.52
1	Audio throughput (audio duration/s)	2.18	2.21
1	Mean E2E latency (ms)	1935.29	1906.71
4	Total duration (s)	12.19	11.86
4	Request throughput (req/s)	0.82	0.84
4	Audio throughput (audio duration/s)	3.41	3.44
4	Mean E2E latency (ms)	4321.41	4251.28
10	Total duration (s)	10.34	10.19
10	Request throughput (req/s)	0.97	0.98
10	Audio throughput (audio duration/s)	4.01	4.24
10	Mean E2E latency (ms)	9225.24	9198.95

After merging PR #2910:

Concurrency	Metric	`enforce_eager: false`	`enforce_eager: true`
1	Total duration (s)	18.83	18.99
1	Request throughput (req/s)	0.53	0.53
1	Audio throughput (audio duration/s)	2.24	2.22
1	Mean E2E latency (ms)	1882.35	1899.04
4	Total duration (s)	11.96	12.48
4	Request throughput (req/s)	0.84	0.80
4	Audio throughput (audio duration/s)	3.61	3.37
4	Mean E2E latency (ms)	4152.56	4399.58
10	Total duration (s)	10.53	10.26
10	Request throughput (req/s)	0.95	0.97
10	Audio throughput (audio duration/s)	3.99	4.16
10	Mean E2E latency (ms)	9474.58	9241.43
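To make the two columns easier to compare, the relative difference in mean E2E latency can be computed directly. The sketch below uses the figures from the table measured after merging #2910; the function and variable names are illustrative.

```python
# Mean E2E latency (ms) after merging #2910, copied from the table above.
# Each entry maps concurrency -> (eager disabled, eager enabled).
latency_ms = {
    1: (1882.35, 1899.04),
    4: (4152.56, 4399.58),
    10: (9474.58, 9241.43),
}


def relative_change_pct(non_eager: float, eager: float) -> float:
    """Percent change of the non-eager latency relative to the eager latency."""
    return (non_eager - eager) / eager * 100.0


changes = {
    concurrency: relative_change_pct(non_eager, eager)
    for concurrency, (non_eager, eager) in latency_ms.items()
}

for concurrency, pct in changes.items():
    print(f"concurrency={concurrency}: {pct:+.2f}%")
```

A negative value means the non-eager run was faster. The differences stay within a few percent and change sign across concurrency levels, so with runs this short they may be within measurement noise.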

Signed-off-by: wuyuefeng <565948592@qq.com>

ChefWu551 · 2026-04-23T02:43:15Z

Please fix the DCO check. Can you update this PR before this Friday? Thanks.

Thanks for the reminder. I have fixed the missing DCO sign-off and updated the PR branch.

linyueqian

LGTM

ChefWu551 requested a review from hsliuustc0106 as a code owner April 17, 2026 03:33

michael-chipmates mentioned this pull request Apr 19, 2026

[Qwen3TTS][Bugfix] Guard inner CUDA graph replay during outer capture #2910

Merged

4 tasks

Gaohan123 added this to the v0.20.0 milestone Apr 20, 2026

linyueqian added the ready label to trigger buildkite CI label Apr 22, 2026

[fix]: Qwen3-TTS code2wav fails when enforce_eager: false

16d91c6

Signed-off-by: wuyuefeng <565948592@qq.com>

ChefWu551 force-pushed the fix-code2wav-eager-force branch 2 times, most recently from bb1a036 to 16d91c6 Compare April 23, 2026 02:42

linyueqian approved these changes Apr 23, 2026

linyueqian merged commit c8efdbd into vllm-project:main Apr 23, 2026
8 checks passed

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[BugFix]: Fix Qwen3-TTS code2wav fails when enforce_eager: false (vll…

4a03b06

…m-project#2868) Signed-off-by: wuyuefeng <565948592@qq.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[BugFix]: Fix Qwen3-TTS code2wav fails when enforce_eager: false (vll…

e1e01d2

…m-project#2868) Signed-off-by: wuyuefeng <565948592@qq.com>

5 participants
