[Performance] Improve MiMo-Audio tokenizer decoding performance by qibaoyuan · Pull Request #2183 · vllm-project/vllm-omni

qibaoyuan · 2026-03-25T11:50:09Z

Purpose

This PR speeds up audio tokenizer decoding in the MiMo-Audio model. The tokenizer decoder is invoked frequently in asynchronous scenarios, so its efficiency is critical. Our approach uses CUDA Graphs to accelerate execution. Because CUDA Graphs replay a recorded kernel sequence on static tensor shapes, the changes below add fixed-shape forward_fixed paths alongside the existing variable-length ones.
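As a rough illustration of the CUDA Graphs approach (not the code in this PR), the sketch below captures one forward pass at a fixed input shape and replays it for later inputs of the same shape. `GraphedModule` is a hypothetical wrapper name, and the `Linear` layer is only a stand-in for the decoder. The wrapper falls back to eager execution when the input is not on a CUDA device.

```python
import torch


class GraphedModule:
    """Hypothetical wrapper: capture one forward pass at a fixed shape, then replay it."""

    def __init__(self, module: torch.nn.Module, example_input: torch.Tensor):
        self.module = module.eval()
        self.graph = None
        if not example_input.is_cuda:
            return  # CPU fallback: no capture, __call__ runs eagerly
        # Static buffers: every replay reads and writes these same tensors.
        self.static_in = example_input.clone()
        with torch.no_grad():
            # Warm up on a side stream before capture, as the PyTorch docs recommend.
            side = torch.cuda.Stream()
            side.wait_stream(torch.cuda.current_stream())
            with torch.cuda.stream(side):
                for _ in range(3):
                    self.module(self.static_in)
            torch.cuda.current_stream().wait_stream(side)
            self.graph = torch.cuda.CUDAGraph()
            with torch.cuda.graph(self.graph):
                self.static_out = self.module(self.static_in)

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        if self.graph is None:
            with torch.no_grad():
                return self.module(x)
        if x.shape != self.static_in.shape:
            raise ValueError("input shape must match the captured shape")
        self.static_in.copy_(x)  # refill the captured input buffer
        self.graph.replay()      # rerun the recorded kernels
        return self.static_out.clone()


# Stand-in module and shapes, chosen only for illustration.
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(16, 16).to(device)
graphed = GraphedModule(layer, torch.zeros(4, 16, device=device))
y = graphed(torch.randn(4, 16, device=device))
```

Replay skips per-kernel launch overhead, which matters most when a small decoder is called many times. The cost is that every call must match the captured shape, hence the shape check.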

Key changes include:

Attention.forward_fixed — Replaces flash_attn_varlen_func with F.scaled_dot_product_attention, operating on 3D tensors [B, L, D], thereby avoiding variable-length packing.
TransformerLayer.forward_fixed — Combines self_attn.forward_fixed with the feed-forward network (FFN).
CausalConvTranspose1d.forward_fixed — Applies transposed convolution directly on 3D tensors without using masked_select.
TransformerVocos.forward_fixed — Implements a mask-free forward path for the vocoder.
AudioDecoder.forward_fixed — Constructs the full decoder pipeline: dconv1 → transformer layers → dconv2 → vocoder.
MiMoAudioTokenizer.decode_fixed — Wraps the complete decoding process, including decode_vq, padding, and decoder.forward_fixed.
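To make the fixed-shape idea concrete, here is a minimal sketch of self-attention on a dense [B, L, D] batch using F.scaled_dot_product_attention, plus a padding helper that brings inputs to one fixed length. The function names, the weight matrices, and the non-causal, mask-free call are assumptions for illustration; they are not the PR's actual Attention.forward_fixed.

```python
import torch
import torch.nn.functional as F


def pad_to_length(x: torch.Tensor, target_len: int) -> torch.Tensor:
    """Right-pad a [B, L, D] tensor with zeros along L up to target_len."""
    pad = target_len - x.shape[1]
    if pad < 0:
        raise ValueError("input is longer than target_len")
    # F.pad pads from the last dim backwards: (D_left, D_right, L_left, L_right).
    return F.pad(x, (0, 0, 0, pad))


def attention_fixed(x, wq, wk, wv, num_heads):
    """Self-attention on a dense [B, L, D] batch, with no varlen packing."""
    B, L, D = x.shape
    head_dim = D // num_heads

    def split_heads(t):
        # [B, L, D] -> [B, H, L, head_dim], the layout SDPA expects
        return t.reshape(B, L, num_heads, head_dim).transpose(1, 2)

    q, k, v = (split_heads(x @ w) for w in (wq, wk, wv))
    out = F.scaled_dot_product_attention(q, k, v)  # no mask, non-causal (assumption)
    return out.transpose(1, 2).reshape(B, L, D)


torch.manual_seed(0)
x = pad_to_length(torch.randn(2, 5, 8), target_len=8)  # L: 5 -> 8
wq, wk, wv = (torch.randn(8, 8) for _ in range(3))
out = attention_fixed(x, wq, wk, wv, num_heads=2)      # shape [2, 8, 8]
```

Note that without a mask, zero-padded frames still take part in attention in this sketch. How the PR treats padded positions is not described here.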

Test Plan

export MIMO_AUDIO_TOKENIZER_PATH="XiaomiMiMo/MiMo-Audio-Tokenizer"

python3 -u end2end.py \
--stage-configs-path ./vllm_omni/model_executor/stage_configs/mimo_audio.yaml  \
--model  "XiaomiMiMo/MiMo-Audio-7B-Instruct" \
--query-type tts_sft_with_audio \
--audio_path ./examples/offline_inference/mimo_audio/beijing.mp3 \
--text "我还知道东北有杀猪菜，是把猪血肠、五花肉、酸菜等放在一块炖的，味道很浓郁。"

Test Result

Request ID: 0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1, Text saved to ./output_audio/tts_sft_with_audio/0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1.txt

Request ID: 0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1, Audio saved to ./output_audio/tts_sft_with_audio/0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1.wav

Essential Elements of an Effective PR Description Checklist

[x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
[x] The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
[x] The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

hsliuustc0106 · 2026-04-17T11:20:34Z

I wonder what the throughput is in a high-concurrency setting?

qibaoyuan · 2026-04-21T03:19:37Z

> I wonder what the throughput is in a high-concurrency setting?

At 30 QPS on an H20 GPU with chunk_size set to 3, we measured an RTF of 0.910 and an inter-frame time of 0.861 s.
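For readers unfamiliar with the metric, the sketch below assumes the common definition of real-time factor (RTF): time spent generating divided by the duration of the audio produced. The comment above does not spell out its definition, and the input numbers here are hypothetical, picked only to land near the reported value.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of audio generated (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical numbers, not measurements from this PR.
rtf = real_time_factor(processing_seconds=9.1, audio_seconds=10.0)
keeps_up = rtf < 1.0  # below 1.0 means audio is produced faster than it plays back
```

Under this definition, an RTF of 0.910 means generation runs slightly faster than real time at that load.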

qibaoyuan and others added 30 commits March 6, 2026 15:30

[mimo-audio] tok example

0b5ed57

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

Merge branch 'vllm-project:main' into tok_cg

09e17eb

[mimo-audio] example

e10ea18

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

Merge branch 'vllm-project:main' into tok_cg

2c4e68c

[mimo-audio] example

80d4f24

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

Merge branch 'vllm-project:main' into tok_cg

bd4ed9d

Merge branch 'vllm-project:main' into tok_cg

b55ac59

Merge branch 'vllm-project:main' into tok_cg

8140d2d

Merge branch 'vllm-project:main' into tok_cg

8b022b3

Merge branch 'vllm-project:main' into tok_cg

4deee49

Merge branch 'vllm-project:main' into tok_cg

6124bf8

Merge branch 'vllm-project:main' into tok_cg

02156e5

Merge branch 'vllm-project:main' into tok_cg

c77f442

Merge branch 'vllm-project:main' into tok_cg

9dbb293

Merge branch 'vllm-project:main' into tok_cg

6964957

Merge branch 'vllm-project:main' into tok_cg

0109fd1

Merge branch 'vllm-project:main' into tok_cg

e3700d0

Merge branch 'vllm-project:main' into tok_cg

2439724

Merge branch 'vllm-project:main' into tok_cg

be6206f

Merge branch 'vllm-project:main' into tok_cg

1c1ff70

Merge branch 'vllm-project:main' into tok_cg

33efe81

Merge branch 'vllm-project:main' into tok_cg

05ef764

Merge branch 'vllm-project:main' into tok_cg

24f85e0

Merge branch 'vllm-project:main' into tok_cg

57da820

Merge remote-tracking branch 'origin/main' into tok_cg

f49a6b8

# Conflicts: # vllm_omni/model_executor/models/mimo_audio/mimo_audio_code2wav.py

Merge remote-tracking branch 'origin/tok_cg' into tok_cg

cd06d99

[mimo-audio] revert

b10c4e0

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

[mimo-audio] cg refit

f2dd06b

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

[mimo-audio] streaming decode

0c41a3e

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

Merge branch 'vllm-project:main' into tok_cg

1b85225

qibaoyuan added 2 commits April 9, 2026 08:44

Merge branch 'vllm-project:main' into tok_cg

3c677ef

Merge branch 'vllm-project:main' into tok_cg

b3ad336

hsliuustc0106 added the ready label (label to trigger buildkite CI) on Apr 9, 2026

qibaoyuan and others added 11 commits April 13, 2026 10:02

Merge branch 'vllm-project:main' into tok_cg

a1fd7ec

Merge branch 'vllm-project:main' into tok_cg

61d58bc

Merge branch 'main' into tok_cg

144d8a2

Merge branch 'main' into tok_cg

b366b3d

[mimo-audio] bugfix for ci

6a0c4f5

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

Merge branch 'vllm-project:main' into tok_cg

b482c9a

Merge branch 'main' into tok_cg

d75f9a3

Merge branch 'vllm-project:main' into tok_cg

f858dcb

Merge branch 'vllm-project:main' into tok_cg

94c51c2

Merge branch 'vllm-project:main' into tok_cg

5d6e7e0

Merge branch 'vllm-project:main' into tok_cg

2c7f165

qibaoyuan force-pushed the tok_cg branch from 056c4c6 to bfce6ca April 20, 2026 01:04

qibaoyuan and others added 3 commits April 20, 2026 09:05

Merge branch 'vllm-project:main' into tok_cg

92599c1

Merge branch 'vllm-project:main' into tok_cg

3c8b11a

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

Merge remote-tracking branch 'origin/tok_cg' into tok_cg

4adc637

Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>

qibaoyuan force-pushed the tok_cg branch from bfce6ca to 4adc637 April 20, 2026 01:06

qibaoyuan added 2 commits April 20, 2026 16:20

Merge branch 'vllm-project:main' into tok_cg

c420161

Merge branch 'vllm-project:main' into tok_cg

420977c

Merge branch 'main' into tok_cg

728efe9

qibaoyuan mentioned this pull request Apr 22, 2026

Enable MiMo-Audio-7B end-to-end inference on Intel XPU #2983

Open

qibaoyuan added 5 commits April 22, 2026 16:23

Merge branch 'vllm-project:main' into tok_cg

3f0fa57

Merge branch 'vllm-project:main' into tok_cg

623f567

Merge branch 'vllm-project:main' into tok_cg

2dd6b0c

Merge branch 'vllm-project:main' into tok_cg

2c210e6

Merge branch 'vllm-project:main' into tok_cg

81beff6

qibaoyuan wants to merge 92 commits into vllm-project:main from qibaoyuan:tok_cg

4 participants
