[Performance] Improve MiMo-Audio tokenizer decoding performance #2183

Open
qibaoyuan wants to merge 92 commits into vllm-project:main from qibaoyuan:tok_cg

@qibaoyuan qibaoyuan commented Mar 25, 2026

Purpose

This PR speeds up decoding in the MiMo-Audio audio tokenizer. The decoder is invoked frequently in asynchronous scenarios, so its latency is critical to overall performance. The approach uses CUDA Graphs to accelerate execution; because a captured graph requires static tensor shapes, each module gains a fixed-shape forward path (forward_fixed).
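CUDA Graphs replay a recorded sequence of kernels, so the captured code must see the same tensor shapes and the same memory buffers on every call. The sketch below illustrates the general capture-and-replay pattern with PyTorch's `torch.cuda.CUDAGraph`. The helper name `capture_fixed_shape` and the `Linear` stand-in module are hypothetical and are not taken from this PR; the function falls back to eager execution when no GPU is present.

```python
import torch


def capture_fixed_shape(fn, example_input):
    """Capture `fn` once for a fixed input shape; return a replay wrapper.

    Hypothetical helper for illustration. Without CUDA it returns `fn` as-is.
    """
    if not torch.cuda.is_available():
        return fn

    # Static buffer: the graph always reads from this exact memory.
    static_in = example_input.clone()

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(3):
            fn(static_in)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph), torch.no_grad():
        static_out = fn(static_in)

    def replay(x):
        # The input must match the captured shape; copy it into the buffer.
        static_in.copy_(x)
        graph.replay()
        return static_out.clone()

    return replay


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(16, 16).to(device).eval()  # stand-in for a decoder
x = torch.randn(2, 8, 16, device=device)

graphed = capture_fixed_shape(layer, x)
with torch.no_grad():
    y = graphed(x)
    ref = layer(x)
```

Inputs of a different length would need either their own captured graph or padding up to the captured shape, which is why a mask-free, fixed-shape forward path is a natural fit for this technique.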

Key changes include:

  • Attention.forward_fixed — Replaces flash_attn_varlen_func with F.scaled_dot_product_attention, operating on 3D tensors [B, L, D], thereby avoiding variable-length packing.
  • TransformerLayer.forward_fixed — Combines self_attn.forward_fixed with the feed-forward network (FFN).
  • CausalConvTranspose1d.forward_fixed — Applies transposed convolution directly on 3D tensors without using masked_select.
  • TransformerVocos.forward_fixed — Implements a mask-free forward path for the vocoder.
  • AudioDecoder.forward_fixed — Constructs the full decoder pipeline: dconv1 → transformer layers → dconv2 → vocoder.
  • MiMoAudioTokenizer.decode_fixed — Wraps the complete decoding process, including decode_vq, padding, and decoder.forward_fixed.
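As a rough illustration of the fixed-shape idea behind these methods, the sketch below runs attention on dense [B, L, D] tensors with `F.scaled_dot_product_attention` and right-pads a sequence to a fixed length. The function names `attention_fixed` and `pad_to_length` are hypothetical, and the use of causal attention is an assumption made for the example; the PR text does not say which masking the real layers use.

```python
import torch
import torch.nn.functional as F


def attention_fixed(q, k, v, num_heads, causal=True):
    """Attention over dense [B, L, D] tensors, with no varlen packing.

    Hypothetical sketch; `causal=True` is an assumption, not from the PR.
    """
    B, L, D = q.shape
    head_dim = D // num_heads

    def split(t):
        # [B, L, D] -> [B, H, L, head_dim], the layout SDPA expects.
        return t.reshape(B, L, num_heads, head_dim).transpose(1, 2)

    out = F.scaled_dot_product_attention(
        split(q), split(k), split(v), is_causal=causal
    )
    # [B, H, L, head_dim] -> [B, L, D]
    return out.transpose(1, 2).reshape(B, L, D)


def pad_to_length(x, target_len):
    """Right-pad a [B, L, D] tensor with zeros along L up to target_len."""
    pad = target_len - x.shape[1]
    if pad < 0:
        raise ValueError("input is longer than target_len")
    # F.pad lists pads from the last dim backwards: (D_left, D_right, L_left, L_right).
    return F.pad(x, (0, 0, 0, pad))


torch.manual_seed(0)
x = torch.randn(1, 5, 8)
out = attention_fixed(x, x, x, num_heads=2)

x_padded = pad_to_length(x, 8)
out_padded = attention_fixed(x_padded, x_padded, x_padded, num_heads=2)
```

With causal attention and right padding, the first 5 positions of `out_padded` match `out`, because no real position attends to a padded one. With non-causal attention that would not hold, and the padded positions would need to be masked or otherwise accounted for.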

Test Plan

export MIMO_AUDIO_TOKENIZER_PATH="XiaomiMiMo/MiMo-Audio-Tokenizer"

python3 -u end2end.py \
--stage-configs-path ./vllm_omni/model_executor/stage_configs/mimo_audio.yaml  \
--model  "XiaomiMiMo/MiMo-Audio-7B-Instruct" \
--query-type tts_sft_with_audio \
--audio_path ./examples/offline_inference/mimo_audio/beijing.mp3 \
--text "我还知道东北有杀猪菜,是把猪血肠、五花肉、酸菜等放在一块炖的,味道很浓郁。"

Test Result

Request ID: 0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1, Text saved to ./output_audio/tts_sft_with_audio/0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1.txt

Request ID: 0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1, Audio saved to ./output_audio/tts_sft_with_audio/0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1.wav

0_3581f0d8-1ec1-4063-a223-72fa6a95b4a1.wav


Essential Elements of an Effective PR Description Checklist
  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [x] The test plan. Please provide the test scripts and test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • [x] The test results. Please paste a comparison of the results before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

qibaoyuan and others added 30 commits March 6, 2026 15:30
Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
# Conflicts:
#	vllm_omni/model_executor/models/mimo_audio/mimo_audio_code2wav.py
@hsliuustc0106 added the "ready" label (label to trigger buildkite CI) on Apr 9, 2026
@hsliuustc0106
Collaborator

I wonder what the throughput is in a high-concurrency setting?

qibaoyuan and others added 3 commits April 20, 2026 09:05
@qibaoyuan
Contributor Author

Quoting @hsliuustc0106: "I wonder what the throughput is in a high-concurrency setting?"

At 30 QPS on an H20 GPU with chunk_size set to 3, we measured an RTF of 0.910 and an inter-frame time of 0.861 s.
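For readers unfamiliar with the metric, RTF (real-time factor) is conventionally the time spent generating audio divided by the duration of the audio produced, so values below 1 mean generation runs faster than playback. A minimal sketch of that calculation follows; the function name and the sample numbers are illustrative assumptions, not measurements from this PR.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Illustrative numbers only: 4.55 s of compute for 5.0 s of audio.
rtf = real_time_factor(4.55, 5.0)
faster_than_real_time = rtf < 1.0
```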
