
Enable MiMo-Audio-7B end-to-end inference on Intel XPU#2983

Open
Liangyx2 wants to merge 11 commits into vllm-project:main from Liangyx2:MiMoAudio

Conversation

@Liangyx2

PR Description

Motivation

MiMo-Audio (XiaomiMiMo) is a multi-modal audio model supporting TTS, voice cloning, audio transcription, and spoken dialogue. Currently it only runs on CUDA. This PR enables MiMo-Audio inference on XPU (Intel GPU) by adding platform-specific stage configs and fixing several CUDA-only code paths that prevented the model from loading and running on non-CUDA devices.

Technical Details

  1. XPU stage config (mimo_audio.yaml): Added a 2-stage pipeline config (Stage 0: fused_thinker_talker for LLM + audio code generation, Stage 1: code2wav for waveform synthesis) with XPU-specific knobs (enforce_eager, disable_hybrid_kv_cache_manager, skip_mm_profiling, memory utilization tuning).
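
A minimal sketch of what such a 2-stage config could look like. The knob names (enforce_eager, disable_hybrid_kv_cache_manager, skip_mm_profiling, gpu_memory_utilization) come from the description above, but the surrounding stage schema here is an illustrative assumption, not the actual contents of mimo_audio.yaml:

```yaml
# Hypothetical stage layout; the wrapper keys are illustrative assumptions.
stages:
  - name: fused_thinker_talker          # Stage 0: LLM + audio code generation
    model: XiaomiMiMo/MiMo-Audio
    enforce_eager: true                 # avoid CUDA-graph capture paths on XPU
    disable_hybrid_kv_cache_manager: true
    skip_mm_profiling: true
    gpu_memory_utilization: 0.4
  - name: code2wav                      # Stage 1: waveform synthesis
    enforce_eager: true
    gpu_memory_utilization: 0.35
```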

  2. Guard CUDA-only APIs: Wrapped all torch.cuda.is_current_stream_capturing() calls in mimo_audio.py, mimo_audio_code2wav.py, and mimo_audio_llm.py with torch.cuda.is_available() and device.type == "cuda" checks, returning False on non-CUDA devices. This prevents runtime errors on XPU.
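
The guard pattern can be sketched as follows. This is a self-contained illustration: `fake_cuda` and the device object are stand-ins for `torch.cuda` and `torch.device` so the logic runs without PyTorch installed; the real code calls the actual `torch` APIs inline.

```python
from types import SimpleNamespace

def is_stream_capturing(cuda, device):
    """Return the CUDA stream-capture state, or False on non-CUDA devices.

    `cuda` is any object exposing is_available() / is_current_stream_capturing()
    (torch.cuda in the real code); `device` is any object with a .type attribute.
    """
    if not cuda.is_available() or device.type != "cuda":
        return False  # XPU/CPU: never inside a CUDA graph capture
    return cuda.is_current_stream_capturing()

def _raise_no_cuda():
    raise RuntimeError("no CUDA runtime")

# Stand-ins that mimic an XPU host without a usable CUDA runtime:
# calling is_current_stream_capturing() directly would raise.
fake_cuda = SimpleNamespace(
    is_available=lambda: False,
    is_current_stream_capturing=_raise_no_cuda,
)
xpu_device = SimpleNamespace(type="xpu")

print(is_stream_capturing(fake_cuda, xpu_device))  # → False, no exception
```

The key point is that the guard short-circuits before the CUDA-only call, so the function is safe to invoke unconditionally on any backend.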

  3. Fix device-hardcoded defaults: Removed torch.device(f"cuda:{torch.cuda.current_device()}") defaults in mimo_audio_llm.py's generate_audio_tokens / generate_audio_tokens_one_step methods, replacing them with local_embeds.device to be device-agnostic.
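
The device-agnostic default can be sketched like this; `FakeTensor` is a stand-in for `torch.Tensor` and the function name mirrors, but does not reproduce, the PR's actual method signature:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    device: str  # stand-in for torch.Tensor's .device attribute

def resolve_device(local_embeds, device=None):
    """Previously the default was torch.device(f"cuda:{torch.cuda.current_device()}"),
    which fails on XPU. Deriving the default from the input tensor is device-agnostic."""
    if device is None:
        device = local_embeds.device  # follows whatever device the embeds live on
    return device

print(resolve_device(FakeTensor(device="xpu:0")))  # → xpu:0
```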

  4. Fix multimodal processor: Added an _hf_processor_applies_updates() -> False override in MiMoAudioLLMMultiModalProcessor so that vLLM correctly applies prompt updates (audio placeholder expansion) itself instead of assuming the HF processor has already done so.
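
The override shape can be sketched as below. The base-class body is a stand-in for vLLM's actual processor base class, included only so the snippet is self-contained:

```python
class BaseMultiModalProcessor:
    # Stand-in for vLLM's base class: by default, assume the HF
    # processor has already applied prompt updates.
    def _hf_processor_applies_updates(self) -> bool:
        return True

class MiMoAudioLLMMultiModalProcessor(BaseMultiModalProcessor):
    def _hf_processor_applies_updates(self) -> bool:
        # MiMo-Audio's HF processor does not expand audio placeholders,
        # so vLLM must apply the prompt updates (placeholder expansion) itself.
        return False

print(MiMoAudioLLMMultiModalProcessor()._hf_processor_applies_updates())  # → False
```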

  5. Robustness improvements in end2end.py:

    • Reference audio truncation to 8 seconds (MAX_REF_AUDIO_SAMPLES) to prevent model confusion (repetition / voice identity loss) with long clips.
    • Truncation of code2wav input tokens to MAX_CODE2WAV_TOKENS=8192 in the stage input processor to prevent OOM.
    • Skip invalid/empty audio outputs (< 10ms) instead of writing corrupt WAV files.
    • Default --text to None so each query type uses its own sensible default.
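
The three audio-side safeguards above can be sketched as plain sequence operations. The two caps (8 seconds, 8192 tokens) and the 10 ms floor come from the PR; the 24 kHz sample rate and the helper names are illustrative assumptions:

```python
SAMPLE_RATE = 24_000                      # assumed sample rate for illustration
MAX_REF_AUDIO_SAMPLES = 8 * SAMPLE_RATE   # cap reference audio at 8 seconds
MAX_CODE2WAV_TOKENS = 8192                # cap code2wav input tokens to avoid OOM
MIN_AUDIO_SAMPLES = SAMPLE_RATE // 100    # ~10 ms; anything shorter is skipped

def truncate_ref_audio(samples):
    """Long reference clips cause repetition / voice identity loss."""
    return samples[:MAX_REF_AUDIO_SAMPLES]

def truncate_code2wav_tokens(tokens):
    """Applied in the stage input processor before waveform synthesis."""
    return tokens[:MAX_CODE2WAV_TOKENS]

def should_write_wav(samples):
    """Skip invalid/empty outputs instead of writing corrupt WAV files."""
    return len(samples) >= MIN_AUDIO_SAMPLES
```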

Performance Impact

  • No regression on CUDA — all changes are guarded by device-type checks.
  • Enables single-GPU XPU inference for MiMo-Audio with enforce_eager=true and conservative memory settings (gpu_memory_utilization: 0.4 / 0.35).
  • Reference audio truncation and code2wav token truncation improve stability and prevent OOM on memory-constrained XPU devices.

Workload Mapping

| Workload | Model | Platform | Config |
| --- | --- | --- | --- |
| MiMo-Audio TTS / Voice Cloning / Audio Understanding / Spoken Dialogue | XiaomiMiMo/MiMo-Audio | XPU (Intel GPU) | mimo_audio.yaml (2-stage: fused_thinker_talker + code2wav) |

Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
@Liangyx2 Liangyx2 requested a review from hsliuustc0106 as a code owner April 21, 2026 07:44

@hsliuustc0106 (Collaborator) left a comment


NON-BLOCKING:

  • Test Coverage — XPU is experimental and CI does not run on XPU hardware. Since this PR adds device-type guards and XPU-specific configuration, please verify manually on XPU that:

    1. The model loads successfully with the new mimo_audio.yaml config
    2. Inference produces valid audio output for at least one query type (e.g., tts_sft)
    3. No runtime errors from CUDA-specific APIs on XPU

    Consider adding a note in the PR description confirming which XPU configuration was tested.

```diff
 num_reqs = len(request_ids)
-is_capturing = torch.cuda.is_current_stream_capturing()
+if torch.cuda.is_available() and input_ids.device.type == "cuda":
+    is_capturing = torch.cuda.is_current_stream_capturing()
```
Collaborator


Don't other platforms support torch.xxx.is_current_stream_capturing()?

```diff
@@ -0,0 +1,103 @@
+# XPU stage config for running MiMo-Audio with 2-stage architecture
```
Collaborator


We don't introduce new stage configs. Please refer to #2383 and add a correct deploy config instead.


@hsliuustc0106
Copy link
Copy Markdown
Collaborator

cc @qibaoyuan

@qibaoyuan (Contributor):

Thanks! Could you help us test this incoming PR on XPU and report any issues you encounter?
#2183
