
[LoRA] Add LoRA support for Qwen3OmniMoeThinkerForConditionalGeneration#37193

Open
pratapyash wants to merge 6 commits into vllm-project:main from pratapyash:add-lora-qwen3-omni-moe-thinker

Conversation

@pratapyash
Contributor

@pratapyash pratapyash commented Mar 16, 2026

Purpose

Add LoRA support for Qwen3OmniMoeThinkerForConditionalGeneration.

The vLLM supported_models.md documentation marks this model with a LoRA checkmark under both "Multimodal Language Models" and "Speech-to-Text Language Models" tables, but the model class never implemented the SupportsLoRA protocol. This means --enable-lora fails at runtime despite the documentation claiming support.

This PR resolves that gap by adding the minimal required LoRA attributes to the model class in a single file.

FIX #31205
Related: #30461, PR #34097

Changes

  1. Import and inherit SupportsLoRA -- enables --enable-lora for this model
  2. Define packed_modules_mapping -- maps qkv_proj to [q_proj, k_proj, v_proj]. gate_up_proj is intentionally excluded because Qwen3-Omni uses MoE (FusedMoE) for FFN layers, not packed linear projections
  3. Define embedding_modules = {} -- required by the SupportsLoRA protocol; empty because no embedding-layer LoRA is needed
  4. Define lora_skip_prefixes = ["audio_tower.", "visual."] -- gracefully skips audio/vision tower modules during LoRA loading. Without this, adapters trained with broad target_modules (e.g., regex matching all Linear layers) crash with ValueError from check_unexpected_modules even though the thinker modules are valid. Follows the same pattern as NemotronH (lora_skip_prefixes = ["mtp."])
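The four changes above can be sketched in isolation. This is an illustrative standalone snippet, not the actual vLLM class (which also inherits `SupportsLoRA` and contains the full model implementation); the `should_skip` helper only mimics the prefix check vLLM applies while loading adapter tensors:

```python
# Sketch of the class-level attributes added to
# Qwen3OmniMoeThinkerForConditionalGeneration, shown standalone.

# LoRA weights trained against separate q_proj/k_proj/v_proj modules are
# stacked into the fused qkv_proj layer via this mapping.
packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
}

# Required by the SupportsLoRA protocol; empty because no
# embedding-layer LoRA is needed for this model.
embedding_modules = {}

# Adapter weights under these prefixes are skipped instead of raising
# a ValueError during LoRA loading.
lora_skip_prefixes = ["audio_tower.", "visual."]


def should_skip(module_name: str) -> bool:
    """Mimic the skip check applied to each adapter module name."""
    return any(module_name.startswith(p) for p in lora_skip_prefixes)
```

With this, an adapter tensor named `audio_tower.layers.0.self_attn.q_proj` is skipped, while `model.layers.0.self_attn.q_proj` is loaded normally.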

Why gate_up_proj is removed from packed_modules_mapping

Qwen3-Omni is MoE -- FFN uses FusedMoE with per-expert gate_proj/up_proj/down_proj, not a packed gate_up_proj. The inherited mapping came from Qwen2_5OmniThinkerForConditionalGeneration which is a dense model. Qwen3MoeForCausalLM (the authoritative MoE reference) deliberately excludes gate_up_proj and only adds it conditionally when mlp_only_layers is non-empty. Qwen3-Omni has mlp_only_layers: []. Keeping it is harmless (never matched) but misleading -- MoE expert LoRA is handled by FusedMoEWithLoRA, not packed_modules_mapping.
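The conditional pattern described above can be sketched as follows. This is an illustrative paraphrase of the logic, not the literal Qwen3MoeForCausalLM source; the function name is hypothetical:

```python
# Sketch: gate_up_proj only belongs in the packed-modules mapping for
# dense MLP layers. MoE layers use per-expert FusedMoE weights, where
# expert LoRA is handled by FusedMoEWithLoRA instead.
def build_packed_modules_mapping(mlp_only_layers: list) -> dict:
    mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}
    if mlp_only_layers:
        # Only dense (non-MoE) layers have a packed gate_up_proj linear.
        mapping["gate_up_proj"] = ["gate_proj", "up_proj"]
    return mapping


# Qwen3-Omni ships with mlp_only_layers: [], so the mapping
# stays attention-only.
```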

Test Plan

Public test adapter (random weights): yashpratap/Qwen3-Omni-30B-A3B-LoRA-test-r32

Server launch:

vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --tensor-parallel-size 4 \
  --enable-lora \
  --max-lora-rank 32 \
  --max-loras 2 \
  --lora-modules test=yashpratap/Qwen3-Omni-30B-A3B-LoRA-test-r32 \
  --enforce-eager \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192

Inference:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}'
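For reference, an equivalent request can be issued from Python using only the standard library. This sketch assumes the same endpoint and served adapter name (`test`) as the curl command; `build_chat_payload` is a hypothetical helper, not part of any vLLM or OpenAI client API:

```python
import json
from urllib import request


def build_chat_payload(model: str, prompt: str, max_tokens: int = 50) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    payload = build_chat_payload("test", "Hello")
    req = request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```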

Test Result

Tested on 4x L40S (48GB each).

Test 1: Enforce-eager mode -- PASS

  • Base model and LoRA adapter both listed in /v1/models
  • Base model response: standard Qwen-Omni intro
  • LoRA adapter response: different output confirming adapter is applied
  • No errors or warnings

Test 2: CUDA graph compilation -- PASS

  • VLLM_COMPILE + PIECEWISE cudagraph mode
  • Both base model and LoRA inference work correctly

Test 3: Mixed adapter (thinker + tower modules) -- PASS

  • Adapter containing both thinker attention and audio/vision tower LoRA weights
  • Tower modules gracefully skipped via lora_skip_prefixes
  • Thinker modules loaded and applied correctly
  • Server starts, inference works

Test 4: With gate_up_proj in mapping -- PASS (but removed anyway)

  • Tested keeping gate_up_proj in packed_modules_mapping
  • No errors or crashes -- the mapping entry is never matched since no module is named gate_up_proj in the MoE architecture
  • Removed to stay consistent with Qwen3MoeForCausalLM which deliberately excludes it for MoE models

  • Purpose is clearly described
  • Test plan provided
  • Test results included
  • No documentation update needed (LoRA checkmark already present in supported_models.md)

AI assistance was used in developing and testing this PR, per AGENTS.md.

…tedLinear and quantization support

- Replaced nn.Linear with ReplicatedLinear for conv_out, proj1, and proj2 layers to support quantization.
- Added quant_config parameter to Qwen3OmniMoeAudioEncoder constructor.
- Updated method calls to handle outputs from ReplicatedLinear layers.
- Included SupportsLoRA in Qwen3OmniMoeThinkerForConditionalGeneration class.
…oEncoder

- Removed ReplicatedLinear usage for conv_out, proj1, and proj2 layers.
- Eliminated quant_config parameter from Qwen3OmniMoeAudioEncoder constructor.
- Updated method calls to reflect changes in layer outputs.
… loading

- Introduced lora_skip_prefixes to exclude audio_tower and visual modules from LoRA loading.
- This change addresses the requirement for enable_tower_connector_lora, which is not yet supported.
@pratapyash pratapyash requested a review from sighingnow as a code owner March 16, 2026 13:44
@mergify mergify bot added the qwen Related to Qwen models label Mar 16, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds LoRA support for Qwen3OmniMoeThinkerForConditionalGeneration. The changes are minimal and well-contained, enabling LoRA by inheriting SupportsLoRA and defining the necessary attributes. The packed_modules_mapping is updated to support LoRA for attention layers while correctly excluding MLP layers for this MoE model. Additionally, lora_skip_prefixes is added to prevent errors when loading LoRA adapters that target the vision and audio towers. The changes are well-explained and appear correct.

@pratapyash
Contributor Author

Hi @DarkLight1337 @jeejeelee, requesting your review of this PR.

@pratapyash
Contributor Author

pratapyash commented Mar 30, 2026

Hi @DarkLight1337, @jeejeelee, @NickLucche following up again!


Labels

qwen Related to Qwen models


Development

Successfully merging this pull request may close these issues.

ValueError: Qwen3OmniMoeThinkerForConditionalGeneration does not support LoRA yet.
