[LoRA] Add LoRA support for Qwen3OmniMoeThinkerForConditionalGeneration#37193
Open
pratapyash wants to merge 6 commits into vllm-project:main from
Conversation
…tedLinear and quantization support
- Replaced nn.Linear with ReplicatedLinear for conv_out, proj1, and proj2 layers to support quantization.
- Added quant_config parameter to Qwen3OmniMoeAudioEncoder constructor.
- Updated method calls to handle outputs from ReplicatedLinear layers.
- Included SupportsLoRA in Qwen3OmniMoeThinkerForConditionalGeneration class.

…oEncoder
- Removed ReplicatedLinear usage for conv_out, proj1, and proj2 layers.
- Eliminated quant_config parameter from Qwen3OmniMoeAudioEncoder constructor.
- Updated method calls to reflect changes in layer outputs.

… loading
- Introduced lora_skip_prefixes to exclude audio_tower and visual modules from LoRA loading.
- This change addresses the requirement for enable_tower_connector_lora, which is not yet supported.
Contributor
Code Review
This pull request adds LoRA support for Qwen3OmniMoeThinkerForConditionalGeneration. The changes are minimal and well-contained, enabling LoRA by inheriting SupportsLoRA and defining the necessary attributes. The packed_modules_mapping is updated to support LoRA for attention layers while correctly excluding MLP layers for this MoE model. Additionally, lora_skip_prefixes is added to prevent errors when loading LoRA adapters that target the vision and audio towers. The changes are well-explained and appear correct.
Contributor
Author
Hi @DarkLight1337 @jeejeelee, requesting your review of this PR.
Contributor
Author
Hi @DarkLight1337, @jeejeelee, @NickLucche, following up again!
Purpose
Add LoRA support for Qwen3OmniMoeThinkerForConditionalGeneration.

The vLLM supported_models.md documentation marks this model with a LoRA checkmark under both the "Multimodal Language Models" and "Speech-to-Text Language Models" tables, but the model class never implemented the SupportsLoRA protocol. This means --enable-lora fails at runtime despite the documentation claiming support. This PR resolves that gap by adding the minimal required LoRA attributes to the model class in a single file.
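As a rough sketch of what the fix amounts to: the attributes below mirror the ones listed in this PR description. The class here is a stand-in for illustration only; the real vLLM class inherits from the full model implementation plus SupportsLoRA rather than declaring these in isolation.

```python
# Illustrative stand-in, not the real vLLM class. Attribute values are taken
# from this PR's description; everything else about the real class is omitted.
class Qwen3OmniMoeThinkerLoRAAttrs:
    # qkv_proj is a fused projection; LoRA weights trained against the
    # separate q/k/v projections get stacked into it at load time.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        # no "gate_up_proj" entry: FFN layers are FusedMoE experts,
        # not packed linear projections
    }
    # Required by the SupportsLoRA protocol; empty because no
    # embedding-layer LoRA is needed.
    embedding_modules = {}
    # Adapter weights targeting the audio/vision towers are skipped
    # gracefully instead of raising during LoRA loading.
    lora_skip_prefixes = ["audio_tower.", "visual."]
```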
FIX #31205
Related: #30461, PR #34097
Changes
- SupportsLoRA -- enables --enable-lora for this model
- packed_modules_mapping -- maps qkv_proj to [q_proj, k_proj, v_proj]. gate_up_proj is intentionally excluded because Qwen3-Omni uses MoE (FusedMoE) for FFN layers, not packed linear projections
- embedding_modules = {} -- required by the SupportsLoRA protocol; empty because no embedding-layer LoRA is needed
- lora_skip_prefixes = ["audio_tower.", "visual."] -- gracefully skips audio/vision tower modules during LoRA loading. Without this, adapters trained with broad target_modules (e.g., a regex matching all Linear layers) crash with a ValueError from check_unexpected_modules even though the thinker modules are valid. Follows the same pattern as NemotronH (lora_skip_prefixes = ["mtp."])

Why gate_up_proj is removed from packed_modules_mapping

Qwen3-Omni is MoE -- the FFN uses FusedMoE with per-expert gate_proj/up_proj/down_proj, not a packed gate_up_proj. The inherited mapping came from Qwen2_5OmniThinkerForConditionalGeneration, which is a dense model. Qwen3MoeForCausalLM (the authoritative MoE reference) deliberately excludes gate_up_proj and only adds it conditionally when mlp_only_layers is non-empty. Qwen3-Omni has mlp_only_layers: []. Keeping it is harmless (never matched) but misleading -- MoE expert LoRA is handled by FusedMoEWithLoRA, not packed_modules_mapping.

Test Plan
Public test adapter (random weights): yashpratap/Qwen3-Omni-30B-A3B-LoRA-test-r32

Server launch:
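The exact launch command was not captured in this extract; a representative invocation might look like the following. The base model name is assumed, the adapter repo and rank (r32) come from the test plan above, and the GPU count from the test results; all flags are standard vLLM CLI options.

```shell
# Hypothetical reconstruction -- the PR's exact command is not in this extract.
# --max-lora-rank 32 matches the r32 test adapter; --enforce-eager corresponds
# to Test 1 (drop it for the CUDA-graph run in Test 2).
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --tensor-parallel-size 4 \
  --enable-lora \
  --max-lora-rank 32 \
  --lora-modules test-lora=yashpratap/Qwen3-Omni-30B-A3B-LoRA-test-r32 \
  --enforce-eager
```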
Inference:
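The inference commands were likewise not captured; a plausible check against vLLM's OpenAI-compatible server is sketched below. Here test-lora is a placeholder adapter name (whatever name was registered via --lora-modules), and localhost:8000 is the default serve address.

```shell
# Hypothetical check, assuming a vLLM server is up on localhost:8000.
# Listing models should show the LoRA adapter alongside the base model:
curl -s http://localhost:8000/v1/models

# A chat completion is routed through the adapter by using its name
# as the "model" field:
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test-lora", "messages": [{"role": "user", "content": "Hello!"}]}'
```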
Test Result
Tested on 4x L40S (48GB each).
Test 1: Enforce-eager mode -- PASS
- /v1/models

Test 2: CUDA graph compilation -- PASS
- VLLM_COMPILE + PIECEWISE cudagraph mode

Test 3: Mixed adapter (thinker + tower modules) -- PASS
- lora_skip_prefixes

Test 4: With gate_up_proj in mapping -- PASS (but removed anyway)
- gate_up_proj in packed_modules_mapping
- gate_up_proj in the MoE architecture
- Qwen3MoeForCausalLM, which deliberately excludes it for MoE models
- supported_models.md

AI assistance was used in developing and testing this PR, per AGENTS.md.
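The prefix-skipping behavior exercised in Test 3 can be illustrated with a tiny stand-alone filter. This is a hypothetical helper written for illustration, not vLLM's actual implementation: modules whose names start with a skip prefix are dropped instead of triggering a ValueError.

```python
def filter_lora_modules(module_names, skip_prefixes):
    """Illustrative prefix-based skipping: tower modules are dropped
    gracefully rather than treated as unexpected (hypothetical helper)."""
    kept, skipped = [], []
    for name in module_names:
        if any(name.startswith(prefix) for prefix in skip_prefixes):
            skipped.append(name)  # e.g. audio/vision tower weights
        else:
            kept.append(name)     # thinker modules remain loadable
    return kept, skipped

# A mixed adapter targeting thinker + tower modules, as in Test 3:
kept, skipped = filter_lora_modules(
    ["audio_tower.conv_out",
     "visual.blocks.0.attn.qkv",
     "model.layers.0.self_attn.qkv_proj"],
    ["audio_tower.", "visual."],
)
# kept  -> only the thinker module
# skipped -> the two tower modules
```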