Enable loading of fused expert weights in the Transformers modelling backend#36997

Merged
DarkLight1337 merged 1 commit into vllm-project:main from hmellor:v5-style-moe-weights
Mar 14, 2026
Conversation

@hmellor
Member

@hmellor hmellor commented Mar 13, 2026

This pull request introduces improvements to the handling and loading of fused Mixture-of-Experts (MoE) weights in the Transformers modelling backend:

  • Added explicit expert mapping for models saved with fused experts, ensuring compatibility with checkpoints released after Transformers v5 or re-saved with save_original_format=False. This mapping repurposes expert_id as shard_idx for deconcatenating w1 and w3 weights.
  • Unified the loading logic in FusedMoE.load_weights to handle both fused (3D tensor) and non-fused (single expert) weights, simplifying the process and reducing code duplication.

…backend

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request refactors the Mixture of Experts (MoE) weight loading mechanism to support both fused and non-fused expert weight formats. In vllm/model_executor/layers/fused_moe/layer.py, the load_weights method is updated to differentiate between 3D tensors representing fused expert weights and single expert weights, applying a unified loading logic that iterates through individual experts. Concurrently, vllm/model_executor/models/transformers/moe.py introduces an explicit expert_mapping for fused experts, ensuring proper identification and handling of these weight structures during model loading.
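The expert-mapping change can be sketched as follows, assuming the checkpoint stores the gate (w1) and up (w3) projections concatenated along the output dimension, as post-v5 Transformers checkpoints do for fused experts. The mapping entries and names below are illustrative, not vLLM's actual data structures: the slot that normally carries an expert id is reused as a shard index selecting which half of the concatenated tensor belongs to w1 versus w3.

```python
# Illustrative sketch only: `expert_mapping` and `split_gate_up` are
# hypothetical names, and a list of rows stands in for a 2D tensor.

def split_gate_up(concat_rows, shard_idx):
    """Return the w1 (shard_idx=0) or w3 (shard_idx=1) half of a weight
    whose rows are the concatenation [w1; w3] along the output dimension."""
    half = len(concat_rows) // 2
    return concat_rows[:half] if shard_idx == 0 else concat_rows[half:]

# Illustrative mapping: checkpoint tensor name -> list of
# (target parameter, shard index) pairs, one per deconcatenated shard.
expert_mapping = {
    "experts.gate_up_proj": [("w13_weight", 0), ("w13_weight", 1)],
}

# 4 rows: rows 0-1 are w1, rows 2-3 are w3.
concat = [[1, 1], [2, 2], [3, 3], [4, 4]]
print(split_gate_up(concat, 0))  # -> [[1, 1], [2, 2]]  (w1 half)
print(split_gate_up(concat, 1))  # -> [[3, 3], [4, 4]]  (w3 half)
```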

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 14, 2026 04:21
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 14, 2026
@DarkLight1337 DarkLight1337 merged commit ffa5d74 into vllm-project:main Mar 14, 2026
63 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Transformers backend Mar 14, 2026
@hmellor hmellor deleted the v5-style-moe-weights branch March 14, 2026 08:22
siewcapital pushed a commit to siewcapital/vllm that referenced this pull request Mar 15, 2026
…backend (vllm-project#36997)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Siew's Capital Jarvis <brayden.stanley.0127@gmail.com>
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
…backend (vllm-project#36997)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…backend (vllm-project#36997)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…backend (vllm-project#36997)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
…backend (vllm-project#36997)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development

2 participants