Enable loading of fused expert weights in the Transformers modelling backend #36997
Merged
DarkLight1337 merged 1 commit into vllm-project:main on Mar 14, 2026
Conversation
…backend Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Contributor
Code Review
The pull request refactors the Mixture of Experts (MoE) weight loading mechanism to support both fused and non-fused expert weight formats. In vllm/model_executor/layers/fused_moe/layer.py, the load_weights method is updated to differentiate between 3D tensors representing fused expert weights and single expert weights, applying a unified loading logic that iterates through individual experts. Concurrently, vllm/model_executor/models/transformers/moe.py introduces an explicit expert_mapping for fused experts, ensuring proper identification and handling of these weight structures during model loading.
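A minimal sketch of the loading pattern described above, assuming the checkpoint stacks all experts along dim 0 in the fused format. The function names (`load_moe_weight`, `_load_single_expert`) and their signatures are hypothetical simplifications of the actual `FusedMoE.load_weights` method, shown only to illustrate the branch between fused and non-fused tensors:

```python
import torch

def load_moe_weight(param: torch.nn.Parameter,
                    loaded_weight: torch.Tensor,
                    expert_id: int,
                    shard_id: str) -> None:
    # A 3D checkpoint tensor stacks every expert along dim 0 (fused
    # format); a 2D tensor holds a single expert's weight.
    if loaded_weight.ndim == 3:
        for eid in range(loaded_weight.shape[0]):
            _load_single_expert(param, loaded_weight[eid], eid, shard_id)
    else:
        _load_single_expert(param, loaded_weight, expert_id, shard_id)

def _load_single_expert(param: torch.nn.Parameter,
                        weight: torch.Tensor,
                        expert_id: int,
                        shard_id: str) -> None:
    # Hypothetical stand-in for the per-expert copy into the stacked
    # parameter; the real code also handles tensor-parallel sharding
    # and the w1/w3 offsets selected by shard_id.
    param.data[expert_id].copy_(weight)
```

Routing both formats through the same per-expert helper is what lets the fused path reuse the existing single-expert logic instead of duplicating it.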
DarkLight1337 approved these changes on Mar 14, 2026
siewcapital pushed a commit to siewcapital/vllm that referenced this pull request on Mar 15, 2026
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request on Mar 17, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request on Mar 18, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request on Mar 27, 2026
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request on Apr 9, 2026
This pull request introduces improvements to the handling and loading of fused Mixture-of-Experts (MoE) weights in the Transformers modelling backend:
- Introduces an explicit `expert_mapping` for fused experts in `vllm/model_executor/models/transformers/moe.py`, used when `save_original_format=False`. This mapping repurposes `expert_id` as `shard_idx` for deconcatenating the `w1` and `w3` weights.
- Updates `FusedMoE.load_weights` to handle both fused (3D tensor) and non-fused (single expert) weights, simplifying the process and reducing code duplication.
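A sketch of the deconcatenation idea from the first bullet, assuming the fused checkpoint stores `w1` and `w3` concatenated along the intermediate dimension. The mapping entries, parameter names, and the `deconcat` helper below are illustrative assumptions, not the actual structures in `vllm/model_executor/models/transformers/moe.py`:

```python
import torch

# Hypothetical mapping entries: (param_name, checkpoint_name, shard_idx, shard_id).
# For fused experts the third field acts as a shard index rather than an
# expert index, telling the loader which slice of the concatenated tensor
# belongs to w1 (gate) and which to w3 (up).
fused_expert_mapping = [
    ("experts.w13_weight", "experts.gate_up_proj", 0, "w1"),
    ("experts.w13_weight", "experts.gate_up_proj", 1, "w3"),
    ("experts.w2_weight", "experts.down_proj", 0, "w2"),
]

def deconcat(fused: torch.Tensor, shard_idx: int, num_shards: int = 2) -> torch.Tensor:
    # Slice one shard (w1 or w3) out of a tensor of shape
    # (num_experts, num_shards * intermediate_size, hidden_size).
    return fused.chunk(num_shards, dim=-2)[shard_idx]
```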