[MoE Refactor] Combine MoERunnerBase + DefaultMoERunner #40560

Merged
robertgshaw2-redhat merged 6 commits into vllm-project:main from neuralmagic:collapse-runner
Apr 22, 2026

Collaborator

@bnellnm bnellnm commented Apr 21, 2026

Purpose

Now that the chunking MoE runner has been deleted, we can have a single concrete MoERunner class.

  • Merge MoERunnerBase with DefaultMoERunner.

Test Plan

CI

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Bill Nell <bnell@redhat.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request consolidates MoE runner logic by migrating the implementation from the now-deleted MoERunnerBase into DefaultMoERunner and registering custom operations for MoE forward passes. The review feedback identifies several critical issues: the _replace_quant_method helper fails to update the cached _fused_output_is_reduced property, which can lead to inconsistent reduction logic; the scaling logic for FP16 incorrectly ignores the routed_scaling_factor when shared experts are absent; and an assertion in the shared expert reduction path could cause crashes when shared experts are not used.
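The first issue above (a helper that swaps the quant method without refreshing a cached property) can be illustrated with a minimal, self-contained sketch. This is not the vLLM code: `Runner`, `QuantMethod`, `output_is_reduced`, and both `replace_quant_method*` methods are hypothetical names, and the use of `functools.cached_property` is an assumption about how such a value might be cached.

```python
from functools import cached_property


class QuantMethod:
    """Hypothetical stand-in for a quantization method object."""

    def __init__(self, output_is_reduced: bool):
        self.output_is_reduced = output_is_reduced


class Runner:
    """Hypothetical runner used only for illustration; not the vLLM class."""

    def __init__(self, quant_method: QuantMethod):
        self.quant_method = quant_method

    @cached_property
    def fused_output_is_reduced(self) -> bool:
        # Computed once from the current quant_method, then stored in __dict__.
        return self.quant_method.output_is_reduced

    def replace_quant_method_stale(self, new_method: QuantMethod) -> None:
        # Swaps the dependency but leaves any cached value untouched.
        self.quant_method = new_method

    def replace_quant_method(self, new_method: QuantMethod) -> None:
        self.quant_method = new_method
        # Drop the cached value so the next access recomputes it.
        self.__dict__.pop("fused_output_is_reduced", None)


runner = Runner(QuantMethod(output_is_reduced=False))
first = runner.fused_output_is_reduced  # populates the cache

runner.replace_quant_method_stale(QuantMethod(output_is_reduced=True))
stale = runner.fused_output_is_reduced  # still the value cached earlier

runner.replace_quant_method(QuantMethod(output_is_reduced=True))
fresh = runner.fused_output_is_reduced  # recomputed from the new method
```

In this sketch, `stale` keeps the value derived from the old quant method, which is the kind of inconsistency the review describes. Invalidating the cache inside the replacement helper is one possible fix; recomputing the property on every access is another.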

Comment thread vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py Outdated
Comment thread vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py Outdated
Comment thread vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py Outdated
Signed-off-by: Bill Nell <bnell@redhat.com>
Comment thread vllm/model_executor/layers/fused_moe/layer.py Outdated
bnellnm added 2 commits April 21, 2026 22:45
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
@robertgshaw2-redhat robertgshaw2-redhat added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Apr 22, 2026
bnellnm added 2 commits April 22, 2026 12:44
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) April 22, 2026 13:12
@robertgshaw2-redhat robertgshaw2-redhat merged commit 809d83c into vllm-project:main Apr 22, 2026
66 checks passed
@bnellnm bnellnm deleted the collapse-runner branch April 22, 2026 17:45
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 23, 2026
…#40560)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Yifan <yzong@redhat.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…#40560)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…#40560)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Adrian <info@zzit.ch>