[MoE Refactor] Refactor ZeroExpertFusedMoE into new framework #35549
robertgshaw2-redhat merged 93 commits into vllm-project:main from
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request is a major and well-executed refactoring of the Mixture of Experts (MoE) implementation. It successfully removes the ZeroExpertFusedMoE class by introducing a more modular design with a new ZeroExpertRouter, a MoERunner abstraction, and a dedicated SharedExperts class. This significantly improves the structure and extensibility of the MoE framework. The changes are consistent across the codebase and are supported by a comprehensive new test suite for the zero-expert functionality. I've identified one critical issue in the new ChunkingMoERunner that could cause a crash when handling empty inputs, and I've provided a suggestion to fix it. Overall, this is an excellent refactoring effort.
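For context, a guard of the kind the review suggests might look like the sketch below. The class name `ChunkingMoERunner` comes from the review itself, but the constructor, `chunk_size` attribute, and `_run_chunk` helper are illustrative assumptions, not the PR's actual code:

```python
import torch

# Minimal sketch of the empty-input guard the review suggests.
# Internals (chunk_size, _run_chunk) are hypothetical, not vLLM's code.
class ChunkingMoERunner:
    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        num_tokens = hidden_states.shape[0]
        if num_tokens == 0:
            # Without this guard, the loop below produces zero chunks and
            # torch.cat([]) raises, crashing on empty inputs.
            return torch.empty_like(hidden_states)
        num_chunks = (num_tokens + self.chunk_size - 1) // self.chunk_size
        outputs = []
        for i in range(num_chunks):
            chunk = hidden_states[i * self.chunk_size:(i + 1) * self.chunk_size]
            outputs.append(self._run_chunk(chunk))
        return torch.cat(outputs, dim=0)

    def _run_chunk(self, chunk: torch.Tensor) -> torch.Tensor:
        # Placeholder for the fused-MoE kernel invocation on one chunk.
        return chunk
```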
There may be more activity in sglang... so...
Signed-off-by: Bill Nell <bnell@redhat.com>
Merged commit 19ec9a0 into vllm-project:main
…roject#35549) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: zengxian <xiangdong.zeng@intel.com>
…stream regressions in HPU worker, MoE router, and offloading tests (#1354)

Fix three upstream regressions that break HPU unit tests.

## Changes

1. **`vllm_gaudi/v1/worker/hpu_worker.py`** — `compile_or_warm_up_model()` now returns a `CompilationTimes` NamedTuple instead of a plain `float`, matching the new upstream contract introduced in vllm-project/vllm#39240.
2. **`vllm_gaudi/ops/hpu_fused_moe.py`** — Add `zero_expert_type` and `num_logical_experts` parameters to the HPU override of `create_fused_moe_router()`, plus `ZeroExpertRouter` dispatch, matching the refactor in vllm-project/vllm#35549.
3. **`tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py`** — Remove `block_size` from `OffloadingEvent` constructor calls and update the assertion, matching the field removal in vllm-project/vllm#36644.

## Fixed tests

- `tests/unit_tests/lora/test_llama_tp.py::test_llama_lora`
- `tests/unit_tests/lora/test_llm_with_multi_loras.py::test_multiple_lora_requests`
- `tests/unit_tests/test_embedding.py::test_embeddings[intfloat/e5-mistral-7b-instruct]`
- `tests/unit_tests/ops/test_hpu_fused_moe.py::test_unquantized_fused_moe_method`
- `tests/unit_tests/ops/test_hpu_compressed_tensors.py::test_compressed_tensors_wna16_moe_method`
- `tests/unit_tests/ops/test_hpu_compressed_tensors.py::test_compressed_tensors_w8a8fp8_block_moe_method`
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py::test_offloading_connector[True]`
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py::test_offloading_connector[False]`

---

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Signed-off-by: Yeonsil Yoon <yeon.sil.yoon@intel.com>
Signed-off-by: bmyrcha <bartosz.myrcha@intel.com>
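For intuition, the `ZeroExpertRouter` dispatch described in item 2 above could look roughly like the following. The factory name and the `zero_expert_type` / `num_logical_experts` parameters come from the commit message; the stub router classes and the dispatch body are assumptions, not `vllm_gaudi`'s actual code:

```python
from dataclasses import dataclass

@dataclass
class FusedMoERouter:
    # Minimal stub standing in for the base router; illustrative only.
    top_k: int
    num_experts: int

@dataclass
class ZeroExpertRouter(FusedMoERouter):
    # Zero experts are logical experts with no weights of their own.
    zero_expert_type: str | None = None
    num_logical_experts: int | None = None

def create_fused_moe_router(top_k: int, num_experts: int,
                            zero_expert_type: str | None = None,
                            num_logical_experts: int | None = None):
    # Dispatch to ZeroExpertRouter when zero experts are configured,
    # mirroring the change the commit message describes.
    if zero_expert_type is not None:
        return ZeroExpertRouter(top_k, num_experts,
                                zero_expert_type, num_logical_experts)
    return FusedMoERouter(top_k, num_experts)
```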
Purpose
Remove the `ZeroExpertFusedMoE` class and move its functionality into `FusedMoE`, `MoERunnerBase`, and the new `ZeroExpertRouter` classes. Based on #35326.
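For a rough sense of what the zero-expert path computes, here is a minimal sketch assuming an "identity"-style zero expert, where a token routed to an expert id at or beyond the physical expert count contributes its own hidden state scaled by the routing weight instead of passing through an expert MLP. The helper name and tensor layout are illustrative, not this PR's code:

```python
import torch

def apply_zero_experts(hidden_states: torch.Tensor,
                       topk_ids: torch.Tensor,
                       topk_weights: torch.Tensor,
                       num_physical_experts: int) -> torch.Tensor:
    """Illustrative: accumulate identity contributions for zero experts.

    Expert ids >= num_physical_experts denote zero experts, which have no
    weights; a token routed to one contributes routing_weight * token.
    """
    # (num_tokens, top_k) mask — True where the chosen expert is a zero expert.
    zero_mask = topk_ids >= num_physical_experts
    # Sum the routing weights of zero experts per token, then scale the token.
    zero_weight = (topk_weights * zero_mask).sum(dim=-1, keepdim=True)
    return hidden_states * zero_weight
```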
cc @baonudesifeizhai, @OftenDream, @yzong-rh
Test Plan
Added new tests for zero experts.
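A test along these lines might compare a naive per-token reference against the vectorized identity contribution — a hedged sketch under the same assumptions as the snippet above, not the actual tests added in this PR:

```python
import torch

def reference_zero_expert_output(hidden_states, topk_ids, topk_weights,
                                 num_physical_experts):
    # Naive per-token loop, used only as a reference for the vectorized path.
    out = torch.zeros_like(hidden_states)
    for t in range(hidden_states.shape[0]):
        for k in range(topk_ids.shape[1]):
            if topk_ids[t, k] >= num_physical_experts:
                out[t] += topk_weights[t, k] * hidden_states[t]
    return out

def test_zero_expert_identity_contribution():
    torch.manual_seed(0)
    num_tokens, hidden, top_k = 16, 32, 4
    num_physical, num_logical = 8, 12
    x = torch.randn(num_tokens, hidden)
    ids = torch.randint(0, num_logical, (num_tokens, top_k))
    w = torch.softmax(torch.randn(num_tokens, top_k), dim=-1)
    expected = reference_zero_expert_output(x, ids, w, num_physical)
    # Vectorized equivalent of the reference loop.
    zero_mask = ids >= num_physical
    actual = x * (w * zero_mask).sum(dim=-1, keepdim=True)
    torch.testing.assert_close(actual, expected)
```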
Test Result
cc @yzong-rh