[EPLB] Support EPLB w/ NVFP4 #29804
Conversation
Code Review
This pull request adds support for Expert Parallel Load Balancing (EPLB) with NVFP4 quantization. The changes include a new test case for this functionality and modifications to ModelOptNvFp4FusedMoE to handle the EPLB path, along with a new kernel wrapper flashinfer_trtllm_fp4_routed_moe. The implementation is largely correct, but I've identified a critical issue where the routing method type is hardcoded in the new kernel wrapper. This would lead to incorrect behavior for MoE models that use different routing mechanisms. I have provided comments with suggestions to address this issue by dynamically determining the routing method.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Instead, only a small and essential subset of CI tests (fastcheck) runs to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging, for example by adding the ready label to the PR. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Force-pushed from cf61fec to e642217
Does this support the Marlin kernel?

Yes, this should work since Marlin accepts topk_ids from
):
    # Pack top k ids and expert weights into a single int32 tensor, as
    # required by TRT-LLM
    packed_tensor = (topk_ids.to(torch.int32) << 16) | topk_weights.to(
Maybe hide this packing operation inside flashinfer_trtllm_fp4_routed_moe, i.e., let flashinfer_trtllm_fp4_routed_moe take topk_ids and topk_weights directly, making its interface closer to Marlin's.

Additionally, the packing will be removed in the flashinfer API in the near future, so we can then just pass topk_ids and topk_weights to flashinfer.
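For intuition, the bit layout being discussed can be sketched in pure Python: the expert id occupies the high 16 bits of an int32 and the bfloat16 bit pattern of the routing weight occupies the low 16 bits (helper names here are hypothetical; the actual PR does this on whole torch tensors in one expression).

```python
import struct

def bf16_bits(x: float) -> int:
    """Bit pattern of x truncated to bfloat16 (the upper 16 bits of float32)."""
    (u32,) = struct.unpack("<I", struct.pack("<f", x))
    return u32 >> 16

def bits_to_float(bits: int) -> float:
    """Re-expand 16 bfloat16 bits back into a Python float."""
    (f,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return f

def pack_entry(expert_id: int, weight: float) -> int:
    # Expert id in the high 16 bits, bf16 weight bits in the low 16 bits,
    # mirroring (topk_ids << 16) | topk_weights-as-bf16 from the diff above.
    return (expert_id << 16) | bf16_bits(weight)

def unpack_entry(packed: int) -> tuple[int, float]:
    return packed >> 16, bits_to_float(packed & 0xFFFF)

# 0.625 is exactly representable in bfloat16, so the round trip is exact.
eid, w = unpack_entry(pack_entry(42, 0.625))
```

This also shows why hiding the packing inside the wrapper is attractive: callers can keep passing separate topk_ids and topk_weights, and the bit-level encoding becomes an internal detail that can be dropped once flashinfer's API no longer requires it.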
Signed-off-by: Andrew Briand <abriand@nvidia.com>
CC @tlrmchlsmth
IwakuraRein left a comment
LGTM. Thanks for the contribution.
…re comms Signed-off-by: Andrew Briand <abriand@nvidia.com>
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Hi @andrewbriand, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch. For future commits, pre-commit will run automatically before each commit.
Signed-off-by: Andrew Briand <abriand@nvidia.com>
weight[src],
# Move to device in case the weights have been offloaded to CPU
weight[src].to(torch.cuda.current_device()),
Can we submit this change separately? I don't see the need to prioritize supporting CPU offloading with EPLB, and this may have complications.
Sure, I will revert this for now.
…GPU before comms" This reverts commit c3a7ea1. Signed-off-by: Andrew Briand <abriand@nvidia.com>
pavanimajety left a comment
LGTM, thanks for the PR!
Signed-off-by: Andrew Briand <abriand@nvidia.com> Co-authored-by: Andrew Briand <abriand@nvidia.com>
Signed-off-by: Andrew Briand <abriand@nvidia.com> Co-authored-by: Andrew Briand <abriand@nvidia.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
Signed-off-by: Andrew Briand <abriand@nvidia.com> Co-authored-by: Andrew Briand <abriand@nvidia.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Support EPLB in combination with NVFP4.
Test Plan
Added a test, test_eplb_fused_moe_layer_dep_nvfp4.py, which ensures that NVFP4 backends correctly route tokens to physical experts based on their logical expert ids.

Test Result
Tests pass on GB200.
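As background on the property the test exercises: EPLB can replicate hot logical experts across multiple physical expert slots, and correct routing means every token lands on a physical slot that actually serves its logical expert. A minimal sketch of that invariant (all names, the mapping table, and the round-robin replica choice are hypothetical illustrations, not the PR's implementation):

```python
# Hypothetical logical-to-physical indirection: logical expert 1 is "hot"
# and gets two physical replicas (slots 1 and 3).
logical_to_physical = {0: [0], 1: [1, 3], 2: [2]}

def route(logical_id: int, token_idx: int) -> int:
    """Pick a physical expert for a token (round-robin over replicas)."""
    replicas = logical_to_physical[logical_id]
    return replicas[token_idx % len(replicas)]

def physical_to_logical(physical_id: int) -> int:
    """Invert the mapping: which logical expert does a physical slot serve?"""
    for logical, replicas in logical_to_physical.items():
        if physical_id in replicas:
            return logical
    raise KeyError(physical_id)

# The invariant a routing test checks: routing never changes the logical
# expert a token was assigned to, regardless of which replica is chosen.
for tok in range(8):
    for lid in logical_to_physical:
        assert physical_to_logical(route(lid, tok)) == lid
```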
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.