Conversation
Code Review
This pull request introduces a proof-of-concept for a naive dispatch/combine mechanism for expert parallelism and refactors the dispatch method across various device communicators to use topk_weights and topk_ids. My review has identified a few critical issues in the new MoEPrepareAndFinalizeNaiveEP implementation that could lead to runtime errors or incorrect behavior. Additionally, there are some debugging log statements that should be removed before merging.
```python
a1, _, extra_tensors = get_ep_group().dispatch(
    a1,
    self.dummy_tensor,  # router logits
    is_sequence_parallel=False,  # TODO?
    extra_tensors=extra_tensors,
)
```
The call to get_ep_group().dispatch appears to use an outdated signature. The dispatch methods in vllm/distributed/device_communicators have been updated to accept topk_weights and topk_ids directly. However, the calling signature from GroupCoordinator in parallel_state.py (which is what get_ep_group() returns) seems to be unchanged, leading to a signature mismatch where a boolean is_sequence_parallel is passed as topk_ids (a tensor). This will likely cause a TypeError at runtime. This is a critical issue that needs to be addressed, likely by updating vllm/distributed/parallel_state.py and then adjusting this call accordingly.
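A minimal sketch of how this kind of mismatch surfaces at runtime. The function names and simplified string arguments below are hypothetical stand-ins, not the actual vLLM code; the real signatures take tensors and live under `vllm/distributed/`:

```python
# Hypothetical, simplified stand-ins for the two mismatched signatures.

def dispatch_updated(hidden_states, topk_weights, topk_ids):
    """Device-communicator dispatch after the refactor: expects
    topk_weights and topk_ids, no is_sequence_parallel parameter."""
    return (hidden_states, topk_weights, topk_ids)


def old_style_call():
    """Call shaped like the snippet above: passes router logits plus
    is_sequence_parallel, which the updated signature no longer accepts."""
    try:
        dispatch_updated("a1", "router_logits", is_sequence_parallel=False)
    except TypeError as exc:
        return str(exc)
    return "no error"


msg = old_style_call()
print(msg)  # the TypeError message names the rejected keyword argument
```

This is why the review flags a `TypeError`: once the callee's parameters change, the unchanged `GroupCoordinator` call site either feeds values into the wrong slots or passes keywords the callee no longer defines.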
```python
logger.info_once(f"{router_logits.shape=}")
topk_weights, topk_ids = layer.select_experts(
    hidden_states=x,
    router_logits=router_logits,
)
logger.info_once(f"{router_logits.shape=}")
logger.info_once(f"{topk_weights.shape=}")
```
This pull request has merge conflicts that must be resolved before it can be merged.
```python
def topk_indices_dtype(self) -> torch.dtype | None:
    return None

def num_dispatchers(self) -> int:
```
what should this be?
For now, this can be the all2all_manager world size. If DP+TP is supported, then it is divided by the TP world size.
why do we need to know num_dispatcher?
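A hedged sketch of the rule described in the reply above. The function and parameter names here mirror the discussion (`all2all_manager` world size, TP world size) but are illustrative assumptions, not the actual vLLM attributes:

```python
def num_dispatchers(all2all_world_size: int,
                    tp_world_size: int = 1,
                    dp_tp_supported: bool = False) -> int:
    """Illustrative only: number of ranks that dispatch tokens.

    Per the discussion, this starts as the all2all manager's world
    size; if DP+TP were supported, ranks in the same TP group would
    share one dispatcher, so we divide by the TP world size.
    """
    if dp_tp_supported:
        assert all2all_world_size % tp_world_size == 0
        return all2all_world_size // tp_world_size
    return all2all_world_size


print(num_dispatchers(8))           # -> 8
print(num_dispatchers(8, 2, True))  # -> 4
```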
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
… cutlass Signed-off-by: Robert Shaw <robshaw@redhat.com>
```python
from vllm.platforms import current_platform

# The torch ops do not support fp8, so use an int8 view.
```
which ops are missing? too late for this PR, but we can improve fp8 coverage in PyTorch to make things better in the future
float8_e4m3fn
i.e. the usual one for nvidia gpus
Nice, this seems much simpler than I expected.
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
replaced by #32567
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.