
Naive dispatch combine POC #31933

Closed
robertgshaw2-redhat wants to merge 16 commits into main from naive-dispatch-combine

Conversation

@robertgshaw2-redhat (Collaborator) commented Jan 8, 2026

Purpose

  • Fit naive dispatch/combine into the modular kernel (mk) abstraction
  • Helps simplify the fused MoE method
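
The naive approach can be illustrated with a small pure-Python simulation: dispatch all-gathers every rank's tokens, each rank evaluates only its locally owned experts, and combine sums the weighted partial results across ranks. All names below are illustrative stand-ins, not the PR's actual API:

```python
# Pure-Python sketch of naive dispatch/combine for expert parallelism.
# Illustrative only; the PR fits this pattern into vLLM's modular kernel
# abstraction, but this is not the actual implementation.

def naive_dispatch(per_rank_tokens):
    """All-gather: every rank ends up holding every rank's tokens."""
    return [tok for rank in per_rank_tokens for tok in rank]

def expert_forward(expert_id, token):
    # Stand-in for an expert MLP: scale the token by (expert_id + 1).
    return token * (expert_id + 1)

def local_partial(gathered, topk_ids, topk_weights, local_experts):
    """Each rank computes outputs only for the experts it owns."""
    out = [0.0] * len(gathered)
    for i, tok in enumerate(gathered):
        for eid, w in zip(topk_ids[i], topk_weights[i]):
            if eid in local_experts:  # experts on other ranks are skipped
                out[i] += w * expert_forward(eid, tok)
    return out

# Two ranks, two experts per rank (4 experts total), top-2 routing.
per_rank_tokens = [[1.0, 2.0], [3.0]]
gathered = naive_dispatch(per_rank_tokens)
topk_ids = [[0, 2], [1, 3], [0, 1]]
topk_weights = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

partial0 = local_partial(gathered, topk_ids, topk_weights, {0, 1})
partial1 = local_partial(gathered, topk_ids, topk_weights, {2, 3})
# Combine: sum the per-rank partial outputs (an all-reduce of partials).
combined = [a + b for a, b in zip(partial0, partial1)]
```

The simplicity is the point: dispatch is a plain all-gather and combine is a plain sum, with no custom all-to-all kernels involved.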

Test Plan

MODEL_BLOCK := "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8"
GPUS := "2"
PORT := "8000"

launch_dp_ep:
	chg run --gpus {{GPUS}} -- vllm serve {{MODEL_BLOCK}} -dp {{GPUS}} --enable-expert-parallel --port {{PORT}} --enforce-eager

eval_block:
	lm_eval \
		--model local-completions \
		--tasks gsm8k \
		--model_args "model={{MODEL_BLOCK}},base_url=http://localhost:{{PORT}}/v1/completions,num_concurrent=1000,tokenized_requests=False"

Test Result

local-completions (model=Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=1000,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8757|±  |0.0091|
|     |       |strict-match    |     5|exact_match|↑  |0.8931|±  |0.0085|


Robert Shaw added 3 commits January 7, 2026 19:01
@gemini-code-assist (bot) left a comment:
Code Review

This pull request introduces a proof-of-concept for a naive dispatch/combine mechanism for expert parallelism and refactors the dispatch method across various device communicators to use topk_weights and topk_ids. My review has identified a few critical issues in the new MoEPrepareAndFinalizeNaiveEP implementation that could lead to runtime errors or incorrect behavior. Additionally, there are some debugging log statements that should be removed before merging.

Comment on lines +51 to +56
a1, _, extra_tensors = get_ep_group().dispatch(
a1,
self.dummy_tensor, # router logits
is_sequence_parallel=False, # TODO?
extra_tensors=extra_tensors,
)
critical

The call to get_ep_group().dispatch appears to use an outdated signature. The dispatch methods in vllm/distributed/device_communicators have been updated to accept topk_weights and topk_ids directly. However, the calling signature from GroupCoordinator in parallel_state.py (which is what get_ep_group() returns) seems to be unchanged, leading to a signature mismatch where a boolean is_sequence_parallel is passed as topk_ids (a tensor). This will likely cause a TypeError at runtime. This is a critical issue that needs to be addressed, likely by updating vllm/distributed/parallel_state.py and then adjusting this call accordingly.
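
The mismatch the bot describes can be sketched in a few lines of plain Python; the class and method names below mirror the report but are simplified stand-ins, not vLLM's actual code:

```python
# Hypothetical sketch of the reported signature mismatch: the communicator's
# dispatch() now expects topk_weights/topk_ids, but the coordinator still
# forwards its old positional arguments, so the boolean is_sequence_parallel
# lands in the topk_ids slot.

class DeviceCommunicator:
    # New-style signature: routing tensors passed directly.
    def dispatch(self, hidden_states, topk_weights, topk_ids,
                 is_sequence_parallel=False, extra_tensors=None):
        assert not isinstance(topk_ids, bool), "topk_ids must be a tensor"
        return hidden_states, topk_weights, extra_tensors

class GroupCoordinator:
    def __init__(self):
        self.comm = DeviceCommunicator()

    # Old-style signature: (hidden_states, router_logits, is_sequence_parallel).
    def dispatch(self, hidden_states, router_logits,
                 is_sequence_parallel=False, extra_tensors=None):
        # Positional forwarding shifts every argument by one slot.
        return self.comm.dispatch(hidden_states, router_logits,
                                  is_sequence_parallel, extra_tensors)

try:
    GroupCoordinator().dispatch([1.0], [0.5], is_sequence_parallel=False)
except AssertionError as e:
    mismatch = str(e)
```

The fix the bot suggests is to update the coordinator's signature in parallel_state.py to match the communicators, then adjust this call site.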

Comment on lines +1326 to +1332
logger.info_once(f"{router_logits.shape=}")
topk_weights, topk_ids = layer.select_experts(
hidden_states=x,
router_logits=router_logits,
)
logger.info_once(f"{router_logits.shape=}")
logger.info_once(f"{topk_weights.shape=}")

high

These logging statements appear to be for debugging purposes and should be removed before merging the pull request. They can produce verbose and unnecessary logs in production.

        topk_weights, topk_ids = layer.select_experts(
            hidden_states=x,
            router_logits=router_logits,
        )


mergify bot commented Jan 8, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 8, 2026
Robert Shaw added 4 commits January 7, 2026 20:20
def topk_indices_dtype(self) -> torch.dtype | None:
return None

def num_dispatchers(self) -> int:
robertgshaw2-redhat (Collaborator, author):
what should this be?

Collaborator:
For now, this can be the all2all_manager world size. If DP+TP is supported then it is divided by the tp world size.

robertgshaw2-redhat (Collaborator, author):
why do we need to know num_dispatcher?

Robert Shaw and others added 2 commits January 7, 2026 20:42
Robert Shaw added 3 commits January 7, 2026 21:33
@mergify mergify bot added the llama Related to Llama models label Jan 8, 2026

from vllm.platforms import current_platform

# The torch ops do not support fp8, so use an int8 view.
Contributor:
which ops are missing? too late for this PR, but we can improve fp8 coverage in PyTorch to make things better in the future

robertgshaw2-redhat (Collaborator, author):
float8_e4m3fn

i.e. the usual one for nvidia gpus
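
The workaround the comment describes — reinterpreting fp8 storage as int8 so ops that lack fp8 kernels can still run on the raw bytes — can be shown with a torch-free analog using a memoryview cast. (In PyTorch this would correspond to something like tensor.view(torch.int8) on an fp8 tensor; the PR's actual call sites may differ.)

```python
# Torch-free analog of "use an int8 view": reinterpret the raw bytes of a
# buffer under a different element type, without copying. Byte-level ops
# (copy, compare, shuffle) work on the int8 view even though nothing in
# them understands the fp8 encoding.

buf = bytearray([0x3C, 0x40, 0x44, 0x48])  # four raw "fp8" bytes
as_int8 = memoryview(buf).cast("b")        # signed 8-bit view, zero-copy
as_uint8 = memoryview(buf).cast("B")       # unsigned view of the same storage

# Mutating through the view mutates the underlying storage.
as_uint8[0] = 0x00
```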

bnellnm (Collaborator) commented Jan 8, 2026

Nice, this seems much simpler than I expected.

@mergify mergify bot removed the needs-rebase label Jan 13, 2026

mergify bot commented Jan 15, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 15, 2026
robertgshaw2-redhat (Collaborator, author):

replaced by #32567

@github-project-automation github-project-automation bot moved this from Backlog to Done in MoE Refactor Jan 20, 2026
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Jan 20, 2026
@robertgshaw2-redhat robertgshaw2-redhat deleted the naive-dispatch-combine branch January 20, 2026 14:36

Labels

llama (Related to Llama models), needs-rebase, nvidia

Projects

Status: Done

3 participants