[Fix]Fix one-sided MoE padding sentinel for local expert maps by Kevin-XiongC · Pull Request #42034 · vllm-project/vllm

Kevin-XiongC · 2026-05-08T06:48:02Z

Summary

Fix FlashInfer NVLink one-sided MoE dispatch padding when expert_map is present.

The one-sided all2all path pads each source rank up to runtime_max_tokens_per_rank. It previously used -1 as the invalid expert id. Some downstream kernels index expert_map before filtering padded tokens, so -1 can index the last expert-map entry and corrupt local-expert routing.

Without this fix, the Marlin MoE path can see padded -1 expert ids after dispatch and trigger an illegal memory access (IMA) when those ids are used to index the local expert map. When an expert map is present, this PR instead pads with a valid global expert id that is non-local to the current EP rank. Paths without expert_map keep using the out-of-range num_experts sentinel.
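The selection rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name pick_invalid_token_expert_id, the use of plain Python lists in place of tensors, and the error raised when every expert is local are all assumptions made for the example.

```python
from typing import Optional, Sequence


def pick_invalid_token_expert_id(
    expert_map: Optional[Sequence[int]], num_experts: int
) -> int:
    """Hypothetical helper: choose the expert id written into padded slots."""
    if expert_map is None:
        # No expert map: keep the out-of-range num_experts sentinel.
        return num_experts
    # With an expert map, pick a valid global id that this rank does not own,
    # i.e. one whose map entry is -1.
    for global_id, local_id in enumerate(expert_map):
        if local_id == -1:
            return global_id
    # Assumption of this sketch only: if every expert is local there is no
    # non-local id to use, so the helper refuses rather than guessing.
    raise ValueError("no non-local expert id available for padding")


# Illustrative layout: EP rank 1 of 2 with 8 global experts, experts 4..7 local.
rank1_map = [-1, -1, -1, -1, 0, 1, 2, 3]
sentinel_with_map = pick_invalid_token_expert_id(rank1_map, 8)
sentinel_without_map = pick_invalid_token_expert_id(None, 8)
```

In this layout the helper returns global expert 0 when the map is present, because rank 1 does not own it, and num_experts when it is absent.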

Source-level rationale

The original padding value is unsuitable for the expert_map path because it violates the downstream kernel contract:

FlashInferNVLinkOneSidedPrepareAndFinalize.prepare() sends topk_ids as one of the all2all payloads and passes invalid_token_expert_id to moe_alltoall.dispatch(). This means padded tokens can be materialized into the received topk_ids tensor.
EP expert_map is a global-expert-id to local-expert-id table with shape [global_num_experts]; non-local experts are represented by values of -1 inside the table. It is not designed to be indexed with a negative global expert id.
The CUDA MoE permute preprocess path used by expert-map-aware MoE kernels indexes the map before skipped/non-local tokens are filtered: preprocessTopkIdLauncher(..., expert_map_ptr, n_expert, ...) runs when expert_map is present, and preprocessTopkIdKernel does local_expert_idx = smem_expert_map[topk_id] for each topk_id.
Therefore, a padded topk_id == -1 is already invalid before any later filtering can happen. In the kernel it becomes expert_map[-1] rather than "a skipped token", which can read the wrong entry or hit an illegal memory access. This is the path where Marlin MoE can hit an IMA.
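The lookup hazard in the points above can be reproduced in a few lines of plain Python. This is only an analogy for the kernel's smem_expert_map[topk_id] read: a Python list wraps a negative index to the last element, whereas device code would read outside the table. The map contents and the lookup_local_expert name are illustrative assumptions.

```python
# Global-to-local table for EP rank 1 of 2 with 8 global experts (illustrative).
# Non-local experts are stored as -1; experts 4..7 map to local ids 0..3.
expert_map = [-1, -1, -1, -1, 0, 1, 2, 3]


def lookup_local_expert(topk_id: int) -> int:
    # Same shape as `local_expert_idx = smem_expert_map[topk_id]`:
    # the map is indexed before any "skip this token" check runs.
    return expert_map[topk_id]


# Old padding value: -1 is used as an index, not recognised as "skipped".
# Python wraps it to the last entry, so the padded token looks local.
old_padding_result = lookup_local_expert(-1)

# New padding value: a valid global id owned by another rank (0 here).
# The lookup is in bounds and yields -1, so existing logic skips the token.
new_padding_result = lookup_local_expert(0)
```

With the old value the padded token resolves to a real local expert, while the replacement resolves to -1 and is filtered like any other non-local token.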

The replacement sentinel is a valid global expert id owned by another EP rank. It is safe to index into expert_map, maps to -1 for the local rank, and is then treated as non-local/skipped by the existing expert-map logic.
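As a rough end-to-end check of that claim, the sketch below pads a small topk_ids batch up to a fixed token count with the non-local sentinel and then counts assignments per local expert. The helper names, the list-based data, and the padding size are assumptions for illustration; the real path operates on tensors inside moe_alltoall.dispatch() and the MoE kernels.

```python
def pad_topk_ids(topk_ids, max_tokens, pad_id):
    """Pad a [num_tokens][top_k] list of expert ids up to max_tokens rows."""
    top_k = len(topk_ids[0])
    padding = [[pad_id] * top_k for _ in range(max_tokens - len(topk_ids))]
    return topk_ids + padding


def count_local_assignments(topk_ids, expert_map, num_local_experts):
    """Count token-to-expert assignments that land on this rank's experts."""
    counts = [0] * num_local_experts
    for row in topk_ids:
        for global_id in row:
            local_id = expert_map[global_id]  # index first, filter second
            if local_id != -1:
                counts[local_id] += 1
    return counts


# Illustrative layout: EP rank 1 of 2, experts 4..7 local, top_k = 2.
expert_map = [-1, -1, -1, -1, 0, 1, 2, 3]
real_tokens = [[4, 1], [7, 5]]

# Pad two real tokens up to four rows using global expert 0 as the sentinel.
padded_tokens = pad_topk_ids(real_tokens, 4, pad_id=0)

counts_real = count_local_assignments(real_tokens, expert_map, 4)
counts_padded = count_local_assignments(padded_tokens, expert_map, 4)
```

The per-expert counts are identical with and without the padded rows, which is the behavior the replacement sentinel is meant to guarantee.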

AI assistance was used to prepare this PR.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request updates the FlashInferNVLinkOneSidedPrepareAndFinalize.prepare method to calculate a non-negative invalid_token_expert_id instead of using a hardcoded -1. This change ensures that padding tokens use an expert ID that is invalid for the local rank but non-negative, avoiding potential indexing issues with expert_map. Additionally, new unit tests were added to verify the correct calculation of this ID in different scenarios. I have no feedback to provide.

…t#64)

Use a non-local valid expert id for one-sided dispatch padding when an expert map is present so downstream kernels that index expert_map before filtering do not see negative expert ids. Keep the out-of-range sentinel for paths without expert_map.

Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Kevin-XiongC <kevin_xiong1997@outlook.com>

Kevin-XiongC requested review from WoosukKwon, mgoin, pavanimajety, tlrmchlsmth and yewentao256 as code owners May 8, 2026 06:48

claude Bot reviewed May 8, 2026

mergify Bot added the nvidia label May 8, 2026

github-project-automation Bot added this to NVIDIA May 8, 2026

gemini-code-assist Bot reviewed May 8, 2026

Kevin-XiongC force-pushed the fix-flashinfer-one-sided-padding-sentinel branch from 3c70b82 to 857b3d8 Compare May 8, 2026 07:13

Kevin-XiongC changed the title ~~Fix one-sided MoE padding sentinel for local expert maps~~ [Fix]Fix one-sided MoE padding sentinel for local expert maps May 8, 2026

Kevin-XiongC force-pushed the fix-flashinfer-one-sided-padding-sentinel branch from 857b3d8 to 634dca5 Compare May 8, 2026 07:29

Delete tests/kernels/moe/test_flashinfer_nvlink_one_sided.py

4365528

Kevin-XiongC wants to merge 2 commits into vllm-project:main from Kevin-XiongC:fix-flashinfer-one-sided-padding-sentinel
