[Fix]Fix one-sided MoE padding sentinel for local expert maps#42034
Open
Kevin-XiongC wants to merge 2 commits into
Open
[Fix]Fix one-sided MoE padding sentinel for local expert maps#42034Kevin-XiongC wants to merge 2 commits into
Kevin-XiongC wants to merge 2 commits into
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request updates the FlashInferNVLinkOneSidedPrepareAndFinalize.prepare method to calculate a non-negative invalid_token_expert_id instead of using a hardcoded -1. This change ensures that padding tokens use an expert ID that is invalid for the local rank but non-negative, avoiding potential indexing issues with expert_map. Additionally, new unit tests were added to verify the correct calculation of this ID in different scenarios. I have no feedback to provide.
3c70b82 to
857b3d8
Compare
…t#64) Use a non-local valid expert id for one-sided dispatch padding when an expert map is present so downstream kernels that index expert_map before filtering do not see negative expert ids. Keep the out-of-range sentinel for paths without expert_map. Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Kevin-XiongC <kevin_xiong1997@outlook.com>
857b3d8 to
634dca5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix FlashInfer NVLink one-sided MoE dispatch padding when
expert_mapis present.The one-sided all2all path pads each source rank up to
runtime_max_tokens_per_rank. It previously used-1as the invalid expert id. Some downstream kernels indexexpert_mapbefore filtering padded tokens, so-1can index the last expert-map entry and corrupt local-expert routing.Without this fix, the Marlin MoE path can see padded
-1expert ids after dispatch and trigger an IMA when those ids are used through the local expert map. When an expert map is present, this PR uses a valid global expert id that is non-local to the current EP rank instead. Paths withoutexpert_mapkeep using the out-of-rangenum_expertssentinel.Source-level rationale
The original padding value is unreasonable for the
expert_mappath because it violates the downstream kernel contract:FlashInferNVLinkOneSidedPrepareAndFinalize.prepare()sendstopk_idsas one of the all2all payloads and passesinvalid_token_expert_idtomoe_alltoall.dispatch(). This means padded tokens can be materialized into the receivedtopk_idstensor.expert_mapis a global-expert-id to local-expert-id table with shape[global_num_experts]; non-local experts are represented by values of-1inside the table. It is not designed to be indexed with a negative global expert id.preprocessTopkIdLauncher(..., expert_map_ptr, n_expert, ...)runs whenexpert_mapis present, andpreprocessTopkIdKerneldoeslocal_expert_idx = smem_expert_map[topk_id]for eachtopk_id.topk_id == -1is already invalid before later filtering can happen. It becomesexpert_map[-1]in the kernel rather than "a skipped token", which can read the wrong entry or hit an illegal memory access. This is the path where Marlin MoE can IMA.The replacement sentinel is a valid global expert id owned by another EP rank. It is safe to index into
expert_map, maps to-1for the local rank, and is then treated as non-local/skipped by the existing expert-map logic.AI assistance was used to prepare this PR.