Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -3800,6 +3800,9 @@ void CutlassMoeFCRunner<T, WeightType, OutputType, InputType, BackBoneType, IsMX

if (!fused_prologue_result) {
TLLM_LOG_TRACE("Falling back to unfused prologue");
// Fix: zero-init permutation arrays before three-step fallback
// The three-step path may not populate all entries (e.g. tokens not matching local experts)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment here is slightly inaccurate. The reason the three-step path may not populate all entries is not primarily due to "tokens not matching local experts" (those are already handled by the expert_id check in the finalize kernels), but rather due to the fact that blockExpertPrefixSumKernel only records the first occurrence of an expert for a given token, skipping any duplicate selections of the same expert in the token's top-k list.

      // The three-step path may not populate all entries if a token has duplicate expert selections

cudaMemsetAsync(unpermuted_row_to_permuted_row, -1, expanded_num_rows * sizeof(int), stream);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While initializing unpermuted_row_to_permuted_row to -1 prevents using uninitialized memory, it does not fully resolve the illegal memory access issue in all configurations.

Specifically, finalizeMoeRoutingKernel (used when enable_alltoall is false or ep_size == 1) does not check if the retrieved permuted row index is negative before using it as a pointer offset (see line 1733). If a token has duplicate expert selections, the three-step fallback only records the first occurrence due to the break in blockExpertPrefixSumKernel (line 564), leaving the entries for subsequent occurrences as -1. Accessing expanded_permuted_rows_v + (-1) * num_elems_in_padded_col will still result in an illegal memory access.

While finalizeMoeRoutingNoFillingKernel (used with all-to-all) happens to avoid this by only processing the first local expert per token, the standard finalizeMoeRoutingKernel remains vulnerable. You should add a check in the finalize kernels to skip entries where the permuted row index is negative.

Comment on lines +3803 to +3805
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

-1 sentinel needs a consumer-side guard before use.

Line 3805 initializes missing map entries to -1, but both finalize paths consume that map as an index without checking for negative values (Line 1728 and Line 1830). This can still dereference invalid rows for unmapped entries.

Proposed fix
--- a/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh
+++ b/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh
@@
-      int64_t const expanded_permuted_row = unpermuted_row_to_permuted_row[expanded_original_row];
+      int64_t const expanded_permuted_row = unpermuted_row_to_permuted_row[expanded_original_row];
+      if (expanded_permuted_row < 0) {
+        continue;
+      }
@@
-        int64_t const expanded_permuted_row_from_k_idx =
-            unpermuted_row_to_permuted_row[source_row + k_idx * num_rows];
+        int64_t const expanded_permuted_row_from_k_idx =
+            unpermuted_row_to_permuted_row[source_row + k_idx * num_rows];
+        if (expanded_permuted_row_from_k_idx < 0) {
+          continue;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh` around lines
3803 - 3805, The sentinel value -1 stored in unpermuted_row_to_permuted_row must
be checked by consumers before being used as an index; update the finalize code
paths that read this map (the two finalize/consume sites referenced) to guard
before dereferencing—e.g., read int perm = unpermuted_row_to_permuted_row[i]; if
(perm < 0) { /* skip / handle unmapped row */ } else { use perm as index }—so
unmapped entries are skipped or handled safely instead of being used as array
indices.

threeStepBuildExpertMapsSortFirstToken(
token_selected_experts, permuted_token_selected_experts_, permuted_row_to_unpermuted_row_,
unpermuted_row_to_permuted_row, expert_first_token_offset_, blocked_expert_counts_,
Expand Down
Loading