Revert "[MoE Refactor] Remove MoE DP chunking" (#39107)#39853
Revert "[MoE Refactor] Remove MoE DP chunking" (#39107)#39853vllm-agent wants to merge 1 commit intovllm-project:mainfrom
Conversation
This reverts commit e1e318a.
Code Review
This pull request introduces ChunkingMoERunner, a wrapper that enables processing large MoE batches in smaller chunks to maintain lockstep across Data-Parallel ranks. It adds environment variables VLLM_MOE_DP_CHUNK_SIZE and VLLM_ENABLE_MOE_DP_CHUNK for configuration and implements a chunked_sizes context manager in DPMetadata to manage per-rank token counts. The changes also include refactoring test utilities and engine configuration to decouple scheduler defaults from parallel settings. Review feedback identifies critical bugs in the chunking logic related to zero-token scenarios, which could cause negative indexing or failed tensor copies during collective operations.
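As a rough illustration of the chunking scheme described above, the sketch below processes a rank's batch in fixed-size chunks while every Data-Parallel rank runs the same number of iterations. This is a hypothetical sketch, not the vLLM implementation: the names run_moe_in_chunks, moe_forward, and CHUNK_SIZE are invented, with CHUNK_SIZE standing in for VLLM_MOE_DP_CHUNK_SIZE.

import torch

# Hypothetical sketch of DP-lockstep chunking; CHUNK_SIZE stands in for
# VLLM_MOE_DP_CHUNK_SIZE. Not the actual ChunkingMoERunner code.
CHUNK_SIZE = 256

def run_moe_in_chunks(hidden_states: torch.Tensor,
                      max_tokens_across_dp: int,
                      moe_forward) -> torch.Tensor:
    # Every DP rank iterates ceil(max_tokens_across_dp / CHUNK_SIZE) times,
    # even when its local batch is shorter, so the collective kernels inside
    # moe_forward are invoked the same number of times on every rank.
    num_tokens = hidden_states.size(0)
    out = torch.empty_like(hidden_states)
    num_chunks = -(-max_tokens_across_dp // CHUNK_SIZE)  # ceil division

    for i in range(num_chunks):
        chunk_start = min(i * CHUNK_SIZE, num_tokens)
        chunk_end = min(chunk_start + CHUNK_SIZE, num_tokens)
        # Ranks whose real tokens are exhausted still call moe_forward, on an
        # empty slice here; the review comments below cover why that zero-token
        # case needs a dummy token (or a zeroed workspace) instead.
        chunk_out = moe_forward(hidden_states[chunk_start:chunk_end])
        out[chunk_start:chunk_end] = chunk_out
    return out

In the PR itself, whether chunking runs at all and the chunk size are controlled by VLLM_ENABLE_MOE_DP_CHUNK and VLLM_MOE_DP_CHUNK_SIZE.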
assert out_slice.size(0) >= slice_size
out_slice = out_slice[:slice_size, :]
out_slice.copy_(orig_slice, non_blocking=True)
The copy_ operation will fail if orig_slice is empty, which occurs when num_tokens is 0 on a Data Parallel rank. In such cases, the workspace buffer should be zeroed out to provide a valid (though dummy) input for the collective kernels to maintain lockstep across all ranks.
assert out_slice.size(0) >= slice_size
out_slice = out_slice[:slice_size, :]
if orig_slice.numel() > 0:
    out_slice.copy_(orig_slice, non_blocking=True)
else:
    out_slice.zero_()
return out_slice

chunk_start = min(chunk_start, num_tokens - 1)
chunk_end = min(chunk_end, num_tokens)
The clamping logic fails when num_tokens is 0 because num_tokens - 1 becomes -1, leading to negative indices and size mismatches during tensor copying. When num_tokens is 0, the runner should still provide a single dummy token (consistent with _compute_chunked_local_num_tokens in forward_context.py) to ensure all DP ranks participate in collective operations.
# clamp start and end
if num_tokens > 0:
    chunk_start = min(chunk_start, num_tokens - 1)
    chunk_end = min(chunk_end, num_tokens)
    # If the chunk is entirely padding for this rank
    if chunk_start >= chunk_end:
        chunk_start = num_tokens - 1
        chunk_end = num_tokens
else:
    chunk_start = 0
    chunk_end = 1
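For concreteness, here is a small self-contained check of how the suggested clamping behaves. The helper name clamp_chunk and the token counts are invented for this sketch and are not taken from the runner or the test suite.

def clamp_chunk(chunk_start: int, chunk_end: int, num_tokens: int) -> tuple[int, int]:
    # Clamp a global chunk range to this rank's local token count, falling
    # back to a single dummy token when the rank has no tokens at all.
    if num_tokens > 0:
        chunk_start = min(chunk_start, num_tokens - 1)
        chunk_end = min(chunk_end, num_tokens)
        if chunk_start >= chunk_end:  # chunk is entirely padding for this rank
            chunk_start, chunk_end = num_tokens - 1, num_tokens
    else:
        chunk_start, chunk_end = 0, 1  # dummy token so collectives still run
    return chunk_start, chunk_end

# A rank with no tokens still gets a one-token dummy slice.
assert clamp_chunk(256, 512, 0) == (0, 1)
# A padding-only chunk collapses to the last real token.
assert clamp_chunk(256, 512, 100) == (99, 100)
# A partially real chunk clamps its end to num_tokens.
assert clamp_chunk(256, 512, 300) == (256, 300)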
# Store outputs
# TODO(bnell): document when chunk_start >= num_tokens
if chunk_start < num_tokens:
The copy-back condition should use the original chunk start index (chunk_start_) rather than the clamped chunk_start. If chunk_start_ >= num_tokens, the chunk is purely padding for this rank, and its result should not be copied back into the output tensor (which would be redundant or even crash if num_tokens is 0).
- if chunk_start < num_tokens:
+ if chunk_start_ < num_tokens:
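The distinction matters inside the chunk loop, where the unclamped offset chunk_start_ is shared by all DP ranks and chunk_start/chunk_end are the locally clamped slice bounds. Below is a minimal sketch of that structure, reusing the clamp_chunk helper from the sketch above; chunked_copy_back and run_chunk are invented names, not the runner's API.

import torch

def chunked_copy_back(hidden_states: torch.Tensor,
                      output: torch.Tensor,
                      max_tokens_across_dp: int,
                      chunk_size: int,
                      run_chunk) -> None:
    # chunk_start_ is the unclamped offset, identical on every DP rank;
    # chunk_start/chunk_end are clamped to this rank's local tokens.
    num_tokens = hidden_states.size(0)
    for chunk_start_ in range(0, max_tokens_across_dp, chunk_size):
        chunk_end_ = min(chunk_start_ + chunk_size, max_tokens_across_dp)
        chunk_start, chunk_end = clamp_chunk(chunk_start_, chunk_end_, num_tokens)

        chunk_out = run_chunk(hidden_states[chunk_start:chunk_end])

        # Gate the copy-back on the *original* offset: a purely padded chunk
        # (chunk_start_ >= num_tokens) must not overwrite the last real token's
        # output, and when num_tokens == 0 there is nothing to copy into at all.
        if chunk_start_ < num_tokens:
            output[chunk_start:chunk_end].copy_(chunk_out)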
Revert of #39107
This reverts commit e1e318a.
Reason: This PR is suspected of causing 1 new CI failure in build #61308:
test_moe_layer[False-deepep_low_latency-2-1-True] crashed with SIGABRT

The PR modified tests/kernels/moe/test_moe_layer.py and multiple MoE layer/runner files (fused_moe/layer.py, fused_moe/config.py, fused_moe/runner/, etc.).

Original PR: #39107
Auto-generated by CI failure analyzer