[Bugfix] Fix MoE Model DP+TP with NaiveAll2AllManager Bug #32705
River12 wants to merge 3 commits into vllm-project:main
Conversation
Code Review
The pull request effectively addresses a bug in the NaiveAll2AllManager where the broadcast operation was using an incorrect distributed group for MoE models with DP2TP2 configuration. The introduction of the dist_group variable correctly selects between the expert parallel group and the data parallel group based on is_sequence_parallel, ensuring the broadcast operation is performed within the appropriate communication context. The change directly resolves the identified issue, and no new critical or high-severity issues were found in the modified code.
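The group selection described above can be illustrated with a minimal sketch. Note this is not the actual vLLM code: the stand-in `Group` objects and the direction of the selection (expert parallel group when `is_sequence_parallel` is set, data parallel group otherwise) are assumptions based on the review comment.

```python
# Illustrative sketch of the fix: pick the broadcast group based on
# is_sequence_parallel instead of always using one fixed group.
# Group, EP_GROUP, and DP_GROUP are stand-ins for vLLM's internal
# process-group accessors, not real vLLM APIs.
from dataclasses import dataclass


@dataclass
class Group:
    name: str


EP_GROUP = Group("ep")  # stand-in for the expert parallel group
DP_GROUP = Group("dp")  # stand-in for the data parallel group


def select_broadcast_group(is_sequence_parallel: bool) -> Group:
    # Before the fix, the broadcast always ran in the same group;
    # the fix chooses the group matching the communication context.
    return EP_GROUP if is_sequence_parallel else DP_GROUP
```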
@River12 could you add a test plan? cc: @tlrmchlsmth / @mgoin, would you be able to help review this change?
@sarckk Thanks, a detailed test plan has been added. cc @tlrmchlsmth, @mgoin
tlrmchlsmth
left a comment
Thanks for the fix!
Two questions:
- Does the same thing happen with VLLM_ALL2ALL_BACKEND="allgather_reducescatter"?
- Seems like this could have been introduced in #32567. Could you confirm if that seems right?

Looked into it. There is no issue with AG/RS, as it already has the proper selection of the group. I don't think that #32567 introduced this; I think this was just not correctly implemented for Naive before. That being said, we should probably deprecate naive. I'm not sure of the value of it now that we have AG/RS.
Thanks for the reviews.
Summary:
For an MoE model with DP2TP2, the two DP groups produce different responses when using NaiveAll2AllManager, because the broadcast operation runs in an incorrect distributed group.
Signed-off-by: Dezhan Tu <dztu@meta.com>
Test Plan:
Test DP2TP2 with VLLM_ALL2ALL_BACKEND="naive" on an MoE model. The testing script below is modified from `examples/offline_inference/torchrun_dp_example.py`:
- Input the same prompt to both DP groups
- Use the default MoE model `microsoft/Phi-mini-MoE-instruct`
```
import argparse

from vllm import LLM, SamplingParams


def parse_args():
    parser = argparse.ArgumentParser(
        description="Data-parallel inference with torchrun"
    )
    parser.add_argument(
        "--tp-size",
        type=int,
        default=1,
        help="Tensor parallel size (default: 1)",
    )
    parser.add_argument(
        "--pp-size",
        type=int,
        default=1,
        help="Pipeline parallel size (default: 1)",
    )
    parser.add_argument(
        "--dp-size",
        type=int,
        default=2,
        help="Data parallel size (default: 2)",
    )
    parser.add_argument(
        "--enable-ep",
        action="store_true",
        help="Enable expert parallel (default: False)",
    )
    parser.add_argument(
        "--model",
        type=str,
        default="microsoft/Phi-mini-MoE-instruct",
        help="Model name or path (default: microsoft/Phi-mini-MoE-instruct)",
    )
    parser.add_argument(
        "--max-model-len",
        type=int,
        default=4096,
        help="Maximum model length (default: 4096)",
    )
    parser.add_argument(
        "--gpu-memory-utilization",
        type=float,
        default=0.6,
        help="GPU memory utilization (default: 0.6)",
    )
    parser.add_argument(
        "--seed",
        type=int,
        default=1,
        help="Random seed (default: 1)",
    )
    return parser.parse_args()


args = parse_args()

# Create prompts, the same across all ranks
prompts = [
    "Hello, my name is",
    "Hello, my name is",
]

# Create sampling parameters, the same across all ranks
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)

# Use `distributed_executor_backend="external_launcher"` so that
# this llm engine/instance only creates one worker.
# It is important to set an explicit seed to make sure that
# all ranks have the same random seed, so that sampling can be
# deterministic across ranks.
llm = LLM(
    model=args.model,
    tensor_parallel_size=args.tp_size,
    data_parallel_size=args.dp_size,
    pipeline_parallel_size=args.pp_size,
    enable_expert_parallel=args.enable_ep,
    distributed_executor_backend="external_launcher",
    max_model_len=args.max_model_len,
    gpu_memory_utilization=args.gpu_memory_utilization,
    seed=args.seed,
)

dp_rank = llm.llm_engine.vllm_config.parallel_config.data_parallel_rank
dp_size = llm.llm_engine.vllm_config.parallel_config.data_parallel_size

# Each DP rank keeps only its own slice of the (identical) prompt list
prompts = [
    f"{idx}.{prompt}" for idx, prompt in enumerate(prompts) if idx % dp_size == dp_rank
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(
        f"DP Rank: {dp_rank} Prompt: {prompt!r}\nGenerated text: {generated_text!r}\n"
    )
```
Running command:
```
FLASHINFER_DISABLE_VERSION_CHECK=1 VLLM_ALL2ALL_BACKEND="naive" \
torchrun --nproc-per-node=4 examples/offline_inference/torchrun_dp_example.py \
--tp-size=2 --dp-size=2
```
Log before the fix (the responses from the 2nd DP group are wrong):
```
DP Rank: 0 Prompt: '0.Hello, my name is'
Generated text: ' 0.Hello, my name is 0.Hello, my name is'
DP Rank: 1 Prompt: '1.Hello, my name is'
Generated text: 'aaaa st sample task SS field Story notion snapshot Reyn final moment Reyn Ku Ent dead'
DP Rank: 0 Prompt: '0.Hello, my name is'
Generated text: ' 0.Hello, my name is 0.Hello, my name is'
DP Rank: 1 Prompt: '1.Hello, my name is'
Generated text: 'aaaa st sample task SS field Story notion snapshot Reyn final moment Reyn Ku Ent dead'
```
Log after the fix:
```
DP Rank: 0 Prompt: '0.Hello, my name is'
Generated text: ' John.\n\n### Instruction 2 (Much more difficult with'
DP Rank: 1 Prompt: '1.Hello, my name is'
Generated text: ' John.\n2.I am a software developer.\n3.I love'
DP Rank: 1 Prompt: '1.Hello, my name is'
Generated text: ' John.\n2.I am a software developer.\n3.I love'
DP Rank: 0 Prompt: '0.Hello, my name is'
Generated text: ' John.\n\n### Instruction 2 (Much more difficult with'
```
Reviewed By: diviramon, mutinifni, wushidonguc
Differential Revision: D91016491