
[BugFix] Fix Qwen3-Next because of TP Attn + EP MoE modified #3221

Merged
wangxiyuan merged 2 commits into vllm-project:main from wxsIcey:qwen3_next_fix on Sep 29, 2025

Conversation

@wxsIcey
Collaborator

@wxsIcey wxsIcey commented Sep 28, 2025

Upstream changes to the TP Attn + EP MoE module (vllm-project/vllm#24982) caused Qwen3-Next inference to fail; this PR fixes that.
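
For context, here is a minimal, hypothetical sketch (not the actual fix; the head and expert counts are made up rather than taken from the Qwen3-Next config) of why attention TP and MoE EP slice the same set of devices along different dimensions, so that an upstream change in how either group is derived can break a backend plugin's model code:

```python
# Hedged sketch only: attention TP shards KV heads across ranks, while MoE EP
# shards routed experts across ranks of the same world size. A change in how
# either group is derived can invalidate layout assumptions made downstream.
world_size = 4            # matches tensor_parallel_size in the test script below
num_kv_heads = 8          # hypothetical KV-head count for the attention layers
num_routed_experts = 64   # hypothetical routed-expert count for the MoE layers

kv_heads_per_attn_rank = num_kv_heads // world_size       # attention TP sharding
experts_per_moe_rank = num_routed_experts // world_size   # MoE EP sharding

print(f"KV heads per attention TP rank: {kv_heads_per_attn_rank}")
print(f"Routed experts per MoE EP rank: {experts_per_moe_rank}")
```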

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    # Chinese and English prompts to exercise multilingual generation.
    prompts = [
        "窗前明月光,",
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",
        "家书抵万金啥意思?",
        "plz tell me a story: ",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)

    # Create an LLM with tensor parallelism across 4 devices.
    llm = LLM(
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts and print each completion.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wangxiyuan
Collaborator

@wxsIcey got it

@github-actions
Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@wxsIcey wxsIcey changed the title [BugFix] Fix Qwen3-Next by vllm #25400 #24982 [BugFix] Fix Qwen3-Next by vllm #24982 Sep 29, 2025
@wxsIcey wxsIcey changed the title [BugFix] Fix Qwen3-Next by vllm #24982 [BugFix] Fix Qwen3-Next because of vllm #24982 Sep 29, 2025
@wxsIcey wxsIcey added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 29, 2025
@wxsIcey wxsIcey marked this pull request as ready for review September 29, 2025 01:25
@wxsIcey wxsIcey requested a review from wangxiyuan September 29, 2025 01:25
Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
@wangxiyuan wangxiyuan merged commit 83092d9 into vllm-project:main Sep 29, 2025
19 checks passed
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
@wxsIcey wxsIcey changed the title [BugFix] Fix Qwen3-Next because of vllm #24982 [BugFix] Fix Qwen3-Next because of TP Attn + EP MoE modified Dec 10, 2025

Labels

ready (read for review), ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants