support basic long_seq feature st #5140
Conversation
Signed-off-by: LookAround <lixushi@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces end-to-end smoke tests for a long sequence feature using PCP/DCP on multi-card setups. The tests cover eager mode, full graph compilation, and piecewise execution. My review focuses on improving code quality and maintainability. I've identified a misleading docstring and significant code duplication across the test functions. I've provided suggestions to correct the docstring and refactor the tests to eliminate redundancy, making them easier to maintain in the future.
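For context, these three modes correspond to the runner settings visible in the quoted diff hunks further down: `enforce_eager=True` for eager mode, `enforce_eager=False` with `compilation_config={"cudagraph_mode": "FULL_DECODE_ONLY", ...}` for full graph capture, and `enforce_eager=False` with no `compilation_config` override for what the PR labels the piecewise path.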
| """Compare the short outputs of HF and vLLM when using greedy sampling. | ||
|
|
||
| Run `pytest tests/e2e/multicard/test_qwen3_moe.py`. | ||
| """ |
The file's docstring contains an incorrect command to run the tests. It refers to test_qwen3_moe.py instead of the current file, test_long_sequence_basic.py. This is likely a copy-paste error and can be confusing for other developers.
| """Compare the short outputs of HF and vLLM when using greedy sampling. | |
| Run `pytest tests/e2e/multicard/test_qwen3_moe.py`. | |
| """ | |
| """Compare the short outputs of HF and vLLM when using greedy sampling. | |
| Run `pytest tests/e2e/multicard/long_sequence/test_long_sequence_basic.py`. | |
| """ |
            decode_context_parallel_size=2,
            max_num_batched_tokens=1024,
            enable_expert_parallel=True,
            block_size=128
    ) as runner:
        runner.model.generate(prompts, sampling_params)

    model = "vllm-ascend/Qwen3-30B-A3B-W8A8"
    with VllmRunner(
            model,
            enforce_eager=True,
            max_model_len=1024,
            tensor_parallel_size=8,
            prefill_context_parallel_size=2,
            decode_context_parallel_size=2,
            enable_expert_parallel=True,
            block_size=128,
            quantization="ascend",
    ) as runner:
        runner.model.generate(prompts, sampling_params)


def test_pcp_dcp_full_graph():
    prompts = [
        "The capital of France is",
        "Hello, my name is Tom, I am",
        "The president of United States is",
        "AI future is"
    ]
    model = "deepseek-ai/DeepSeek-V2-Lite-Chat"
    sampling_params = SamplingParams(max_tokens=32, temperature=0.0)
    with VllmRunner(
            model,
            enforce_eager=False,
            max_model_len=1024,
            tensor_parallel_size=2,
            prefill_context_parallel_size=2,
            decode_context_parallel_size=2,
            max_num_batched_tokens=1024,
            enable_expert_parallel=True,
            block_size=128,
            compilation_config={
                "cudagraph_mode": "FULL_DECODE_ONLY",
                "cudagraph_capture_sizes": [4, 8, 24, 48, 60]}
    ) as runner:
        runner.model.generate(prompts, sampling_params)

    model = "vllm-ascend/Qwen3-30B-A3B-W8A8"
    with VllmRunner(
            model,
            enforce_eager=False,
            max_model_len=1024,
            tensor_parallel_size=8,
            prefill_context_parallel_size=2,
            decode_context_parallel_size=2,
            enable_expert_parallel=True,
            block_size=128,
            quantization="ascend",
            compilation_config={
                "cudagraph_mode": "FULL_DECODE_ONLY",
                "cudagraph_capture_sizes": [4, 8, 24, 48, 60]}
    ) as runner:
        runner.model.generate(prompts, sampling_params)


def test_pcp_dcp_piece_wise():
    prompts = [
        "The capital of France is",
        "Hello, my name is Tom, I am",
        "The president of United States is",
        "AI future is"
    ]
    model = "deepseek-ai/DeepSeek-V2-Lite-Chat"
    sampling_params = SamplingParams(max_tokens=32, temperature=0.0)
    with VllmRunner(
            model,
            enforce_eager=False,
            max_model_len=1024,
            tensor_parallel_size=2,
            prefill_context_parallel_size=2,
            decode_context_parallel_size=2,
            max_num_batched_tokens=1024,
            enable_expert_parallel=True,
            block_size=128
    ) as runner:
        runner.model.generate(prompts, sampling_params)

    model = "vllm-ascend/Qwen3-30B-A3B-W8A8"
    with VllmRunner(
            model,
            enforce_eager=False,
            max_model_len=1024,
            tensor_parallel_size=8,
            prefill_context_parallel_size=2,
            decode_context_parallel_size=2,
            enable_expert_parallel=True,
            block_size=128,
            quantization="ascend"
    ) as runner:
        runner.model.generate(prompts, sampling_params)
The three test functions (test_pcp_dcp_basic, test_pcp_dcp_full_graph, test_pcp_dcp_piece_wise) are highly repetitive, making the code difficult to maintain. Key components like prompts, sampling_params, and the VllmRunner configuration are duplicated in each function.
A better approach is to refactor this using pytest.mark.parametrize. This will create a single, parameterized test, eliminating redundancy and making the test configurations explicit and easier to manage. The suggested code implements this refactoring. Please note that import pytest is included in the suggestion and should be moved to the top of the file.
import pytest

PROMPTS = [
    "The capital of France is",
    "Hello, my name is Tom, I am",
    "The president of United States is",
    "AI future is"
]
SAMPLING_PARAMS = SamplingParams(max_tokens=32, temperature=0.0)

DEEPSEEK_MODEL = "deepseek-ai/DeepSeek-V2-Lite-Chat"
QWEN_MODEL = "vllm-ascend/Qwen3-30B-A3B-W8A8"

BASE_DEEPSEEK_ARGS = {
    "max_model_len": 1024,
    "tensor_parallel_size": 2,
    "prefill_context_parallel_size": 2,
    "decode_context_parallel_size": 2,
    "max_num_batched_tokens": 1024,
    "enable_expert_parallel": True,
    "block_size": 128
}

BASE_QWEN_ARGS = {
    "max_model_len": 1024,
    "tensor_parallel_size": 8,
    "prefill_context_parallel_size": 2,
    "decode_context_parallel_size": 2,
    "enable_expert_parallel": True,
    "block_size": 128,
    "quantization": "ascend",
}


def _run_models(deepseek_vllm_runner_args, qwen_vllm_runner_args):
    with VllmRunner(DEEPSEEK_MODEL, **deepseek_vllm_runner_args) as runner:
        runner.model.generate(PROMPTS, SAMPLING_PARAMS)
    with VllmRunner(QWEN_MODEL, **qwen_vllm_runner_args) as runner:
        runner.model.generate(PROMPTS, SAMPLING_PARAMS)


@pytest.mark.parametrize("extra_args", [
    {"enforce_eager": True},
    {
        "enforce_eager": False,
        "compilation_config": {
            "cudagraph_mode": "FULL_DECODE_ONLY",
            "cudagraph_capture_sizes": [4, 8, 24, 48, 60]
        },
    },
    {"enforce_eager": False},
], ids=["basic", "full_graph", "piece_wise"])
def test_pcp_dcp(extra_args):
    deepseek_args = {**BASE_DEEPSEEK_ARGS, **extra_args}
    qwen_args = {**BASE_QWEN_ARGS, **extra_args}
    _run_models(deepseek_args, qwen_args)
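As a usage note, with this parameterization a single configuration can still be run on its own via its id, e.g. `pytest tests/e2e/multicard/long_sequence/test_long_sequence_basic.py -k full_graph` (file path assumed from the docstring fix above).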
Signed-off-by: LookAround <lixushi@huawei.com>
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (52 commits)
  [Doc]Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
  [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
  [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
  [CI] Improve CI (vllm-project#5078)
  [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
  Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
  [Doc] Add a perf tune section (vllm-project#5127)
  [Image] Refactor image build (vllm-project#5175)
  [refactor] refactor weight trans nz and transpose (vllm-project#4878)
  [BugFix]Fix precision issue for LoRA feature (vllm-project#4141)
  【Doc】Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
  support basic long_seq feature st (vllm-project#5140)
  [Bugfix] install trition for test_custom_op (vllm-project#5112)
  [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
  [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
  [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
  [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
  [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
  [Doc] Refact benchmark doc (vllm-project#5173)
  [Nightly] Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
  ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
### What this PR does / why we need it?
support basic long_seq feature st

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: LookAround <lixushi@huawei.com>
### What this PR does / why we need it?
support basic long_seq feature st

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
### What this PR does / why we need it?
support basic long_seq feature st

### Does this PR introduce any user-facing change?

### How was this patch tested?