[CI] Add wait logic for each individual case by Potabk · Pull Request #6036 · vllm-project/vllm-ascend

Potabk · 2026-01-20T06:45:44Z

What this PR does / why we need it?

Wait until the NPU memory is clean before each individual test case runs.

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@2c24bc6

Signed-off-by: wangli <wangli858794774@gmail.com>

github-actions · 2026-01-20T06:46:00Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request introduces a utility decorator wait_until_npu_memory_free to ensure sufficient NPU memory is available before running tests, which helps prevent flaky tests. The changes are generally good, but I have identified a couple of high-severity issues in the implementation of the decorator. Firstly, the docstring for one of the parameters is incorrect, which could lead to misuse. Secondly, the waiting loop is a busy-wait loop that will cause high CPU usage. I've provided suggestions to fix these issues.

Signed-off-by: wangli <wangli858794774@gmail.com>

Signed-off-by: leo-pony <nengjunma@outlook.com>

…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (24 commits)
  add dispath_ffn_combine_bf16 (vllm-project#5866)
  [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (vllm-project#5932)
  [1/N][Feat] Xlite Qwen3 MoE Support (vllm-project#5951)
  [Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5945)
  [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (vllm-project#5132)
  [Bugfix] fix pcp qwen full graph FIA bug (vllm-project#6037)
  [Bugfix]Fixed precision issues caused by pooled request pooling (vllm-project#6049)
  【main】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6045)
  [main][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5967)
  [Feature]refactor the npugraph_ex config, support online-infer with static kernel (vllm-project#5775)
  [CI][Lint] Show lint diff on failure (vllm-project#5956)
  [CI] Add wait logic for each individual case (vllm-project#6036)
  [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (vllm-project#4633)
  model runner v2 support triton of penalty (vllm-project#5854)
  [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (vllm-project#6034)
  [Tests] move qwen3 performance test from nightly to e2e (vllm-project#5980)
  [Bugfix] fix bug of pcp+mtp+async scheduler (vllm-project#5994)
  [Main2Main] Upgrade vllm commit to releases/v0.14.0 (vllm-project#5988)
  [Ops] Add layernorm for qwen3Next (vllm-project#5765)
  [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (vllm-project#5921)
  ...

### What this PR does / why we need it?
Wait until the NPU memory is clean

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>

### What this PR does / why we need it?
Wait until the NPU memory is clean

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>


…ode/moe_combine_normal/moe_dispatch_normal

Signed-off-by: Wangyibo1005 <2633333316@qq.com>

[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455)

Replace text-match assertions with a two-tier logprob accuracy check:

- Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`.
- Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol).

This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence.

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8a68046

---------

Signed-off-by: wangli <wangli858794774@gmail.com>

fix 123 delete

[CI] Add wait logic for each individual case (vllm-project#6036)

Wait until the NPU memory is clean

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>

123123
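The two-tier logprob check described in the vllm-project#7455 commit message above could look roughly like the sketch below. It is not the test from that PR: the function name `check_logprob_accuracy`, its argument layout, and the use of plain lists and dicts are assumptions made for illustration. The message does not say which tolerance applies when decode tokens match, so this sketch assumes `decode_atol` for both decode branches.

```python
from typing import Dict, List, Optional


def check_logprob_accuracy(
    baseline_ids: List[int],
    baseline_logprobs: List[float],
    compiled_ids: List[int],
    compiled_logprobs: List[float],
    compiled_top_logprobs: List[Dict[int, float]],
    atol: float,
    decode_atol: Optional[float] = None,
) -> None:
    """Raise AssertionError if compiled-mode output diverges from the baseline.

    compiled_top_logprobs[i] maps token id -> logprob for the compiled
    model's top candidates at position i (top-20 in the commit message).
    """
    if decode_atol is None:
        decode_atol = 2 * atol  # default stated in the commit message

    # Tier 1, prefill (token 0): token must match, logprob within atol.
    if baseline_ids[0] != compiled_ids[0]:
        raise AssertionError("prefill token differs from baseline")
    if abs(baseline_logprobs[0] - compiled_logprobs[0]) > atol:
        raise AssertionError("prefill logprob outside atol")

    # Tier 2, decode (remaining tokens): tolerate argmax drift.
    for pos in range(1, len(baseline_ids)):
        expected = baseline_logprobs[pos]
        if baseline_ids[pos] == compiled_ids[pos]:
            actual = compiled_logprobs[pos]
        else:
            # Cross-lookup: what did the compiled model assign to the
            # token the baseline chose?
            top = compiled_top_logprobs[pos]
            if baseline_ids[pos] not in top:
                raise AssertionError(
                    f"baseline token missing from top candidates at {pos}"
                )
            actual = top[baseline_ids[pos]]
        if abs(expected - actual) > decode_atol:
            raise AssertionError(f"decode logprob outside decode_atol at {pos}")
```

A call passes silently when the outputs agree within tolerance, including the case where the last decode token differs but the baseline token still appears among the compiled model's top candidates with a close logprob.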


fix

ff1f957

Signed-off-by: wangli <wangli858794774@gmail.com>

Potabk requested a review from wangxiyuan as a code owner January 20, 2026 06:45

github-actions bot added the module:tests label Jan 20, 2026

gemini-code-assist bot reviewed Jan 20, 2026


Comment thread tests/e2e/conftest.py Outdated

Comment thread tests/e2e/conftest.py Outdated

Potabk added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 20, 2026

Potabk and others added 5 commits January 20, 2026 14:50

fix

5c28398

Signed-off-by: wangli <wangli858794774@gmail.com>

also add wait for pcp

acc250a

Signed-off-by: wangli <wangli858794774@gmail.com>

fix

6031f54

Signed-off-by: wangli <wangli858794774@gmail.com>

fix log

2d67b7f

Signed-off-by: wangli <wangli858794774@gmail.com>

add successfully print

3c67a06

Signed-off-by: leo-pony <nengjunma@outlook.com>

wangxiyuan merged commit 8cf1e8d into vllm-project:main Jan 20, 2026
20 checks passed

Potabk deleted the 0112_fix branch January 21, 2026 01:17

wangxiyuan merged 6 commits into vllm-project:main from Potabk:0112_fix


3 participants
