[CI] Add wait logic for each individual case#6036
[CI] Add wait logic for each individual case#6036wangxiyuan merged 6 commits intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Code Review
This pull request introduces a utility decorator wait_until_npu_memory_free to ensure sufficient NPU memory is available before running tests, which helps prevent flaky tests. The changes are generally good, but I have identified a couple of high-severity issues in the implementation of the decorator. Firstly, the docstring for one of the parameters is incorrect, which could lead to misuse. Secondly, the waiting loop is a busy-wait loop that will cause high CPU usage. I've provided suggestions to fix these issues.
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
…to FIA_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (24 commits) add dispath_ffn_combine_bf16 (vllm-project#5866) [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (vllm-project#5932) [1/N][Feat] Xlite Qwen3 MoE Support (vllm-project#5951) [Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5945) [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (vllm-project#5132) [Bugfix] fix pcp qwen full graph FIA bug (vllm-project#6037) [Bugfix]Fixed precision issues caused by pooled request pooling (vllm-project#6049) 【main】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6045) [main][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5967) [Feature]refactor the npugraph_ex config, support online-infer with static kernel (vllm-project#5775) [CI][Lint] Show lint diff on failure (vllm-project#5956) [CI] Add wait logic for each individual case (vllm-project#6036) [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (vllm-project#4633) model runner v2 support triton of penalty (vllm-project#5854) [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (vllm-project#6034) [Tests] move qwen3 performance test from nightly to e2e (vllm-project#5980) [Bugfix] fix bug of pcp+mtp+async scheduler (vllm-project#5994) [Main2Main] Upgrade vllm commit to releases/v0.14.0 (vllm-project#5988) [Ops] Add layernorm for qwen3Next (vllm-project#5765) [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (vllm-project#5921) ...
### What this PR does / why we need it? Wait until the NPU memory is clean ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it? Wait until the NPU memory is clean ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
### What this PR does / why we need it? Wait until the NPU memory is clean ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it? Wait until the NPU memory is clean ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
### What this PR does / why we need it? Wait until the NPU memory is clean ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com>
…ode/moe_combine_normal/moe_dispatch_normal Signed-off-by: Wangyibo1005 <2633333316@qq.com> [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455) Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. - vLLM version: v0.17.0 - vLLM main: vllm-project/vllm@8a68046 --------- Signed-off-by: wangli <wangli858794774@gmail.com> fix 123 delete [CI] Add wait logic for each individual case (vllm-project#6036) Wait until the NPU memory is clean - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> 123123
…ode/moe_combine_normal/moe_dispatch_normal Signed-off-by: Wangyibo1005 <2633333316@qq.com> [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455) Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. - vLLM version: v0.17.0 - vLLM main: vllm-project/vllm@8a68046 --------- Signed-off-by: wangli <wangli858794774@gmail.com> fix 123 delete [CI] Add wait logic for each individual case (vllm-project#6036) Wait until the NPU memory is clean - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> 123123 123
…ode/moe_combine_normal/moe_dispatch_normal Signed-off-by: Wangyibo1005 <2633333316@qq.com> [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455) Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. - vLLM version: v0.17.0 - vLLM main: vllm-project/vllm@8a68046 --------- Signed-off-by: wangli <wangli858794774@gmail.com> fix 123 delete [CI] Add wait logic for each individual case (vllm-project#6036) Wait until the NPU memory is clean - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> 123123 123 pre-commit
…ode/moe_combine_normal/moe_dispatch_normal Signed-off-by: Wangyibo1005 <2633333316@qq.com> [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455) Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. - vLLM version: v0.17.0 - vLLM main: vllm-project/vllm@8a68046 --------- Signed-off-by: wangli <wangli858794774@gmail.com> fix 123 delete [CI] Add wait logic for each individual case (vllm-project#6036) Wait until the NPU memory is clean - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> 123123 123 pre-commit pick use
…ode/moe_combine_normal/moe_dispatch_normal Signed-off-by: Wangyibo1005 <2633333316@qq.com> [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (vllm-project#7455) Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. - vLLM version: v0.17.0 - vLLM main: vllm-project/vllm@8a68046 --------- Signed-off-by: wangli <wangli858794774@gmail.com> fix 123 delete [CI] Add wait logic for each individual case (vllm-project#6036) Wait until the NPU memory is clean - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: leo-pony <nengjunma@outlook.com> 123123 123 pre-commit pick use 123
What this PR does / why we need it?
Wait until the NPU memory is clean
Does this PR introduce any user-facing change?
How was this patch tested?