[CI][BugFix] Qwen3-Next nightly test fix. #6247
Signed-off-by: InSec <1790766300@qq.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request provides a temporary fix for a nightly test failure involving the Qwen3-Next model. The changes are focused on avoiding an accuracy issue that occurs in full graph mode. To achieve this, the FULL_DECODE_ONLY CUDA graph mode is disabled by removing the --compilation-config server argument. Additionally, new server arguments (--async-scheduling, --no-enable-prefix-caching, --enable-expert-parallel) are introduced as part of the workaround. The test's scope is also narrowed by reducing the range of MAX_NUM_BATCHED_TOKENS. These changes appear to be a clear and targeted approach for a temporary CI fix.
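The argument change summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the test's actual code: `BASE_ARGS`, `WORKAROUND_FLAGS`, `drop_option`, and `apply_workaround` are hypothetical names, and the JSON value passed to `--compilation-config` is assumed, since the real argument list is not shown on this page. Only the flag names themselves come from the review text.

```python
import json

# Hypothetical "before" arguments; the JSON payload is an assumption.
BASE_ARGS = [
    "--compilation-config",
    json.dumps({"cudagraph_mode": "FULL_DECODE_ONLY"}),
]

# Flags the review says were introduced as part of the workaround.
WORKAROUND_FLAGS = [
    "--async-scheduling",
    "--no-enable-prefix-caching",
    "--enable-expert-parallel",
]


def drop_option(args, name):
    """Return a copy of args without option `name` and its value.

    Handles both `--name value` and `--name=value` spellings.
    """
    out = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this element is the value of the dropped option
            continue
        if arg == name:
            skip_next = True
            continue
        if arg.startswith(name + "="):
            continue
        out.append(arg)
    return out


def apply_workaround(args):
    """Remove the compilation config and append the workaround flags once."""
    trimmed = drop_option(args, "--compilation-config")
    return trimmed + [flag for flag in WORKAROUND_FLAGS if flag not in trimmed]


server_args = apply_workaround(BASE_ARGS)
print(server_args)
```

Keeping the removal and the additions in one helper makes the workaround easy to revert once the accuracy issue in full graph mode is resolved.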
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (86 commits)
  [refactor] refactor excute_model and _dymmy_run method (vllm-project#6043)
  [Refactor] profiler config optimze (vllm-project#6141)
  [Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (vllm-project#6006)
  [UT]: refactoring 310p ops ut (vllm-project#6296)
  [Refact.]: refactoring 310p-kv cache allocator, align with main branch (vllm-project#6270)
  [Misc] Removes unnecessary graph size re-initialization (vllm-project#6280)
  [Main2Main] Upgrade vllm commit to 0123 (vllm-project#6169)
  [BugFix] Fix wheel package build workflow (vllm-project#6276)
  [CI][BugFix] Qwen3-Next nightly test fix. (vllm-project#6247)
  [Doc] quick fix for vllm-ascend version (vllm-project#6278)
  [Community] Nominate whx-sjtu as maintainer (vllm-project#6268)
  [Lint] Fix mypy issue to make CI happy (vllm-project#6272)
  BugFix: Fix moe_load accumulation error in ACL graph mode (vllm-project#6182)
  [Patch] Remove the patch of ECExampleConnector (vllm-project#5976)
  [Bugfix] Fix PP+PCP and PP+flashcomm1 bugs (vllm-project#5416)
  [Feat] proxy delay to remove instances (vllm-project#5934)
  [CI] Add workfolw_dispatch for nightly image build (vllm-project#6269)
  [bugfix][npugraph_ex]fix static kernel uninstall issue (vllm-project#6128)
  [Doc] 310P Documents update (vllm-project#6246)
  [Feature] Mooncake connector get remote ptp size (vllm-project#5822)
  ...
### What this PR does / why we need it?
Qwen3-Next nightly test fix. Temporarily avoid the accuracy issue in the **full graph** mode.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: InSec <1790766300@qq.com>
What this PR does / why we need it?
Qwen3-Next nightly test fix. Temporarily avoid the accuracy issue in the full graph mode.
Does this PR introduce any user-facing change?
N/A
How was this patch tested?