[npugraph_ex] Enable npugraph_ex by default #6664
Conversation
Summary of Changes: Hello @huyq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request primarily focuses on enabling the `npugraph_ex` backend by default.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request enables npugraph_ex by default, which is a significant performance-related change. The implementation also introduces a new 'batch-invariant' mode with corresponding graph passes. My main feedback is to address the code duplication in the implementation of the batch-invariant FX pass to improve long-term maintainability. I have also provided suggestions for the PR title and description to align with the repository's style guide.
Suggested PR Title:
[npugraph_ex][Feature] Enable npugraph_ex by default

Suggested PR Summary:
### What this PR does / why we need it?
This pull request enables the `npugraph_ex` backend by default to improve performance on Ascend NPUs, as proposed in the RFC.
It also introduces support for a "batch-invariant" mode, which can be enabled via the `VLLM_BATCH_INVARIANT` environment variable. When this mode is active, specific FX graph passes are applied to replace standard aten matrix operations with batch-invariant Triton kernels, making them compatible with `npugraph_ex` compilation. This also includes patches to `torchair` to support this new mode.
Fixes #6214
### Does this PR introduce _any_ user-facing change?
Yes. `npugraph_ex` is now enabled by default. Users can disable it by setting `enable: false` in the `npugraph_ex_config` section of the `additional_config`.
A new environment variable `VLLM_BATCH_INVARIANT` is introduced to enable the batch-invariant mode.
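As a concrete illustration of these two switches, here is a minimal sketch. The `npugraph_ex_config`/`enable` keys and the `VLLM_BATCH_INVARIANT` variable come from this description; the helper name `vllm_is_batch_invariant` appears in the diff, but its truthiness parsing here, and the exact way `additional_config` is plumbed into vLLM, are assumptions.

```python
import os

# Disabling npugraph_ex via additional_config (keys from this PR description;
# how this dict reaches the engine is an assumption):
additional_config = {
    "npugraph_ex_config": {
        "enable": False,
    },
}

# Gating batch-invariant mode on the new environment variable. The helper
# name matches the diff; the accepted values are an assumption.
def vllm_is_batch_invariant() -> bool:
    return os.environ.get("VLLM_BATCH_INVARIANT", "0").lower() in ("1", "true")

os.environ["VLLM_BATCH_INVARIANT"] = "1"
```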
### How was this patch tested?
CI passed. The changes are covered by existing and new E2E tests (`test_aclgraph_accuracy.py`) and unit tests (`test_ascend_config.py`) that have been updated to reflect the new default behavior. The tests verify correctness and consistency with `npugraph_ex` enabled and disabled, as well as with the new static kernel option.

```python
# Apply batch-invariant FX pass if enabled
if vllm_is_batch_invariant():
    from vllm_ascend.compilation.passes.batch_invariant_fx_pass import (
        apply_batch_invariant_to_fx_graph,
    )
    graph = apply_batch_invariant_to_fx_graph(graph)
```
This logic for applying the batch-invariant FX pass is duplicated. Another implementation exists in `BatchInvariantFXPass` (defined in `vllm_ascend/compilation/passes/batch_invariant_fx_pass.py`), which is used when `npugraph_ex` is disabled. This duplication can lead to maintenance issues and inconsistencies.

To improve maintainability, please consider unifying the two implementations. Using `BatchInvariantFXPass` with its `PatternMatcherPass` would be a more robust and standard approach for both scenarios.
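The unification the reviewer suggests could be sketched roughly as follows. This is a toy illustration only, not the vllm-ascend implementation: `batch_invariant_matmul` is a stand-in for the real batch-invariant Triton kernel, and the actual code would go through `BatchInvariantFXPass`/`PatternMatcherPass` rather than a hand-rolled node walk.

```python
# Hypothetical sketch: one FX pass that swaps matmul calls for a
# "batch-invariant" replacement, reusable with or without npugraph_ex.
import torch
import torch.fx as fx


def batch_invariant_matmul(a, b):
    # Stand-in for the batch-invariant Triton kernel (assumption: the real
    # kernel is numerically equivalent for these shapes).
    return torch.matmul(a, b)


def apply_batch_invariant_pass(gm: fx.GraphModule) -> fx.GraphModule:
    # Rewrite every traced torch.matmul call to the replacement kernel.
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is torch.matmul:
            node.target = batch_invariant_matmul
    gm.recompile()
    return gm


class Toy(torch.nn.Module):
    def forward(self, a, b):
        return torch.matmul(a, b) + 1.0


gm = apply_batch_invariant_pass(fx.symbolic_trace(Toy()))
```

Because the pass operates on a plain `fx.GraphModule`, the same function could be invoked from both the `npugraph_ex` path and the eager/disabled path, removing the duplication.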
Force-pushed: 2791b25 to 8547045
Signed-off-by: huyuanquan1 <huyuanquan1@huawei.com>
Force-pushed: 8547045 to 78f94ec
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  - [Docs] Fix GLM-5 deploy command (vllm-project#6711)
  - [npugraph_ex] enable npugraph_ex by default (vllm-project#6664)
  - [doc] add GLM5.md (vllm-project#6709)
  - [Model] GLM5 adaptation (vllm-project#6642)
  - [Bugfix] Update target probs to target logits in rejection sample (vllm-project#6685)
  - [Main][Ops] Make triton rope support index_selecting from cos_sin_cache (vllm-project#5450)
  - [CI] fix nightly multi node test error for wait for pod ready (vllm-project#6675)
  - [main to main] upgrade main 0210 (vllm-project#6673)
  - [main][Quant] Remove unused rotation functions and parameters from W4A4 LAOS quantization (vllm-project#6648)
  - [Test][BugFix] Fix torch.rand usage in triton penalty test (vllm-project#6680)
  - Add Worker Interface: check_health (vllm-project#6681)
### What this PR does / why we need it?
This pull request enables the `npugraph_ex` backend by default to improve performance on Ascend NPUs, as proposed in the [RFC](vllm-project#6214).

### Does this PR introduce _any_ user-facing change?
Yes. `npugraph_ex` is now enabled by default. Users can disable it by setting `enable: false` in the `npugraph_ex_config` section of the `additional_config`.

### How was this patch tested?
CI passed. The changes are covered by existing and new E2E tests (`test_aclgraph_accuracy.py`) and unit tests (`test_ascend_config.py`) that have been updated to reflect the new default behavior. The tests verify correctness and consistency with `npugraph_ex` enabled and disabled, as well as with the new static kernel option.

Signed-off-by: huyuanquan1 <huyuanquan1@huawei.com>
Co-authored-by: huyuanquan1 <huyuanquan1@huawei.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>