
[npugraph_ex] enable npugraph_ex by default #6664

Merged
wangxiyuan merged 1 commit into vllm-project:main from huyq:npugraph_ex_switch_on
Feb 12, 2026

Conversation

@huyq
Contributor

@huyq huyq commented Feb 10, 2026

What this PR does / why we need it?

This pull request enables the npugraph_ex backend by default to improve performance on Ascend NPUs, as proposed in the RFC.

Does this PR introduce any user-facing change?

Yes. npugraph_ex is now enabled by default. Users can disable it by setting enable: false in the npugraph_ex_config section of the additional_config.
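
As a concrete illustration, the opt-out described above comes down to a small config fragment (key names taken from this PR; how the dict reaches vLLM — constructor argument or CLI flag — depends on your deployment and is not shown here):

```python
import json

# additional_config fragment that restores the old behavior by turning
# the npugraph_ex backend back off (key names from this PR).
additional_config = {"npugraph_ex_config": {"enable": False}}

# When driving the OpenAI-compatible server, the same fragment is
# typically passed as a JSON string on the command line.
print(json.dumps(additional_config))
```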

How was this patch tested?

CI passed. The changes are covered by existing and new E2E tests (test_aclgraph_accuracy.py) and unit tests (test_ascend_config.py) that have been updated to reflect the new default behavior. The tests verify correctness and consistency with npugraph_ex enabled and disabled, as well as with the new static kernel option.

@gemini-code-assist
Contributor

Summary of Changes

Hello @huyq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enabling the npugraph_ex backend by default, aligning with a previously discussed RFC. It introduces significant optimizations for batch-invariant operations through a new FX pass that leverages Triton kernels and ensures compatibility by conditionally disabling certain fusion passes. Additionally, it includes a critical patch to improve npugraph_ex's interaction with Triton within the torchair framework, enhancing overall stability and performance for NPU-based inference.

Highlights

  • Default NPUGraph_EX Enablement: The npugraph_ex backend is now enabled by default, streamlining its use for improved performance. This change is reflected in both the configuration and the user guide documentation.
  • Batch-Invariant FX Pass Integration: A new BatchInvariantFXPass has been introduced. This pass automatically replaces standard PyTorch aten matrix operations with specialized batch-invariant Triton kernel implementations during npugraph_ex compilation, optimizing performance for specific batching scenarios.
  • Conditional Fusion Pass Disablement: Existing fusion passes (fuse_norm_quant, fuse_qknorm_rope, fuse_allreduce_rms) are now conditionally disabled when the batch-invariant mode is active. This prevents potential conflicts and ensures compatibility with the new batch-invariant optimizations.
  • Triton Compatibility Patch for NPUGraph_EX: A new patch (patch_npugraph_ex_triton.py) has been added to address compatibility issues with torchair's ValuePack handling in Triton environments, ensuring correct processing of input parameters during npugraph_ex compilation.
  • Test Updates: Unit and end-to-end tests have been updated to reflect the new default enablement of npugraph_ex. Specific accuracy tests now explicitly disable npugraph_ex to maintain their original testing conditions.
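
The batch-invariant pass described above is, at its core, an FX graph rewrite. The real pass substitutes Triton kernels; the sketch below uses `torch.fx.subgraph_rewriter` with a plain Python stand-in (`batch_invariant_mm` is illustrative, not the actual kernel) to show the mechanism:

```python
import torch
import torch.fx as fx
from torch.fx import subgraph_rewriter

def batch_invariant_mm(a, b):
    # Stand-in for a batch-invariant Triton kernel; numerically it just
    # delegates to torch.mm, so the rewrite is behavior-preserving here.
    return torch.mm(a, b)

class Toy(torch.nn.Module):
    def forward(self, a, b):
        return torch.mm(a, b) + 1.0

# Pattern and replacement are traced by the rewriter and matched
# structurally against the module's FX graph.
def pattern(a, b):
    return torch.mm(a, b)

def replacement(a, b):
    return batch_invariant_mm(a, b)

traced = fx.symbolic_trace(Toy())
matches = subgraph_rewriter.replace_pattern(traced, pattern, replacement)

a, b = torch.randn(2, 3), torch.randn(3, 4)
out = traced(a, b)
```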


Changelog
  • docs/source/user_guide/configuration/additional_config.md
    • Updated the default value for the npugraph_ex enable option from False to True in the documentation.
  • tests/e2e/singlecard/test_aclgraph_accuracy.py
    • Added npugraph_ex_config to a test case to explicitly disable npugraph_ex for accuracy validation.
  • tests/ut/test_ascend_config.py
    • Modified unit test assertion to expect npugraph_ex_config.enable to be True by default.
  • vllm_ascend/ascend_config.py
    • Changed the default value of the enable parameter in NpugraphExConfig from False to True.
    • Updated the docstring for the enable parameter to reflect the new default value.
    • Implemented logic to disable fuse_norm_quant, fuse_qknorm_rope, and fuse_allreduce_rms when the VLLM_BATCH_INVARIANT environment variable is set.
  • vllm_ascend/compilation/compiler_interface.py
    • Integrated the BatchInvariantFXPass to apply batch-invariant transformations if the batch-invariant mode is enabled.
    • Corrected the configuration parameter name from num_speculative_token to num_speculative_tokens.
  • vllm_ascend/compilation/npu_graph_ex_pass_manager.py
    • Imported necessary modules for batch-invariant checks and Triton availability.
    • Added conditional logic to include BatchInvariantFXPass in the pass manager if batch-invariant mode is active and Triton is available.
    • Modified the conditions for adding fusion passes to ensure they are not applied when batch-invariant mode is enabled.
  • vllm_ascend/compilation/passes/batch_invariant_fx_pass.py
    • Added a new file defining BatchInvariantFXPass to replace aten matrix operations with batch-invariant Triton kernel implementations.
    • Implemented pattern matching and replacement for torch.mm, torch.matmul, torch.addmm, torch.bmm, and torch.nn.functional.linear.
  • vllm_ascend/patch/init.py
    • Added documentation for the new patch_npugraph_ex_triton.py.
  • vllm_ascend/patch/worker/init.py
    • Imported the new patch_npugraph_ex_triton module.
  • vllm_ascend/patch/worker/patch_npugraph_ex_triton.py
    • Added a new file to patch torchair.core._concrete_graph.ValuePack and related _unpack_meta and _unpack_npu functions.
    • Ensured correct handling of ValuePack in Triton scenarios for npugraph_ex backend.
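
The `VLLM_BATCH_INVARIANT` gating added in `ascend_config.py` can be sketched as follows (the function and key names here are illustrative, mirroring the behavior listed above rather than copying the real module):

```python
import os

FUSION_FLAGS = ("fuse_norm_quant", "fuse_qknorm_rope", "fuse_allreduce_rms")

def resolve_fusion_flags(config: dict) -> dict:
    """Disable fusion passes that conflict with batch-invariant mode."""
    if os.environ.get("VLLM_BATCH_INVARIANT", "0").lower() in ("1", "true"):
        # Batch-invariant kernels replace the ops these passes fuse, so
        # the fusions are forced off to avoid pattern conflicts.
        for key in FUSION_FLAGS:
            config[key] = False
    return config
```

With the variable set, all three fusion passes come back disabled regardless of their configured values, matching the compatibility constraint the changelog describes.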

@github-actions github-actions Bot added documentation Improvements or additions to documentation module:tests module:core labels Feb 10, 2026
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables npugraph_ex by default, which is a significant performance-related change. The implementation also introduces a new 'batch-invariant' mode with corresponding graph passes. My main feedback is to address the code duplication in the implementation of the batch-invariant FX pass to improve long-term maintainability. I have also provided suggestions for the PR title and description to align with the repository's style guide.

Suggested PR Title:

[npugraph_ex][Feature] Enable npugraph_ex by default

Suggested PR Summary:

### What this PR does / why we need it?

This pull request enables the `npugraph_ex` backend by default to improve performance on Ascend NPUs, as proposed in the RFC.

It also introduces support for a "batch-invariant" mode, which can be enabled via the `VLLM_BATCH_INVARIANT` environment variable. When this mode is active, specific FX graph passes are applied to replace standard aten matrix operations with batch-invariant Triton kernels, making them compatible with `npugraph_ex` compilation. This also includes patches to `torchair` to support this new mode.

Fixes #6214

### Does this PR introduce _any_ user-facing change?

Yes. `npugraph_ex` is now enabled by default. Users can disable it by setting `enable: false` in the `npugraph_ex_config` section of the `additional_config`.

A new environment variable `VLLM_BATCH_INVARIANT` is introduced to enable the batch-invariant mode.

### How was this patch tested?

CI passed. The changes are covered by existing and new E2E tests (`test_aclgraph_accuracy.py`) and unit tests (`test_ascend_config.py`) that have been updated to reflect the new default behavior. The tests verify correctness and consistency with `npugraph_ex` enabled and disabled, as well as with the new static kernel option.

Comment on lines +80 to +85
```python
# Apply batch-invariant FX pass if enabled
if vllm_is_batch_invariant():
    from vllm_ascend.compilation.passes.batch_invariant_fx_pass import (
        apply_batch_invariant_to_fx_graph,
    )
    graph = apply_batch_invariant_to_fx_graph(graph)
```
Contributor


Severity: high

This logic for applying the batch-invariant FX pass is duplicated. Another implementation exists in BatchInvariantFXPass (defined in vllm_ascend/compilation/passes/batch_invariant_fx_pass.py), which is used when npugraph_ex is disabled. This code duplication can lead to maintenance issues and potential inconsistencies.

To improve maintainability, please consider unifying these two implementations into one. Using BatchInvariantFXPass with its PatternMatcherPass would be a more robust and standard approach for both scenarios.
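
One way to realize the unification the review suggests is a single pass object that both code paths invoke. The sketch below swaps node targets directly (the class name and replacement mapping are hypothetical, and the real `BatchInvariantFXPass` uses pattern matching rather than this simplified target swap):

```python
import torch
import torch.fx as fx

def batch_invariant_mm(a, b):
    # Illustrative stand-in for a batch-invariant Triton kernel.
    return torch.mm(a, b)

class UnifiedBatchInvariantPass:
    """One pass object callable from both the npugraph_ex pass manager
    and the fallback path, avoiding duplicated rewrite logic."""

    def __init__(self, replacements):
        # replacements: {original op -> batch-invariant implementation}
        self.replacements = replacements

    def __call__(self, gm: fx.GraphModule) -> fx.GraphModule:
        for node in gm.graph.nodes:
            if node.op == "call_function" and node.target in self.replacements:
                node.target = self.replacements[node.target]
        gm.graph.lint()
        gm.recompile()
        return gm

class Toy(torch.nn.Module):
    def forward(self, a, b):
        return torch.mm(a, b)

gm = UnifiedBatchInvariantPass({torch.mm: batch_invariant_mm})(
    fx.symbolic_trace(Toy()))
```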

@huyq huyq force-pushed the npugraph_ex_switch_on branch 4 times, most recently from 2791b25 to 8547045 on February 11, 2026 01:27
Signed-off-by: huyuanquan1 <huyuanquan1@huawei.com>
@huyq huyq force-pushed the npugraph_ex_switch_on branch from 8547045 to 78f94ec on February 11, 2026 02:13
@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Feb 11, 2026
@wangxiyuan wangxiyuan merged commit a0315f6 into vllm-project:main Feb 12, 2026
60 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 12, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Docs] Fix GLM-5 deploy command (vllm-project#6711)
  [npugraph_ex]enable npugraph_ex by default (vllm-project#6664)
  [doc]add GLM5.md (vllm-project#6709)
  [Model] GLM5 adaptation (vllm-project#6642)
  [Bugfix] Update target probs to target logits in rejection sample (vllm-project#6685)
  [Main][Ops] Make triton rope support index_selecting from cos_sin_cache (vllm-project#5450)
  [CI]fix nightly multi node test error for wait for pod ready (vllm-project#6675)
  [main  to main] upgrade main 0210 (vllm-project#6673)
  [main][Quant] Remove unused rotation functions and parameters from W4A4 LAOS quantization (vllm-project#6648)
  [Test][BugFix] Fix torch.rand usage in triton penalty test (vllm-project#6680)
  Add Worker Interface:check_health (vllm-project#6681)
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
@wangxiyuan wangxiyuan mentioned this pull request Feb 24, 2026
banxiaduhuo pushed a commit to banxiaduhuo/vllm-ascend that referenced this pull request Feb 26, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request May 6, 2026

Labels

documentation (Improvements or additions to documentation), module:core, module:tests, ready (read for review), ready-for-test (start test by label for PR)


Development

Successfully merging this pull request may close these issues.

2 participants