[Misc] Removes unnecessary graph size re-initialization #6280

Merged
wangxiyuan merged 2 commits into vllm-project:main from Angazenn:bugfix
Jan 27, 2026

@Angazenn
Collaborator

@Angazenn Angazenn commented Jan 26, 2026

What this PR does / why we need it?

This PR removes update_default_aclgraph_sizes. Earlier versions added this function to change the default cudagraph_capture_sizes because _npu_paged_attention degrades significantly on certain shapes, which are included in vLLM's default cudagraph_capture_sizes. Now that FIA is the default attention op and does not show this degradation, the override is no longer needed. Keeping it could also cause conflicts when cudagraph_capture_sizes is set to small values below 20.
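As a rough illustration of the "small `cudagraph_capture_sizes`" case mentioned above, the sketch below builds a launch command that passes an explicit list of capture sizes, all below 20, through vLLM's `--compilation-config` JSON option. The model name `my-org/my-model` and the specific sizes are placeholders chosen for this example, not values taken from the PR.

```python
import json
import shlex

# Capture sizes that all fall below 20, the case the description calls out.
# These particular values are an assumption made for illustration.
capture_sizes = [1, 2, 4, 8, 16]

# The compilation config is passed to vLLM as a JSON string.
compilation_config = json.dumps({"cudagraph_capture_sizes": capture_sizes})

# "my-org/my-model" is a placeholder, not a real checkpoint.
command = [
    "vllm", "serve", "my-org/my-model",
    "--compilation-config", compilation_config,
]

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

With the default override removed, a user-supplied list like this one should presumably be the only source of capture sizes, though the exact conflict the PR refers to is not spelled out in the description.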

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: Angazenn <supperccell@163.com>
@Angazenn Angazenn requested a review from wangxiyuan as a code owner January 26, 2026 12:59
@Angazenn Angazenn added the ready read for review label Jan 26, 2026
@Angazenn Angazenn added ready-for-test start test by label for PR and removed module:tests module:core labels Jan 26, 2026
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request successfully removes the update_default_aclgraph_sizes function, its imports, and all calls to it. This change aligns with the stated goal of removing unnecessary graph size re-initialization, which simplifies the codebase and improves maintainability. The removal of unused code is a positive step.

Signed-off-by: Angazenn <supperccell@163.com>
@wangxiyuan wangxiyuan merged commit 5e34c70 into vllm-project:main Jan 27, 2026
20 checks passed
wangxiyuan pushed a commit that referenced this pull request Jan 27, 2026
### What this PR does / why we need it?
Cherry-pick from #6280 .
This PR removes `update_default_aclgraph_sizes`. Earlier versions added
this function to change the default `cudagraph_capture_sizes` because
`_npu_paged_attention` degrades significantly on certain shapes, which
are included in vLLM's default `cudagraph_capture_sizes`. Now that FIA
is the default attention op and does not show this degradation, the
override is no longer needed. Keeping it could also cause conflicts when
`cudagraph_capture_sizes` is set to small values below 20.

---------

Signed-off-by: Angazenn <supperccell@163.com>
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 28, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (86 commits)
  [refactor] refactor excute_model and _dymmy_run method  (vllm-project#6043)
  [Refactor] profiler config optimze (vllm-project#6141)
  [Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (vllm-project#6006)
  [UT]: refactoring 310p ops ut (vllm-project#6296)
  [Refact.]: refactoring 310p-kv cache allocator, align with main branch (vllm-project#6270)
  [Misc] Removes unnecessary graph size re-initialization (vllm-project#6280)
  [Main2Main] Upgrade vllm commit to 0123 (vllm-project#6169)
  [BugFix] Fix wheel package build workflow (vllm-project#6276)
  [CI][BugFix] Qwen3-Next nightly test fix. (vllm-project#6247)
  [Doc] quick fix for vllm-ascend version (vllm-project#6278)
  [Community] Nominate whx-sjtu as maintainer (vllm-project#6268)
  [Lint] Fix mypy issue to make CI happy (vllm-project#6272)
  BugFix:  Fix moe_load accumulation error in ACL graph mode (vllm-project#6182)
  [Patch] Remove the patch of ECExampleConnector (vllm-project#5976)
  [Bugfix] Fix PP+PCP and PP+flashcomm1 bugs (vllm-project#5416)
  [Feat] proxy delay to remove instances (vllm-project#5934)
  [CI] Add workfolw_dispatch for nightly image build (vllm-project#6269)
  [bugfix][npugraph_ex]fix static kernel uninstall issue (vllm-project#6128)
  [Doc] 310P Documents update (vllm-project#6246)
  [Feature] Mooncake connector get remote ptp size (vllm-project#5822)
  ...
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…#6280)

### What this PR does / why we need it?

This PR removes `update_default_aclgraph_sizes`. Earlier versions added
this function to change the default `cudagraph_capture_sizes` because
`_npu_paged_attention` degrades significantly on certain shapes, which
are included in vLLM's default `cudagraph_capture_sizes`. Now that FIA
is the default attention op and does not show this degradation, the
override is no longer needed. Keeping it could also cause conflicts when
`cudagraph_capture_sizes` is set to small values below 20.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main:
vllm-project/vllm@d682094

---------

Signed-off-by: Angazenn <supperccell@163.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…-project#6281)

Signed-off-by: Angazenn <supperccell@163.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
@wangxiyuan wangxiyuan mentioned this pull request Feb 24, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
jiangyunfan1 pushed a commit to jiangyunfan1/vllm-ascend that referenced this pull request Apr 9, 2026
…#6280)

Signed-off-by: Angazenn <supperccell@163.com>
Labels

ready (read for review), ready-for-test (start test by label for PR)
