
[Refactor] remove some metadata variables in attention_v1.#5160

Merged
wangxiyuan merged 16 commits into vllm-project:main from weijinqian0:refactor_attention_remove_metadata
Dec 19, 2025
Conversation

@weijinqian0 (Collaborator) commented Dec 18, 2025

RFC: #4629

Reason:

The metadata dataclass contains an excessive number of variables. We will inherit the community (upstream vLLM) metadata and remove some variables that are no longer needed.

Todo:

  1. partially remove attn_state.

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors attention metadata by removing several redundant variables like query_lens, query_start_loc_list, and is_only_prefill. The changes are mostly clean and improve code clarity. However, I found a critical issue where query_lens is re-calculated on-the-fly in xlite.py using attn_metadata.query_start_loc_cpu, but this attribute is not present in the AscendMetadata object, which will lead to a runtime error. I've provided a detailed comment with a suggested fix.

Comment thread vllm_ascend/xlite/xlite.py
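The recalculation the bot describes amounts to taking consecutive differences of the cumulative `query_start_loc` offsets. A minimal sketch of that derivation, using plain Python sequences for clarity (the real code operates on tensors; this helper is illustrative, not the project's actual implementation):

```python
def query_lens_from_start_loc(query_start_loc):
    """Derive per-request query lengths from cumulative start offsets.

    query_start_loc has num_reqs + 1 entries; entry i is the offset of
    request i's first query token, so consecutive differences give the
    per-request lengths. Works on any sequence of ints (for example, the
    .tolist() of a CPU tensor).
    """
    return [end - start for start, end in zip(query_start_loc, query_start_loc[1:])]

# Example: three requests with 4, 1, and 2 query tokens.
print(query_lens_from_start_loc([0, 4, 5, 7]))  # [4, 1, 2]
```

This is why dropping `query_lens` from the metadata is safe in principle: it is fully recoverable from `query_start_loc` — provided the attribute actually read from (`query_start_loc_cpu`) exists on the metadata object, which is the critical issue the bot flags.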
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
@github-actions bot commented
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a commit message that fulfills the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

weijinqian_v1 and others added 12 commits December 18, 2025 16:54
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
@weijinqian0 added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 19, 2025
@wangxiyuan wangxiyuan merged commit 35ad11b into vllm-project:main Dec 19, 2025
54 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 19, 2025
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (52 commits)
  [Doc]Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
  [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
  [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
  [CI] Improve CI (vllm-project#5078)
  [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
  Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
  [Doc] Add a perf tune section (vllm-project#5127)
  [Image] Refactor image build (vllm-project#5175)
  [refactor] refactor weight trans nz and transpose (vllm-project#4878)
  [BugFix]Fix precision issue for LoRA feature (vllm-project#4141)
  【Doc】Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
  support basic long_seq feature st (vllm-project#5140)
  [Bugfix] install trition for test_custom_op (vllm-project#5112)
  [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
  [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
  [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
  [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
  [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
  [Doc] Refact benchmark doc (vllm-project#5173)
  [Nightly]  Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
  ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
…ect#5160)

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
@wjunLu (Collaborator) commented Dec 23, 2025

I think these changes have not been tested, because there is no query_start_loc_cpu attribute in the AscendMetadata class.

But attn_metadata.query_start_loc_cpu is still accessed, which raises the following error:

(EngineCore_DP0 pid=12553)   File "/vllm-workspace/vllm-ascend/vllm_ascend/xlite/xlite.py", line 260, in __call__
(EngineCore_DP0 pid=12553)     query_lens = attn_metadata.query_start_loc_cpu[
(EngineCore_DP0 pid=12553)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=12553) AttributeError: 'AscendMetadata' object has no attribute 'query_start_loc_cpu'

Env:

  • vllm: 0.13.0+empty
  • vllm-ascend: 0.12.0rc2.dev138+g55beac9c9.d20251223

@weijinqian0 weijinqian0 deleted the refactor_attention_remove_metadata branch January 7, 2026 11:46
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…ect#5160)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…ect#5160)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

Labels

ready (read for review), ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants