
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py#32054

Merged
vllm-bot merged 12 commits into vllm-project:main from
MatthewBonanni:attention_restructure_3
Jan 12, 2026

Conversation

@MatthewBonanni
Collaborator

@MatthewBonanni MatthewBonanni commented Jan 9, 2026

Purpose

Step 3 of #31919. Moves the following objects from utils.py to backend.py unchanged, and updates imports accordingly:

  • CommonAttentionMetadata
  • AttentionMetadataBuilder
  • AttentionCGSupport
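
For downstream code, the practical effect is an import-path change. The sketch below shows a compatibility pattern a consumer might use to run against vLLM versions on either side of this PR; the fallback idiom is an assumption, not something this PR provides, and the `demo_*` modules are stand-ins (registered in `sys.modules`) so the example runs without vLLM installed:

```python
import sys
import types

# Stand-in for the OLD module (vllm/v1/attention/backends/utils.py before
# this PR). Only the old path exists in this simulated environment.
old = types.ModuleType("demo_old_utils")
old.CommonAttentionMetadata = type("CommonAttentionMetadata", (), {})
sys.modules["demo_old_utils"] = old

# Consumer-side fallback: try the new location first (post-PR), then the
# old one (pre-PR). ModuleNotFoundError is a subclass of ImportError.
try:
    from demo_new_backend import CommonAttentionMetadata  # new path
except ImportError:
    from demo_old_utils import CommonAttentionMetadata    # old path

print(CommonAttentionMetadata.__name__)  # → CommonAttentionMetadata
```

In real code the two import targets would be `vllm.v1.attention.backend` and `vllm.v1.attention.backends.utils`.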

Test Plan

CI (run all tests)

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Consolidates attention metadata and CUDA Graphs support types into vllm.v1.attention.backend for clearer ownership.

  • Moves CommonAttentionMetadata, AttentionMetadataBuilder, and AttentionCGSupport from backends/utils.py to backend.py (definitions unchanged)
  • Updates imports across attention backends, model layers, worker/runtime code, spec-decode, tests, and fixes doc reference in docs/design/cuda_graphs.md
  • Removes duplicated definitions from backends/utils.py; retains other utilities
  • No functional behavior changes expected; CI covers affected paths

Written by Cursor Bugbot for commit 516b5f6. This will update automatically on new commits. Configure here.

… backend.py

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot added the v1 label Jan 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to move AttentionMetadata-related code from utils.py to backend.py. Specifically, AttentionCGSupport and AttentionMetadataBuilder are moved. The PR description also mentions moving CommonAttentionMetadata, but this class remains in utils.py. This discrepancy might be unintentional.

I've found two critical issues with this refactoring that will cause runtime errors:

  1. The destination file vllm/v1/attention/backend.py is missing several imports required by the moved code.
  2. The source file vllm/v1/attention/backends/utils.py now has undefined name errors because it still references the moved classes without importing them.

I've added comments to address these issues. Please ensure all dependencies are correctly handled after the move.
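
One standard way to resolve the second issue the review describes (the old module still referencing moved names) is to re-export them from the new module. This is a general sketch of that pattern, not necessarily what the PR ended up doing; the `demo_*` module names are stand-ins so the example runs standalone:

```python
import sys
import types

# Stand-in for the NEW module (backend.py): the moved class now lives here.
backend = types.ModuleType("demo_backend")
class AttentionMetadataBuilder:
    """Moved definition; owned by the new module."""
backend.AttentionMetadataBuilder = AttentionMetadataBuilder
sys.modules["demo_backend"] = backend

# Stand-in for the OLD module (backends/utils.py): instead of keeping a
# duplicate definition, re-import the name so old import paths still work.
utils = types.ModuleType("demo_utils")
exec("from demo_backend import AttentionMetadataBuilder", utils.__dict__)
sys.modules["demo_utils"] = utils

from demo_utils import AttentionMetadataBuilder as ViaOld
from demo_backend import AttentionMetadataBuilder as ViaNew
print(ViaOld is ViaNew)  # → True: one definition, two import paths
```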

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni force-pushed the attention_restructure_3 branch from ac452de to df3f8c8 on January 9, 2026 22:12
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot added the nvidia and rocm (Related to AMD ROCm) labels Jan 9, 2026
@mergify mergify bot added the cpu (Related to CPU backends) and speculative-decoding labels Jan 9, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Jan 10, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify

mergify bot commented Jan 12, 2026

Documentation preview: https://vllm--32054.org.readthedocs.build/en/32054/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 12, 2026
@vllm-bot vllm-bot merged commit 20228cb into vllm-project:main Jan 12, 2026
143 of 146 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 12, 2026
@MatthewBonanni MatthewBonanni deleted the attention_restructure_3 branch January 12, 2026 17:15
TomerBN-Nvidia pushed a commit to TomerBN-Nvidia/vllm that referenced this pull request Jan 13, 2026
… backend.py (vllm-project#32054)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
vllm-project/vllm#31916
vllm-project/vllm#32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
vllm-project/vllm#24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
vllm-project/vllm#31998

- Skip some pooling tests, which are caused by
vllm-project/vllm#32148
where vllm also fails:
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will reopen those tests when main2main reaches
vllm-project/vllm#32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

  • cpu (Related to CPU backends)
  • documentation (Improvements or additions to documentation)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs)
  • rocm (Related to AMD ROCm)
  • speculative-decoding
  • v1

Projects

Status: Done


3 participants