
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py#32054

Merged
vllm-bot merged 12 commits into vllm-project:main from
MatthewBonanni:attention_restructure_3
Jan 12, 2026

Conversation

@MatthewBonanni
Collaborator

@MatthewBonanni MatthewBonanni commented Jan 9, 2026

Purpose

Step 3 of #31919. Moves the following objects from utils.py to backend.py unchanged, and updates imports accordingly:

  • CommonAttentionMetadata
  • AttentionMetadataBuilder
  • AttentionCGSupport
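
For downstream code, the practical effect is an import-path change. The sketch below shows a compatibility pattern a consumer might use to run against vLLM versions on either side of this PR; the fallback idiom is an assumption, not something this PR provides, and the `demo_*` modules are stand-ins (registered in `sys.modules`) so the example runs without vLLM installed:

```python
import sys
import types

# Stand-in for the OLD module (vllm/v1/attention/backends/utils.py before
# this PR). Only the old path exists in this simulated environment.
old = types.ModuleType("demo_old_utils")
old.CommonAttentionMetadata = type("CommonAttentionMetadata", (), {})
sys.modules["demo_old_utils"] = old

# Consumer-side fallback: try the new location first (post-PR), then the
# old one (pre-PR). ModuleNotFoundError is a subclass of ImportError.
try:
    from demo_new_backend import CommonAttentionMetadata  # new path
except ImportError:
    from demo_old_utils import CommonAttentionMetadata    # old path

print(CommonAttentionMetadata.__name__)  # → CommonAttentionMetadata
```

In real code the two import targets would be `vllm.v1.attention.backend` and `vllm.v1.attention.backends.utils`.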

Test Plan

CI (run all tests)

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Consolidates attention metadata and CUDA Graphs support types into vllm.v1.attention.backend for clearer ownership.

  • Moves CommonAttentionMetadata, AttentionMetadataBuilder, and AttentionCGSupport from backends/utils.py to backend.py (definitions unchanged)
  • Updates imports across attention backends, model layers, worker/runtime code, spec-decode, tests, and fixes doc reference in docs/design/cuda_graphs.md
  • Removes duplicated definitions from backends/utils.py; retains other utilities
  • No functional behavior changes expected; CI covers affected paths

Written by Cursor Bugbot for commit 516b5f6. This will update automatically on new commits. Configure here.

… backend.py

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot added the v1 label Jan 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to move AttentionMetadata-related code from utils.py to backend.py. Specifically, AttentionCGSupport and AttentionMetadataBuilder are moved. The PR description also mentions moving CommonAttentionMetadata, but this class remains in utils.py. This discrepancy might be unintentional.

I've found two critical issues with this refactoring that will cause runtime errors:

  1. The destination file vllm/v1/attention/backend.py is missing several imports required by the moved code.
  2. The source file vllm/v1/attention/backends/utils.py now has undefined name errors because it still references the moved classes without importing them.

I've added comments to address these issues. Please ensure all dependencies are correctly handled after the move.
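
One standard way to resolve the second issue the review describes (the old module still referencing moved names) is to re-export them from the new module. This is a general sketch of that pattern, not necessarily what the PR ended up doing; the `demo_*` module names are stand-ins so the example runs standalone:

```python
import sys
import types

# Stand-in for the NEW module (backend.py): the moved class now lives here.
backend = types.ModuleType("demo_backend")
class AttentionMetadataBuilder:
    """Moved definition; owned by the new module."""
backend.AttentionMetadataBuilder = AttentionMetadataBuilder
sys.modules["demo_backend"] = backend

# Stand-in for the OLD module (backends/utils.py): instead of keeping a
# duplicate definition, re-import the name so old import paths still work.
utils = types.ModuleType("demo_utils")
exec("from demo_backend import AttentionMetadataBuilder", utils.__dict__)
sys.modules["demo_utils"] = utils

from demo_utils import AttentionMetadataBuilder as ViaOld
from demo_backend import AttentionMetadataBuilder as ViaNew
print(ViaOld is ViaNew)  # → True: one definition, two import paths
```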

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni force-pushed the attention_restructure_3 branch from ac452de to df3f8c8 on January 9, 2026 22:12
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot added the nvidia and rocm (Related to AMD ROCm) labels Jan 9, 2026
@mergify mergify bot added the cpu (Related to CPU backends) and speculative-decoding labels Jan 9, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Jan 10, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify

mergify bot commented Jan 12, 2026

Documentation preview: https://vllm--32054.org.readthedocs.build/en/32054/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 12, 2026
@vllm-bot vllm-bot merged commit 20228cb into vllm-project:main Jan 12, 2026
143 of 146 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 12, 2026
@MatthewBonanni MatthewBonanni deleted the attention_restructure_3 branch January 12, 2026 17:15
TomerBN-Nvidia pushed a commit to TomerBN-Nvidia/vllm that referenced this pull request Jan 13, 2026
… backend.py (vllm-project#32054)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
vllm-project/vllm#31916
vllm-project/vllm#32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
vllm-project/vllm#24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
vllm-project/vllm#31998

- Skip some pooling tests, which are caused by
vllm-project/vllm#32148
where vllm also fails:
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will reopen those tests when main2main reaches
vllm-project/vllm#32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

  • cpu (Related to CPU backends)
  • documentation (Improvements or additions to documentation)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs)
  • rocm (Related to AMD ROCm)
  • speculative-decoding
  • v1

Projects

Status: Done


3 participants