
[1/N][Attention] Restructure attention: move files#31916

Merged
vllm-bot merged 12 commits into vllm-project:main from MatthewBonanni:attention_restructure_1
Jan 9, 2026
Conversation

@MatthewBonanni
Collaborator

@MatthewBonanni MatthewBonanni commented Jan 7, 2026

Purpose

Implement step 1 of #31919. This PR consists solely of file renaming and movement, and the necessary updates to imports.

  • Move vllm/attention/layers to vllm/model_executor/layers/attention
  • Move vllm/attention/backends/abstract.py to vllm/v1/attention/backend.py
  • Move vllm/attention/backends/registry.py to vllm/v1/attention/backends/registry.py
  • Eliminate vllm/attention/backends folder
  • Move vllm/attention/utils/fa_utils.py to vllm/v1/attention/backends/fa_utils.py
  • Move vllm/attention/ops to vllm/v1/attention/ops
  • Move vllm/attention/selector.py to vllm/v1/attention/selector.py
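The moves above amount to a prefix rewrite on dotted module paths. As an illustration only (this helper is not part of the PR or the vLLM API), the mapping could be sketched as:

```python
# Hypothetical sketch: old module prefixes mapped to their new homes,
# mirroring the move list above. Not actual vLLM code.
MOVES = {
    "vllm.attention.layers": "vllm.model_executor.layers.attention",
    "vllm.attention.backends.abstract": "vllm.v1.attention.backend",
    "vllm.attention.backends.registry": "vllm.v1.attention.backends.registry",
    "vllm.attention.utils.fa_utils": "vllm.v1.attention.backends.fa_utils",
    "vllm.attention.ops": "vllm.v1.attention.ops",
    "vllm.attention.selector": "vllm.v1.attention.selector",
}

def rewrite_module(path: str) -> str:
    """Rewrite a dotted module path to its post-restructure location."""
    for old, new in MOVES.items():
        # Match the prefix exactly or at a package boundary.
        if path == old or path.startswith(old + "."):
            return new + path[len(old):]
    return path  # everything else is untouched
```

For example, `rewrite_module("vllm.attention.ops.paged_attn")` yields `"vllm.v1.attention.ops.paged_attn"`.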

Note

  • vllm/v1/attention is subject to stricter mypy checks than vllm/attention, since #31465 fixed the mypy warnings for files in vllm/v1/attention with a TEMPORARY workaround
  • Since this PR moves files from vllm/attention to vllm/v1/attention, new pre-commit issues arise
  • Due to its size, we want to keep this PR as simple as possible: only file renaming and path changes
  • Therefore, we have added vllm/v1/attention/backends/fa_utils.py to EXCLUDES
  • This will be addressed in the next PR
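For context, excluding a single file from mypy typically looks like the following; this is a generic sketch, and the actual vLLM pre-commit/mypy configuration (its EXCLUDES list) may be structured differently:

```toml
# Hypothetical pyproject.toml fragment -- not the actual vLLM config.
[tool.mypy]
# Paths to skip; the moved file keeps its temporary workaround
# until the follow-up PR addresses it.
exclude = [
    "vllm/v1/attention/backends/fa_utils\\.py",
]
```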

Test Plan

CI

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Bulk path migration with no intended functional changes.

  • Move vllm/attention/backends/abstract.py to vllm/v1/attention/backend.py, and backends/registry.py + utils/fa_utils.py to vllm/v1/attention/backends/
  • Move attention ops from vllm/attention/ops to vllm/v1/attention/ops and selector to vllm/v1/attention/selector.py
  • Relocate attention layers to vllm/model_executor/layers/attention/
  • Update all imports across models, tests, examples, benchmarks, and engine/platform code to new v1 paths
  • CI: adjust Buildkite watched paths, add ROCm file matchers for new paths, and update tests to use v1 modules
  • Repo metadata: update CODEOWNERS and Mergify label rules to point at v1 attention paths
  • Tooling: update pre-commit/mypy EXCLUDE to vllm/v1/attention/ops and backends/fa_utils.py
  • Docs: fix references to new v1 attention locations
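A bulk move-and-rewrite like this is usually scripted. A minimal sketch of the pattern, using throwaway example paths (this is not the script used for the PR; in a real repo you would use `git mv` to preserve history):

```shell
# Illustrative sketch only: move one module and rewrite its importers.
set -eu
mkdir -p demo/vllm/attention demo/vllm/v1/attention
echo "from vllm.attention.selector import get_attn_backend" > demo/caller.py
echo "def get_attn_backend(): ..." > demo/vllm/attention/selector.py

# 1) Move the file (in a real repo: git mv old new)
mv demo/vllm/attention/selector.py demo/vllm/v1/attention/selector.py

# 2) Rewrite imports in every Python file referencing the old path
#    (GNU grep/sed syntax; macOS sed -i differs)
grep -rl "vllm.attention.selector" demo --include="*.py" \
  | xargs sed -i "s/vllm\.attention\.selector/vllm.v1.attention.selector/g"

cat demo/caller.py   # prints: from vllm.v1.attention.selector import get_attn_backend
```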

Written by Cursor Bugbot for commit 0ed4b06. This will update automatically on new commits. Configure here.



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request involves extensive changes and refactoring within the vllm project's attention mechanisms, specifically impacting model_executor/layers/attention and introducing a new v1/attention module. These changes appear to touch various attention implementations such as chunked_local_attention, cross_attention, encoder_only_attention, static_sink_attention, and different operational backends like flashmla, paged_attn, and Triton-based prefill and decode attentions. No specific code changes or review comments were provided to detail the nature or purpose of these modifications.

@mergify

mergify bot commented Jan 7, 2026

Documentation preview: https://vllm--31916.org.readthedocs.build/en/31916/

@mergify mergify bot added the documentation, deepseek, llama, multi-modality (#4194), performance, qwen, gpt-oss, and nvidia labels on Jan 7, 2026
@mergify mergify bot added the rocm label on Jan 7, 2026
@mergify mergify bot added the cpu, speculative-decoding, and kv-connector labels on Jan 7, 2026
@MatthewBonanni MatthewBonanni force-pushed the attention_restructure_1 branch from fbca883 to f671805 on January 7, 2026 at 20:55
@LucasWilkinson LucasWilkinson added ready ONLY add when PR is ready to merge/full CI is needed ready-run-all-tests Trigger CI with all tests for wide-ranging PRs labels Jan 8, 2026
Collaborator

@ProExpertProg ProExpertProg left a comment


Confirming there should be no changes other than imports and renames!

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify

mergify bot commented Jan 8, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 8, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Jan 9, 2026
@mergify

mergify bot commented Jan 9, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 9, 2026
@mergify mergify bot removed the needs-rebase label Jan 9, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@vllm-bot vllm-bot merged commit 2612ba9 into vllm-project:main Jan 9, 2026
143 of 145 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 9, 2026
@MatthewBonanni MatthewBonanni deleted the attention_restructure_1 branch January 10, 2026 15:26
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
vllm-project/vllm#31916
vllm-project/vllm#32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
vllm-project/vllm#24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
vllm-project/vllm#31998

- Skip some pooling tests, which are broken by
vllm-project/vllm#32148
(vLLM itself also fails there:
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4).
We will re-enable those tests when main2main reaches
vllm-project/vllm#32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
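Downstream projects tracking a restructure like this often import from the new path with a fallback to the old one. A generic, hedged sketch of that pattern (the helper name and the vLLM paths in the comment are illustrative, not from the PR):

```python
import importlib

def import_first(*module_paths: str):
    """Return the first importable module from a list of candidate paths.

    Useful when a dependency moves modules between releases: try the new
    location first, then fall back to the old one.
    """
    last_err = None
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as err:
            last_err = err
    # None of the candidates imported; re-raise the last failure.
    raise last_err

# Hypothetical downstream usage (paths illustrative):
# selector = import_first("vllm.v1.attention.selector",
#                         "vllm.attention.selector")
```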
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026

Labels

ci/build, cpu (Related to CPU backends), deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), gpt-oss (Related to GPT-OSS models), kv-connector, llama (Related to Llama models), multi-modality (Related to multi-modality, #4194), nvidia, performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs), rocm (Related to AMD ROCm), speculative-decoding, tpu (Related to Google TPUs), v1

Projects

Status: Done


5 participants