
[1/N][Attention] Restructure attention: move files#31916

Merged
vllm-bot merged 12 commits into vllm-project:main from MatthewBonanni:attention_restructure_1
Jan 9, 2026
Conversation

@MatthewBonanni
Collaborator

@MatthewBonanni MatthewBonanni commented Jan 7, 2026

Purpose

Implement step 1 of #31919. This PR consists solely of file renaming and movement, and the necessary updates to imports.

  • Move vllm/attention/layers to vllm/model_executor/layers/attention
  • Move vllm/attention/backends/abstract.py to vllm/v1/attention/backend.py
  • Move vllm/attention/backends/registry.py to vllm/v1/attention/backends/registry.py
  • Eliminate vllm/attention/backends folder
  • Move vllm/attention/utils/fa_utils.py to vllm/v1/attention/backends/fa_utils.py
  • Move vllm/attention/ops to vllm/v1/attention/ops
  • Move vllm/attention/selector.py to vllm/v1/attention/selector.py
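The moves above amount to a prefix rewrite on dotted module paths. As an illustration only (this helper is not part of the PR or the vLLM API), the mapping could be sketched as:

```python
# Hypothetical sketch: old module prefixes mapped to their new homes,
# mirroring the move list above. Not actual vLLM code.
MOVES = {
    "vllm.attention.layers": "vllm.model_executor.layers.attention",
    "vllm.attention.backends.abstract": "vllm.v1.attention.backend",
    "vllm.attention.backends.registry": "vllm.v1.attention.backends.registry",
    "vllm.attention.utils.fa_utils": "vllm.v1.attention.backends.fa_utils",
    "vllm.attention.ops": "vllm.v1.attention.ops",
    "vllm.attention.selector": "vllm.v1.attention.selector",
}

def rewrite_module(path: str) -> str:
    """Rewrite a dotted module path to its post-restructure location."""
    for old, new in MOVES.items():
        # Match the prefix exactly or at a package boundary.
        if path == old or path.startswith(old + "."):
            return new + path[len(old):]
    return path  # everything else is untouched
```

For example, `rewrite_module("vllm.attention.ops.paged_attn")` yields `"vllm.v1.attention.ops.paged_attn"`.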

Note

  • vllm/v1/attention is subject to stricter mypy checks than vllm/attention, since #31465 fixed the mypy warnings for files in vllm/v1/attention with a TEMPORARY workaround
  • Since this PR moves files from vllm/attention to vllm/v1/attention, new pre-commit issues arise
  • Due to its size, we want to keep this PR as simple as possible: only file renaming and path changes
  • Therefore, we have added vllm/v1/attention/backends/fa_utils.py to EXCLUDES
  • This will be addressed in the next PR
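For context, excluding a single file from mypy typically looks like the following; this is a generic sketch, and the actual vLLM pre-commit/mypy configuration (its EXCLUDES list) may be structured differently:

```toml
# Hypothetical pyproject.toml fragment -- not the actual vLLM config.
[tool.mypy]
# Paths to skip; the moved file keeps its temporary workaround
# until the follow-up PR addresses it.
exclude = [
    "vllm/v1/attention/backends/fa_utils\\.py",
]
```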

Test Plan

CI

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Bulk path migration with no intended functional changes.

  • Move vllm/attention/backends/abstract.py to vllm/v1/attention/backend.py, and backends/registry.py + utils/fa_utils.py to vllm/v1/attention/backends/
  • Move attention ops from vllm/attention/ops to vllm/v1/attention/ops and selector to vllm/v1/attention/selector.py
  • Relocate attention layers to vllm/model_executor/layers/attention/
  • Update all imports across models, tests, examples, benchmarks, and engine/platform code to new v1 paths
  • CI: adjust Buildkite watched paths, add ROCm file matchers for new paths, and update tests to use v1 modules
  • Repo metadata: update CODEOWNERS and Mergify label rules to point at v1 attention paths
  • Tooling: update pre-commit/mypy EXCLUDE to vllm/v1/attention/ops and backends/fa_utils.py
  • Docs: fix references to new v1 attention locations
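A bulk move-and-rewrite like this is usually scripted. A minimal sketch of the pattern, using throwaway example paths (this is not the script used for the PR; in a real repo you would use `git mv` to preserve history):

```shell
# Illustrative sketch only: move one module and rewrite its importers.
set -eu
mkdir -p demo/vllm/attention demo/vllm/v1/attention
echo "from vllm.attention.selector import get_attn_backend" > demo/caller.py
echo "def get_attn_backend(): ..." > demo/vllm/attention/selector.py

# 1) Move the file (in a real repo: git mv old new)
mv demo/vllm/attention/selector.py demo/vllm/v1/attention/selector.py

# 2) Rewrite imports in every Python file referencing the old path
#    (GNU grep/sed syntax; macOS sed -i differs)
grep -rl "vllm.attention.selector" demo --include="*.py" \
  | xargs sed -i "s/vllm\.attention\.selector/vllm.v1.attention.selector/g"

cat demo/caller.py   # prints: from vllm.v1.attention.selector import get_attn_backend
```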

Written by Cursor Bugbot for commit 0ed4b06. This will update automatically on new commits. Configure here.



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request involves extensive changes and refactoring within the vllm project's attention mechanisms, specifically impacting model_executor/layers/attention and introducing a new v1/attention module. These changes appear to touch various attention implementations such as chunked_local_attention, cross_attention, encoder_only_attention, static_sink_attention, and different operational backends like flashmla, paged_attn, and Triton-based prefill and decode attentions. No specific code changes or review comments were provided to detail the nature or purpose of these modifications.

@mergify

mergify bot commented Jan 7, 2026

Documentation preview: https://vllm--31916.org.readthedocs.build/en/31916/

@mergify mergify bot added the documentation, deepseek, llama, multi-modality (#4194), performance, qwen, gpt-oss, and nvidia labels on Jan 7, 2026
@mergify mergify bot added the rocm label on Jan 7, 2026
@mergify mergify bot added the cpu, speculative-decoding, and kv-connector labels on Jan 7, 2026
@MatthewBonanni MatthewBonanni force-pushed the attention_restructure_1 branch from fbca883 to f671805 on January 7, 2026 at 20:55
@LucasWilkinson LucasWilkinson added ready ONLY add when PR is ready to merge/full CI is needed ready-run-all-tests Trigger CI with all tests for wide-ranging PRs labels Jan 8, 2026
Collaborator

@ProExpertProg ProExpertProg left a comment


Confirming there should be no changes other than imports and renames!

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify

mergify bot commented Jan 8, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 8, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Jan 9, 2026
@mergify

mergify bot commented Jan 9, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 9, 2026
@mergify mergify bot removed the needs-rebase label Jan 9, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@vllm-bot vllm-bot merged commit 2612ba9 into vllm-project:main Jan 9, 2026
143 of 145 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 9, 2026
@MatthewBonanni MatthewBonanni deleted the attention_restructure_1 branch January 10, 2026 15:26
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
vllm-project/vllm#31916
vllm-project/vllm#32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
vllm-project/vllm#24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
vllm-project/vllm#31998

- Skip some pooling tests, which are broken by
vllm-project/vllm#32148
(vLLM itself also fails there:
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4).
We will re-enable those tests when main2main reaches
vllm-project/vllm#32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
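Downstream projects tracking a restructure like this often import from the new path with a fallback to the old one. A generic, hedged sketch of that pattern (the helper name and the vLLM paths in the comment are illustrative, not from the PR):

```python
import importlib

def import_first(*module_paths: str):
    """Return the first importable module from a list of candidate paths.

    Useful when a dependency moves modules between releases: try the new
    location first, then fall back to the old one.
    """
    last_err = None
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as err:
            last_err = err
    # None of the candidates imported; re-raise the last failure.
    raise last_err

# Hypothetical downstream usage (paths illustrative):
# selector = import_first("vllm.v1.attention.selector",
#                         "vllm.attention.selector")
```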
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026

Labels

ci/build, cpu (Related to CPU backends), deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), gpt-oss (Related to GPT-OSS models), kv-connector, llama (Related to Llama models), multi-modality (Related to multi-modality, #4194), nvidia, performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs), rocm (Related to AMD ROCm), speculative-decoding, tpu (Related to Google TPUs), v1

Projects

Status: Done


5 participants