
[5/N][Attention] Finish eliminating vllm/attention folder #32064

Merged
ProExpertProg merged 26 commits into vllm-project:main from MatthewBonanni:attention_restructure_5
Jan 27, 2026

Conversation

@MatthewBonanni
Collaborator

@MatthewBonanni MatthewBonanni commented Jan 9, 2026

Merge #32060 before this.

Purpose

Step 5 of #31919: This PR finishes eliminating the vllm/attention folder by doing the following:

  • Split vllm/attention/layer.py into vllm/model_executor/layers/attention/mla_attention.py (MLAAttention, unified_mla_attention) and vllm/model_executor/layers/attention/attention.py (Attention, unified_attention)
  • Move vllm/attention/utils/kv_sharing_utils.py content into vllm/model_executor/layers/attention/attention.py
  • Move vllm/attention/utils/kv_transfer_utils.py to vllm/model_executor/layers/attention/kv_transfer_utils.py
  • Eliminate vllm/attention folder
  • Add imports to vllm/model_executor/layers/attention/__init__.py to enable module-level imports
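The last bullet refers to the common package-level re-export pattern: `__init__.py` imports the public classes from its submodules so callers can import them from the package root. The sketch below demonstrates the pattern with a throwaway package and made-up file contents; it is not the actual vLLM `__init__.py`, only an illustration of the mechanism.

```python
import os
import sys
import tempfile
import textwrap

# Build a tiny throwaway package whose layout mirrors (with stub classes)
# the vllm/model_executor/layers/attention package described above.
pkg_root = tempfile.mkdtemp()
pkg = os.path.join(pkg_root, "attention_pkg")
os.makedirs(pkg)

with open(os.path.join(pkg, "attention.py"), "w") as f:
    f.write("class Attention:\n    pass\n")
with open(os.path.join(pkg, "mla_attention.py"), "w") as f:
    f.write("class MLAAttention:\n    pass\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write(textwrap.dedent("""\
        # Re-export submodule classes at the package level so callers
        # can import them from the package root.
        from .attention import Attention
        from .mla_attention import MLAAttention

        __all__ = ["Attention", "MLAAttention"]
    """))

sys.path.insert(0, pkg_root)
# Thanks to the re-exports, both classes are importable from the root:
from attention_pkg import Attention, MLAAttention

print(Attention.__name__, MLAAttention.__name__)  # -> Attention MLAAttention
```

Without the `__init__.py` re-exports, callers would have to spell out the submodule (`from attention_pkg.attention import Attention`), which is exactly the verbosity the new `__init__.py` avoids.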

Test Plan

CI (should run all tests)

Test Result



Note

Completes migration away from vllm/attention to vllm/model_executor.

  • Splits vllm/attention/layer.py into .../attention/attention.py (Attention, unified_attention) and .../attention/mla_attention.py (MLAAttention, unified_mla_attention); moves MLA custom-ops
  • Inlines validate_kv_sharing_target and moves kv_transfer_utils into .../layers/attention; deletes vllm/attention and vllm/attention/utils/kv_sharing_utils.py
  • Mass import path updates across models, quantization, compilers, backends, workers, tests, and docs; minor typing tweaks (TYPE_CHECKING, annotations)
  • CI/config updates: Buildkite test dependencies reference new paths; CODEOWNERS updated for .../layers/attention; mypy config stops listing the removed package

Written by Cursor Bugbot for commit 8b56809.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request completes the refactoring to eliminate the vllm/attention directory. The changes mostly involve moving files and splitting vllm/attention/layer.py. While the file moves are correct, a critical import path was missed during the refactoring, which will cause an ImportError. I've provided a fix for this.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni MatthewBonanni force-pushed the attention_restructure_5 branch from a873e8f to 1cb4ce3 on January 9, 2026 at 23:12
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify

mergify bot commented Jan 12, 2026

Documentation preview: https://vllm--32064.org.readthedocs.build/en/32064/

@mergify mergify bot added documentation Improvements or additions to documentation deepseek Related to DeepSeek models llama Related to Llama models qwen Related to Qwen models gpt-oss Related to GPT-OSS models rocm Related to AMD ROCm labels Jan 12, 2026
@mergify mergify bot added the v1 label Jan 12, 2026
@mergify mergify bot added the kv-connector label Jan 12, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify

mergify bot commented Jan 15, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 15, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Jan 15, 2026
@mergify

mergify bot commented Jan 19, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 19, 2026
@MatthewBonanni
Collaborator Author

Holding off to let #25954 land first

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Jan 26, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@ProExpertProg ProExpertProg merged commit a608b4c into vllm-project:main Jan 27, 2026
150 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 27, 2026
@MatthewBonanni MatthewBonanni deleted the attention_restructure_5 branch January 27, 2026 15:10
VedantMadane pushed a commit to VedantMadane/vllm that referenced this pull request Jan 28, 2026
…ject#32064)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
vkuzo added a commit to vkuzo/vllm that referenced this pull request Jan 30, 2026
Summary:

vllm-project#32133 missed a rebase on vllm-project#32064; this change fixes the attention import path.

Test Plan:

```bash
# before this PR, the test runner failed because the old attention
# import path no longer exists
pytest tests/quantization/test_fp8.py -s -x
```
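Downstream code hit by this kind of path removal can bridge old and new vLLM versions with an import fallback. The sketch below is a hedged illustration, not code from this PR: the two import paths come from the PR description, and the local stub class exists only so the sketch runs without vLLM installed.

```python
# Prefer the new module location, fall back to the pre-#32064 path for
# older vLLM versions. In this standalone sketch neither package is
# importable, so a local stub keeps the example runnable.
try:
    from vllm.model_executor.layers.attention import Attention  # new path
except ImportError:
    try:
        from vllm.attention import Attention  # path removed by this PR
    except ImportError:
        class Attention:  # stub: stands in when vLLM is not installed
            """Local placeholder so the sketch runs anywhere."""

print(Attention.__name__)  # -> Attention
```

Whichever branch succeeds, callers see a single `Attention` name, which is the point of the pattern.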


Signed-off-by: vasiliy <vasiliy@fb.com>
@vkuzo vkuzo mentioned this pull request Jan 30, 2026
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
…ject#32064)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>