[1/N][Attention] Restructure attention: move files #31916
vllm-bot merged 12 commits into vllm-project:main from
Conversation
Code Review
The pull request involves extensive changes and refactoring within the vllm project's attention mechanisms, specifically impacting model_executor/layers/attention and introducing a new v1/attention module. These changes appear to touch various attention implementations such as chunked_local_attention, cross_attention, encoder_only_attention, static_sink_attention, and different operational backends like flashmla, paged_attn, and Triton-based prefill and decode attentions. No specific code changes or review comments were provided to detail the nature or purpose of these modifications.
Documentation preview: https://vllm--31916.org.readthedocs.build/en/31916/
Force-pushed from fbca883 to f671805
ProExpertProg
left a comment
Confirming there should be no changes other than imports and renames!
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
This pull request has merge conflicts that must be resolved before it can be merged.
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9).
- Modify import paths due to the refactors vllm-project/vllm#31916 and vllm-project/vllm#32054
- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional arguments but 3 were given`, caused by vllm-project/vllm#24498
- Skip the async-scheduling tests in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which were never verified: vllm-project/vllm#31998
- Skip some pooling tests broken by vllm-project/vllm#32148, where vllm also fails: https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4 We will reopen those tests when main2main reaches vllm-project/vllm#32243
- Skip some cases in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are broken by vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
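Downstream consumers tracking vLLM across a module refactor like this one often cushion the moved import paths with a fallback helper instead of hard-pinning one location. A minimal sketch — the helper name and the stand-in module names in the demo are illustrative, not vLLM or vllm-ascend API:

```python
import importlib

def import_first(paths):
    """Try candidate module paths in order; return the first that imports.

    Lets a plugin work both before and after an upstream refactor that
    moved a module to a new dotted path.
    """
    last_err = None
    for path in paths:
        try:
            return importlib.import_module(path)
        except ImportError as err:
            last_err = err
    raise last_err

# Demo with stdlib names standing in for the old/new vllm paths:
mod = import_first(["no_such_module_xyz", "json"])
print(mod.__name__)  # json
```

The same pattern works at statement level as a `try: import new_path / except ImportError: import old_path` block.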
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Implement step 1 of #31919. This PR consists solely of file renaming and movement, and the necessary updates to imports.
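The mechanical part of such a move — rewriting imports to the new locations — can be pictured as a path-mapping pass over source text. The mapping below lists a few of the moves described in this PR; the helper itself is an illustrative sketch, not part of vLLM's tooling:

```python
import re

# A few of the module moves described in this PR (old dotted path -> new).
RENAMES = {
    "vllm.attention.backends.abstract": "vllm.v1.attention.backend",
    "vllm.attention.ops": "vllm.v1.attention.ops",
    "vllm.attention.selector": "vllm.v1.attention.selector",
}

def rewrite_imports(source: str) -> str:
    """Rewrite old dotted module paths to their new v1 locations."""
    for old, new in RENAMES.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source

old_line = "from vllm.attention.backends.abstract import AttentionBackend"
print(rewrite_imports(old_line))
# from vllm.v1.attention.backend import AttentionBackend
```

Prefix matches also carry submodules along, so `vllm.attention.ops.foo` becomes `vllm.v1.attention.ops.foo` without listing every file.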
Note
- `vllm/v1/attention` is subject to more mypy scrutiny than `vllm/attention` since #31465 ("fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround")
- When moving files from `vllm/attention` to `vllm/v1/attention`, new pre-commit issues arise
- Added `vllm/v1/attention/backends/fa_utils.py` to `EXCLUDES`

Test Plan
CI
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.

Note
Bulk path migration with no intended functional changes.
- `vllm/attention/backends/abstract.py` → `vllm/v1/attention/backend.py`; `backends/registry.py` and `utils/fa_utils.py` → `vllm/v1/attention/backends/`
- `vllm/attention/ops` → `vllm/v1/attention/ops`; selector → `vllm/v1/attention/selector.py`; attention layers → `vllm/model_executor/layers/attention/`
- Imports across the repo updated to the new `v1` paths; tests updated to the new `v1` attention modules
- `CODEOWNERS` and Mergify label rules updated to point at the `v1` attention paths
- mypy `EXCLUDE` entries added for `vllm/v1/attention/ops` and `backends/fa_utils.py`

Written by Cursor Bugbot for commit 0ed4b06. This will update automatically on new commits.
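The mypy `EXCLUDES` list mentioned in this PR can be pictured as a pattern list that gates which files the type checker sees. vLLM's actual pre-commit/mypy wiring differs; this is only an illustrative sketch with hypothetical patterns mirroring the entries named above:

```python
from fnmatch import fnmatch

# Hypothetical exclude patterns mirroring the entries mentioned in this PR;
# vLLM's real mypy/pre-commit configuration may express them differently.
EXCLUDES = [
    "vllm/v1/attention/ops/*",
    "vllm/v1/attention/backends/fa_utils.py",
]

def should_typecheck(path: str) -> bool:
    """Return False for files matched by an exclude pattern."""
    return not any(fnmatch(path, pattern) for pattern in EXCLUDES)

print(should_typecheck("vllm/v1/attention/backends/fa_utils.py"))  # False
print(should_typecheck("vllm/v1/attention/backend.py"))            # True
```

Keeping the moved files excluded is what lets a pure file-move PR land without also fixing pre-existing type errors in them.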