[Attention] Abstract the MLA prefill backends #32623
MatthewBonanni wants to merge 33 commits into vllm-project:main
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Code Review
This pull request introduces a well-designed abstraction for MLA prefill backends, which significantly simplifies mla_attention.py and improves modularity. The new selection mechanism via --attention-config.mla_prefill_backend is a great addition, and the backward compatibility for old flags is handled correctly. The refactoring moves backend-specific logic into separate, well-organized files, making the code cleaner and more maintainable. I've found one issue related to a hardcoded device that could affect non-CUDA platforms, which I've commented on. Overall, this is an excellent refactoring effort.
This pull request has merge conflicts that must be resolved before it can be merged.
Documentation preview: https://vllm--32623.org.readthedocs.build/en/32623/
Hi @MatthewBonanni, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Hi @MatthewBonanni, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Purpose
Abstracts the MLA prefill backends to simplify mla_attention.py and introduces a selection mechanism similar to that of the decode backends, via --attention-config.mla_prefill_backend. Old AttentionConfig arguments (use_cudnn_prefill, use_trtllm_ragged_deepseek_prefill, and disable_flashinfer_prefill) are retained (with deprecation warnings) for backwards compatibility.
Test Plan
(introduced by this PR) should pass in CI (part of V1 attention (H100) and V1 attention (B200))
Test Result
TBD
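For readers unfamiliar with the pattern described under Purpose, the following is a minimal sketch of how legacy boolean flags could be folded into a single backend selector while emitting deprecation warnings. It is not the code from this PR. The enum members, the AttentionConfigSketch class, the resolve_mla_prefill_backend helper, the precedence rule (an explicit mla_prefill_backend wins over legacy flags), and the fallback chosen for disable_flashinfer_prefill are all assumptions made for illustration. Only the three legacy flag names and the mla_prefill_backend option name come from the description above.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MLAPrefillBackend(Enum):
    # Hypothetical member names; the real enum in vLLM may differ.
    FLASH_ATTN = "flash_attn"
    FLASHINFER = "flashinfer"
    CUDNN = "cudnn"
    TRTLLM_RAGGED = "trtllm_ragged"


@dataclass
class AttentionConfigSketch:
    # New-style selector (option name taken from the PR description).
    mla_prefill_backend: Optional[MLAPrefillBackend] = None
    # Legacy flags (names taken from the PR description).
    use_cudnn_prefill: bool = False
    use_trtllm_ragged_deepseek_prefill: bool = False
    disable_flashinfer_prefill: bool = False


def _warn_deprecated(flag: str) -> None:
    warnings.warn(
        f"{flag} is deprecated; use mla_prefill_backend instead.",
        DeprecationWarning,
        stacklevel=3,
    )


def resolve_mla_prefill_backend(
    cfg: AttentionConfigSketch,
) -> Optional[MLAPrefillBackend]:
    """Return the requested backend, or None to mean 'auto-select'.

    Assumed precedence: an explicit mla_prefill_backend always wins;
    legacy flags only apply when no explicit backend was given.
    """
    chosen = cfg.mla_prefill_backend

    legacy_map = [
        ("use_cudnn_prefill", MLAPrefillBackend.CUDNN),
        ("use_trtllm_ragged_deepseek_prefill", MLAPrefillBackend.TRTLLM_RAGGED),
    ]
    for flag, backend in legacy_map:
        if getattr(cfg, flag):
            _warn_deprecated(flag)
            if chosen is None:
                chosen = backend

    if cfg.disable_flashinfer_prefill:
        _warn_deprecated("disable_flashinfer_prefill")
        # Assumption: with FlashInfer ruled out and nothing else requested,
        # fall back to a FlashAttention-style backend.
        if chosen is None:
            chosen = MLAPrefillBackend.FLASH_ATTN

    return chosen
```

Under these assumptions, AttentionConfigSketch(use_cudnn_prefill=True) resolves to the cuDNN member and emits a DeprecationWarning, while a config with no flags set resolves to None and leaves the choice to whatever automatic selection the caller implements.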