[Attention] Abstract the MLA prefill backends by MatthewBonanni · Pull Request #32623 · vllm-project/vllm

MatthewBonanni · 2026-01-19T22:20:32Z

Purpose

Abstracts the MLA prefill backends to simplify mla_attention.py and introduces a selection mechanism similar to that of the decode backends, via --attention-config.mla_prefill_backend. Old AttentionConfig arguments (use_cudnn_prefill, use_trtllm_ragged_deepseek_prefill, and disable_flashinfer_prefill) are retained (with deprecation warnings) for backwards compatibility.
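To make the backwards-compatibility behavior concrete, here is a minimal sketch of how deprecated boolean flags could be folded into a single backend choice while emitting deprecation warnings. It is not the code from this PR: the `PrefillBackend` enum, its member names, the `resolve_prefill_backend` helper, the mapping of `disable_flashinfer_prefill` to a FlashAttention fallback, and the rule that the explicit option wins over legacy flags are all assumptions made for illustration. Only the three legacy flag names and `mla_prefill_backend` come from the description above.

```python
import warnings
from enum import Enum
from typing import Optional


class PrefillBackend(Enum):
    """Hypothetical backend names, for illustration only."""

    FLASH_ATTN = "flash_attn"
    FLASHINFER = "flashinfer"
    CUDNN = "cudnn"
    TRTLLM_RAGGED = "trtllm_ragged"


def resolve_prefill_backend(
    mla_prefill_backend: Optional[str] = None,
    use_cudnn_prefill: bool = False,
    use_trtllm_ragged_deepseek_prefill: bool = False,
    disable_flashinfer_prefill: bool = False,
) -> Optional[PrefillBackend]:
    """Return the explicitly requested backend, or None for auto-selection."""
    # Assumed mapping from each legacy flag to the backend it implies.
    legacy = {
        "use_cudnn_prefill": (use_cudnn_prefill, PrefillBackend.CUDNN),
        "use_trtllm_ragged_deepseek_prefill": (
            use_trtllm_ragged_deepseek_prefill,
            PrefillBackend.TRTLLM_RAGGED,
        ),
        "disable_flashinfer_prefill": (
            disable_flashinfer_prefill,
            PrefillBackend.FLASH_ATTN,
        ),
    }

    chosen: Optional[PrefillBackend] = None
    for name, (enabled, backend) in legacy.items():
        if not enabled:
            continue
        # Every legacy flag that is set produces its own warning.
        warnings.warn(
            f"{name} is deprecated; use mla_prefill_backend instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if chosen is None:
            chosen = backend

    # Assumption: the new explicit option takes precedence over legacy flags.
    if mla_prefill_backend is not None:
        return PrefillBackend(mla_prefill_backend)
    return chosen
```

Under these assumptions, `resolve_prefill_backend(use_cudnn_prefill=True)` warns once and selects the cuDNN member, while calling it with no arguments returns `None` so that a platform-aware selector can pick a default.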

Test Plan

tests/v1/attention/test_mla_prefill_selector.py (introduced by this PR) should pass in CI (part of V1 attention (H100) and V1 attention (B200)).

Test Result

TBD

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

gemini-code-assist

Code Review

This pull request introduces a well-designed abstraction for MLA prefill backends, which significantly simplifies mla_attention.py and improves modularity. The new selection mechanism via --attention-config.mla_prefill_backend is a great addition, and the backward compatibility for old flags is handled correctly. The refactoring moves backend-specific logic into separate, well-organized files, making the code cleaner and more maintainable. I've found one issue related to a hardcoded device that could affect non-CUDA platforms, which I've commented on. Overall, this is an excellent refactoring effort.

vllm/model_executor/layers/attention/mla_attention.py

mergify · 2026-01-20T00:01:46Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-01-29T22:34:16Z

Documentation preview: https://vllm--32623.org.readthedocs.build/en/32623/

mgoin

Looks really clean to me, nice work! I'll defer to Lucas and would like someone from AMD to review as well cc @tjtanaa

vllm/v1/attention/backends/mla/prefill/selector.py

mergify · 2026-01-31T04:26:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-02-02T16:43:34Z

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

mergify · 2026-02-05T12:28:23Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-03-16T21:07:17Z

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-03-16T21:13:47Z

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

MatthewBonanni added 4 commits January 19, 2026 13:54

Change default from CUTLASS MLA to FlashInfer MLA

e4228dc

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Change log lines from debug to info

30cd686

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Merge remote-tracking branch 'upstream/main'

0b4c746

First pass

2cfba3b

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot added nvidia v1 labels Jan 19, 2026

github-project-automation bot added this to NVIDIA Jan 19, 2026

gemini-code-assist bot reviewed Jan 19, 2026

vllm/model_executor/layers/attention/mla_attention.py (resolved)

MatthewBonanni added 2 commits January 19, 2026 17:28

Fix typo

5417186

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Fix pre-commit

c427ff8

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot added the rocm label Jan 19, 2026

mergify bot added the needs-rebase label Jan 20, 2026

Merge branch 'main' into mla_prefill_abstraction

b80db7e

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot added the documentation label and removed the needs-rebase label Jan 29, 2026

github-project-automation bot moved this to Todo in AMD Jan 29, 2026

github-project-automation bot added this to AMD Jan 29, 2026

MatthewBonanni added 2 commits January 29, 2026 17:41

Use device_config

370d66d

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Remove dead code

5151bd9

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

MatthewBonanni marked this pull request as ready for review January 30, 2026 14:59

MatthewBonanni requested review from WoosukKwon, houseroad, mgoin, pavanimajety, robertgshaw2-redhat, tjtanaa, tlrmchlsmth and youkaichao as code owners January 30, 2026 14:59

mgoin reviewed Jan 30, 2026

vllm/v1/attention/backends/mla/prefill/selector.py (outdated, resolved)

MatthewBonanni and others added 4 commits January 30, 2026 11:29

Update table

a962bee

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Comment

25d976f

Co-authored-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Add model dtype support

f30cb13

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Add type annotation to fix docs build

eb27ec8

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot added the needs-rebase label Jan 31, 2026

mgoin added the ready label Feb 1, 2026

Merge branch 'main' into mla_prefill_abstraction

17c9a9e

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot removed the needs-rebase label Feb 2, 2026

MatthewBonanni added 6 commits February 2, 2026 12:32

Fix rebase issue

0f42a95

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Introduce hashable config

97c9aa7

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Fix test

fb7bead

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Pass device capability directly

3b25d20

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Fix hashing

0086b94

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Fix tests

5051737

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot added the needs-rebase label Feb 5, 2026

Merge branch 'main' into mla_prefill_abstraction

115a75b

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot removed the needs-rebase label Mar 16, 2026

Fix pre-commit

88d0be8

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Clean up FA import

0cd22de

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

MatthewBonanni mentioned this pull request Mar 30, 2026

[Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM #38562

Merged

5 tasks

MatthewBonanni and others added 4 commits April 2, 2026 15:45

Merge branch 'main' into mla_prefill_abstraction

7bf5ae8

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

wip

d4c77ab

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

cleanup

2f0b2ad

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

cleanup

1d04d07

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

MatthewBonanni wants to merge 33 commits into vllm-project:main from MatthewBonanni:mla_prefill_abstraction

3 participants
