feat(mla): add default do_kv_cache_update for MLA #33658
dw2761 wants to merge 13 commits into vllm-project:main from dw2761:feat/mla-kv-cache-update-v2
Conversation
Signed-off-by: Di Wu <dw2761@nyu.edu>
Code Review
This pull request refactors the MLA KV-cache update logic by moving it into a default do_kv_cache_update method on MLAAttentionImpl. This is a good simplification of the MLAAttention layer. However, there is a critical issue where this new method is not implemented for SparseMLAAttentionImpl, which will cause a runtime error when using a sparse MLA backend. My review includes a comment detailing this issue and how to resolve it.
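The pattern under discussion can be sketched in plain Python. The class and method names below mirror the PR, but the bodies are illustrative placeholders, not vLLM's actual implementation: a base impl provides a default `do_kv_cache_update`, and a subclass with an incompatible cache layout (like a sparse impl) must override it rather than inherit a default that assumes the dense layout.

```python
class MLAAttentionImplSketch:
    """Illustrative stand-in for MLAAttentionImpl (not vLLM's real class)."""

    def do_kv_cache_update(self, kv_cache, slot, latent, rope):
        # Default: write the compressed latent and rope portions into the
        # cache entry for this token's slot (dense layout).
        kv_cache[slot] = (latent, rope)


class SparseMLAAttentionImplSketch(MLAAttentionImplSketch):
    """A sparse impl whose cache layout differs from the dense default."""

    def do_kv_cache_update(self, kv_cache, slot, latent, rope):
        # Without this override, the inherited default would assume the
        # dense layout and break at runtime -- the issue flagged in review.
        kv_cache.setdefault("sparse", {})[slot] = (latent, rope)


dense_cache = {}
MLAAttentionImplSketch().do_kv_cache_update(dense_cache, 0, "latent0", "rope0")

sparse_cache = {}
SparseMLAAttentionImplSketch().do_kv_cache_update(sparse_cache, 0, "latent0", "rope0")
print(dense_cache)   # {0: ('latent0', 'rope0')}
print(sparse_cache)  # {'sparse': {0: ('latent0', 'rope0')}}
```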
Signed-off-by: Di Wu <dw2761@nyu.edu>
Signed-off-by: Di Wu <dw2761@nyu.edu>
What is the status of this PR? @dw2761 @ProExpertProg
Signed-off-by: Di Wu <dw2761@nyu.edu>
…feat/mla-kv-cache-update-v2
I just updated the branch with the latest main and pushed a fix for a circular-import issue that was breaking CI. The checks are still running right now. If everything turns green, I'll request review and hopefully get this merged ASAP.
@dw2761 Great, thanks a lot!
Signed-off-by: Di Wu <dw2761@nyu.edu>
…feat/mla-kv-cache-update-v2
attn_metadata = attn_metadata[self.layer_name]
self_kv_cache = self.kv_cache[forward_context.virtual_engine]
# Write the latent and rope to kv cache
This is missing from the indirect call path below. A new custom op needs to be created (like unified_kv_cache_update for GQA; see attention.py), and that op should then be called before the MLA attention op.
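The suggestion can be illustrated with a simplified sketch. Everything here other than the borrowed name of the cache-update op is hypothetical (this is not vLLM's op registry): both the direct path and the indirect, name-based path go through a single registered update op before attention runs, so the indirect path cannot skip the cache write.

```python
# Simplified sketch of the review suggestion: route every call path through
# one registered KV-cache-update op so the indirect path cannot miss it.
# All registry and path names are illustrative, not vLLM's actual APIs.

CUSTOM_OPS = {}

def register_op(name):
    def wrap(fn):
        CUSTOM_OPS[name] = fn
        return fn
    return wrap

@register_op("mla_kv_cache_update")
def mla_kv_cache_update(kv_cache, slot, latent, rope):
    # Write the latent and rope to kv cache (mirrors the annotated diff line).
    kv_cache[slot] = (latent, rope)

def mla_attention(kv_cache, slot):
    # Attention reads whatever the update op wrote for this slot.
    return kv_cache[slot]

def direct_path(kv_cache, slot, latent, rope):
    # Direct path: call the update op, then attention.
    mla_kv_cache_update(kv_cache, slot, latent, rope)
    return mla_attention(kv_cache, slot)

def indirect_path(kv_cache, slot, latent, rope):
    # Indirect (e.g. compiled) path: ops are looked up by name, so it hits
    # the same update op instead of silently skipping the cache write.
    CUSTOM_OPS["mla_kv_cache_update"](kv_cache, slot, latent, rope)
    return mla_attention(kv_cache, slot)

cache = {}
print(indirect_path(cache, 0, "latent", "rope"))  # ('latent', 'rope')
```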
Signed-off-by: Di Wu <dw2761@nyu.edu>
Sure! I'd be happy to be added as a co-author of #34627. I left a review comment in #34627. Please check it!
Hi @ElizaWszola! I checked the commit, but it seems I'm not listed as a formal co-author yet. Could you please amend the commit message to include the standard trailer at the end? It should look like this: And then I'll close this PR. Thanks!
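For reference, the standard trailer GitHub recognizes for co-authorship is a `Co-authored-by:` line at the end of the commit message; using the identity from this PR's sign-offs, the requested trailer would presumably be:

```
Co-authored-by: Di Wu <dw2761@nyu.edu>
```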
I amended a couple of commits listing you as author/co-author. To my knowledge, this should list you as a co-author of the PR when it's merged into main, but feel free to correct me if I'm wrong.
I checked the commit but it seems I'm not listed as a formal co-author yet. Could you please amend the commit message to include the standard trailer at the end?
Looks good! thx!
Closing in favor of #34627 |
Purpose
This PR is part of #32335
It extracts the MLA KV-cache update op from the MLA attention layer into a default MLAAttentionImpl.do_kv_cache_update implementation.
Test Plan
Run the v1 latency benchmark with dummy weights on both main and this PR branch, explicitly selecting the MLA backend with --attention-backend FLASH_ATTN_MLA.
Test Result
Model: deepseek-ai/DeepSeek-V2-Lite-Chat | Backend: FLASH_ATTN_MLA | Weights: dummy | TP=1
Latency