
feat(mla): add default do_kv_cache_update for MLA#33658

Closed
dw2761 wants to merge 13 commits into vllm-project:main from dw2761:feat/mla-kv-cache-update-v2

Conversation

@dw2761
Contributor

@dw2761 dw2761 commented Feb 3, 2026

Purpose

This PR is part of #32335

It extracts the MLA KV-cache update op from the MLA attention layer into a default MLAAttentionImpl.do_kv_cache_update implementation.
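Conceptually, the default implementation scatters each token's latent and rope vectors into its assigned cache slot. A minimal sketch, assuming hypothetical simplified signatures (the real vLLM method operates on torch tensors and backend-specific attention metadata; `latent`, `rope`, and `slot_mapping` here are illustrative stand-ins):

```python
# Hypothetical sketch of a default do_kv_cache_update for an MLA-style
# backend. Class and method names follow the PR description; the tensor
# math is simplified to plain Python lists for illustration.

class MLAAttentionImpl:
    def do_kv_cache_update(self, kv_cache, latent, rope, slot_mapping):
        """Write per-token latent + rope vectors into the KV cache.

        kv_cache: list of slots; each slot holds the concatenated
                  (latent, rope) entry for one token.
        slot_mapping: token i is written to slot slot_mapping[i].
        """
        for i, slot in enumerate(slot_mapping):
            # Concatenate the latent and rope parts into one cache entry.
            kv_cache[slot] = latent[i] + rope[i]
        return kv_cache
```

With a shared default like this, individual MLA backends only override the method when their cache layout differs.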

Test Plan

Run the v1 latency benchmark with dummy weights on both `main` and this PR branch, explicitly selecting the MLA backend with `--attention-backend FLASH_ATTN_MLA`.

Test Result

Model: deepseek-ai/DeepSeek-V2-Lite-Chat | Backend: FLASH_ATTN_MLA | Weights: dummy | TP=1

Latency (s)

| Branch | Avg | P50 | P90 | P99 |
|--------|-----|-----|-----|-----|
| main | 1.009902390 | 1.010018088 | 1.015274953 | 1.017686900 |
| PR | 1.010331564 | 1.010161117 | 1.014322852 | 1.018624747 |

Signed-off-by: Di Wu <dw2761@nyu.edu>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the MLA KV-cache update logic by moving it into a default do_kv_cache_update method on MLAAttentionImpl. This is a good simplification of the MLAAttention layer. However, there is a critical issue where this new method is not implemented for SparseMLAAttentionImpl, which will cause a runtime error when using a sparse MLA backend. My review includes a comment detailing this issue and how to resolve it.
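The failure mode the bot describes can be illustrated with a stripped-down Python sketch (only the class names come from the review comment; everything else is hypothetical): a sparse backend that cannot reuse the shared default fails at call time, not at import time.

```python
# Hypothetical sketch of the issue flagged in the review. The sparse
# backend uses a different cache layout, so it cannot inherit the shared
# default as-is; without its own implementation, the call must fail
# loudly instead of silently corrupting the cache.

class MLAAttentionImpl:
    def do_kv_cache_update(self, kv_cache, entries, slot_mapping):
        # Shared default: scatter each entry into its assigned slot.
        for i, slot in enumerate(slot_mapping):
            kv_cache[slot] = entries[i]

class SparseMLAAttentionImpl(MLAAttentionImpl):
    def do_kv_cache_update(self, kv_cache, entries, slot_mapping):
        raise NotImplementedError(
            "sparse MLA needs its own KV-cache update")
```

The fix the bot asks for is an actual sparse-aware implementation in place of the raising stub.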

Signed-off-by: Di Wu <dw2761@nyu.edu>
Signed-off-by: Di Wu <dw2761@nyu.edu>
@pavanimajety pavanimajety added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 16, 2026
@ElizaWszola
Contributor

What is the status of this PR? @dw2761 @ProExpertProg
I have a PR that I would like to rebase onto this one once it lands.

@dw2761
Contributor Author

dw2761 commented Feb 17, 2026

> What is the status of this PR? @dw2761 @ProExpertProg I made a PR that I would like to rebase with it when it gets landed

I just updated the branch with the latest main and pushed a fix for a circular-import issue that was breaking CI. The checks are still running right now. If everything turns green, I’ll request review and hopefully get this merged ASAP.

@ElizaWszola
Contributor

@dw2761 Great, thanks a lot!

```python
attn_metadata = attn_metadata[self.layer_name]
self_kv_cache = self.kv_cache[forward_context.virtual_engine]

# Write the latent and rope to kv cache
```
Collaborator


This is missing from the indirect call path below. A new custom op needs to be created (like `unified_kv_cache_update` for GQA; see attention.py), and that op should then be called before the MLA attention op.
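The indirect path the reviewer refers to dispatches through registered custom ops rather than direct method calls. A toy stand-in for that pattern (the op names and the dict-based registry here are hypothetical; vLLM uses torch custom-op machinery) shows the ordering the comment asks for, with the cache-update op invoked before the attention op:

```python
# Toy registry standing in for a custom-op dispatch layer (hypothetical;
# vLLM registers torch custom ops, not dict entries). The point is the
# call order on the indirect path: KV-cache update, then attention.

OPS = {}

def register_op(name):
    def deco(fn):
        OPS[name] = fn
        return fn
    return deco

@register_op("unified_mla_kv_cache_update")  # hypothetical op name
def kv_cache_update(trace):
    trace.append("kv_cache_update")
    return trace

@register_op("mla_attention")  # hypothetical op name
def mla_attention(trace):
    trace.append("attention")
    return trace

def indirect_forward(trace):
    # The fix suggested in the review: run the update op first.
    OPS["unified_mla_kv_cache_update"](trace)
    OPS["mla_attention"](trace)
    return trace
```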

Signed-off-by: Di Wu <dw2761@nyu.edu>
@ElizaWszola
Contributor

Thanks for the updates! I've noticed that this PR now has the same scope as #34627. Would it work for you if I added you as a co-author, since I'm including some of the changes you made to the MLA backends? (I'm also fine with pushing the changes from #34627 to #33658 and merging #33658.)

@dw2761
Contributor Author

dw2761 commented Feb 26, 2026

> Thanks for the updates! I've noticed that this PR is now the same in scope as #34627, would it work for you if I added you as a co-author as I include some changes you did to the MLA backends? (I'm also fine with pushing changes from #34627 to #33658 and merging #33658)

Sure! I'd be happy to be added as a co-author of #34627. I left a review comment in #34627. Please check it!

@ElizaWszola
Contributor

> Sure! I'd be happy to be added as a co-author of #34627.

Done!

> I left a review comment in #34627. Please check it!

I can't see it, did you publish it?

@dw2761
Contributor Author

dw2761 commented Feb 26, 2026

> Sure! I'd be happy to be added as a co-author of #34627.
>
> Done!
>
> I left a review comment in #34627. Please check it!
>
> I can't see it, did you publish it?

#34627 (review)

@dw2761
Contributor Author

dw2761 commented Feb 26, 2026

> Sure! I'd be happy to be added as a co-author of #34627.
>
> Done!
>
> I left a review comment in #34627. Please check it!
>
> I can't see it, did you publish it?

Hi @ElizaWszola! I checked the commit but it seems I'm not listed as a formal co-author yet. Could you please amend the commit message to include the standard trailer at the end?

It should look like this:
`Co-authored-by: dw2761 <95495325+dw2761@users.noreply.github.com>`

And then I'll close this PR. Thanks!

@ElizaWszola
Contributor

> I checked the commit but it seems I'm not listed as a formal co-author yet. Could you please amend the commit message to include the standard trailer at the end?

I amended a couple of commits listing you as author/co-author. To my knowledge, this should list you as a co-author of the PR when it's merged into main, but feel free to correct me if I'm wrong.

@dw2761
Contributor Author

dw2761 commented Feb 26, 2026

> Sure! I'd be happy to be added as a co-author of #34627.
>
> Done!
>
> I left a review comment in #34627. Please check it!
>
> I can't see it, did you publish it?
>
> I checked the commit but it seems I'm not listed as a formal co-author yet. Could you please amend the commit message to include the standard trailer at the end?
>
> I amended a couple commits listing you as author/co-author. To my knowledge, this should list you as a co-author of the PR when it's merged into main, but feel free to correct me if I'm wrong

Looks good, thanks!

@ProExpertProg
Collaborator

Closing in favor of #34627


Labels

ready ONLY add when PR is ready to merge/full CI is needed v1


4 participants