feat(mla): extract KV-cache update #33250
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of it. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
|
Hi @dw2761, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, the installed hooks will run automatically before each commit.
Code Review
This pull request introduces a clean refactoring in the MLACommonImpl class by extracting the KV-cache update logic into a new _update_kv_cache method. The execution of this new method within the forward pass is controlled by a new boolean class attribute, forward_includes_kv_cache_update, which is set to True by default to maintain existing behavior. This change improves code modularity and prepares for future enhancements as described. The implementation is correct and the logic is preserved. The provided benchmarks confirm that there is no performance regression.
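The review above describes a class-level boolean that gates the in-forward cache write while preserving default behavior. A minimal sketch of that pattern, with illustrative names (this is not vLLM's actual implementation):

```python
class AttentionImpl:
    # Default preserves existing behavior: forward() writes the KV cache.
    forward_includes_kv_cache_update = True

    def __init__(self):
        self.cache_writes = 0

    def _update_kv_cache(self, kv):
        # Stand-in for the extracted cache-write step.
        self.cache_writes += 1

    def forward(self, kv):
        # The update is gated by the class attribute, so subclasses can
        # opt out without changing any call sites.
        if self.forward_includes_kv_cache_update:
            self._update_kv_cache(kv)
        return kv


class DeferredCacheImpl(AttentionImpl):
    # A future variant can flip the flag and schedule the update elsewhere.
    forward_includes_kv_cache_update = False
```

Because the flag is a class attribute rather than a constructor argument, flipping it in a subclass changes behavior for every instance without touching `__init__` signatures, which is what makes the gradual migration described above low-risk.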
Force-pushed from b24e4ee to 9368dd7 (Compare)
Signed-off-by: Di Wu <dw2761@nyu.edu>
Force-pushed from 9368dd7 to ca2c4e7 (Compare)
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Di Wu <95495325+dw2761@users.noreply.github.com>
Rebased on top of #33284. This PR no longer has a meaningful diff vs main, so I'm closing it as superseded.
Purpose
This PR is a small refactor towards #32335.
- **`MLACommonImpl`** (`vllm/model_executor/layers/attention/mla_attention.py`)
  - Added `forward_includes_kv_cache_update = True` on `MLACommonImpl` to preserve current behavior while enabling gradual migration.
  - Implemented a `_update_kv_cache()` method that calls `ops.concat_and_cache_mla(...)` to write the MLA KV-cache (latent + RoPE-related components) using `attn_metadata.slot_mapping`.
  - Updated `MLACommonImpl.forward()` to call the KV-cache update via:

    ```python
    if self.forward_includes_kv_cache_update:
        self._update_kv_cache(...)
    ```
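Conceptually, `concat_and_cache_mla` concatenates each token's latent KV component with its RoPE component and scatters the result into the cache at the positions given by the slot mapping. A pure-Python mock of that behavior (vLLM's real op is a CUDA kernel; shapes and names here are illustrative):

```python
def concat_and_cache_mla_mock(kv_latent, k_rope, kv_cache, slot_mapping):
    """Mock of the MLA cache write.

    kv_latent:    per-token latent vectors, [num_tokens][latent_dim]
    k_rope:       per-token RoPE components, [num_tokens][rope_dim]
    kv_cache:     flat cache, [num_slots][latent_dim + rope_dim], mutated in place
    slot_mapping: cache slot index for each token
    """
    for i, slot in enumerate(slot_mapping):
        # List "+" concatenates, mirroring the concat-then-cache semantics.
        kv_cache[slot] = kv_latent[i] + k_rope[i]


# Toy example: 8 cache slots, latent_dim=3, rope_dim=1, two new tokens.
cache = [[0.0] * 4 for _ in range(8)]
concat_and_cache_mla_mock(
    kv_latent=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    k_rope=[[0.1], [0.2]],
    kv_cache=cache,
    slot_mapping=[5, 2],  # token 0 -> slot 5, token 1 -> slot 2
)
```

The slot mapping is what lets tokens from different requests land in non-contiguous cache blocks, which is why the update step can be cleanly isolated from the attention computation itself.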
Test Plan
--attention-backend FLASH_ATTN_MLA
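The PR only shows the backend flag; a plausible reconstruction of the benchmark invocation, assuming vLLM's `vllm bench latency` subcommand, the dummy-weight load format, and the model named under Test Result:

```shell
# Hypothetical command shape -- only --attention-backend is confirmed by the PR.
vllm bench latency \
  --model deepseek-ai/DeepSeek-V2-Lite-Chat \
  --load-format dummy \
  --attention-backend FLASH_ATTN_MLA
```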
Test Result
Latency (dummy weights)
Model: deepseek-ai/DeepSeek-V2-Lite-Chat