Skip to content

feat(mla): extract KV-cache update#33250

Closed
dw2761 wants to merge 3 commits intovllm-project:mainfrom
dw2761:fix/mla-kv-cache-update
Closed

feat(mla): extract KV-cache update#33250
dw2761 wants to merge 3 commits intovllm-project:mainfrom
dw2761:fix/mla-kv-cache-update

Conversation

@dw2761
Copy link
Copy Markdown
Contributor

@dw2761 dw2761 commented Jan 28, 2026

Purpose

This PR is a small refactor towards #32335.

MLACommonImpl (vllm/model_executor/layers/attention/mla_attention.py)
Added forward_includes_kv_cache_update = True on MLACommonImpl to preserve current behavior while enabling gradual migration.
Implemented _update_kv_cache() method that:
Calls ops.concat_and_cache_mla(...) to write MLA KV-cache (latent + RoPE-related components) using attn_metadata.slot_mapping
Updated MLACommonImpl.forward() to call KV-cache update via:
if self.forward_includes_kv_cache_update: self._update_kv_cache(...)

Test Plan

  • Run vLLM latency benchmark on the same GPU with dummy weights, explicitly selecting the MLA backend:
    --attention-backend FLASH_ATTN_MLA
  • Compare main vs this PR using identical settings.

Test Result

Latency (dummy weights)

Model: deepseek-ai/DeepSeek-V2-Lite-Chat

Branch Avg (s) P50 (s)
main 0.5918258496 0.5916943643
PR 0.5927979626 0.5925584566

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify
Copy link
Copy Markdown

mergify bot commented Jan 28, 2026

Hi @dw2761, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a clean refactoring in the MLACommonImpl class by extracting the KV-cache update logic into a new _update_kv_cache method. The execution of this new method within the forward pass is controlled by a new boolean class attribute, forward_includes_kv_cache_update, which is set to True by default to maintain existing behavior. This change improves code modularity and prepares for future enhancements as described. The implementation is correct and the logic is preserved. The provided benchmarks confirm that there is no performance regression.

Signed-off-by: Di Wu <dw2761@nyu.edu>
Signed-off-by: Di Wu <dw2761@nyu.edu>
@dw2761 dw2761 force-pushed the fix/mla-kv-cache-update branch from 9368dd7 to ca2c4e7 Compare January 29, 2026 04:04
@mergify
Copy link
Copy Markdown

mergify bot commented Jan 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dw2761.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 31, 2026
Signed-off-by: Di Wu <95495325+dw2761@users.noreply.github.com>
@dw2761
Copy link
Copy Markdown
Contributor Author

dw2761 commented Feb 3, 2026

Rebased on top of #33284. This PR no longer has meaningful diff vs main, so I’m closing it as superseded.

@dw2761 dw2761 closed this Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant