[Test] Add FP8 KV Cache Testing for MLA Backends #34473

Merged
LucasWilkinson merged 5 commits into vllm-project:main from wzhao18:wzhao/flashinfer-mla-fp8-tests
Feb 20, 2026

Contributor

@wzhao18 wzhao18 commented Feb 12, 2026

Purpose

This PR extends the MLA backend test coverage to include FP8 KV cache testing.

Test Plan

Tested tests/v1/attention/

Test Result


@mergify mergify bot added the v1 label Feb 12, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the MLA backend test coverage by adding support for FP8 KV cache testing. The changes correctly parameterize the tests for different kv_cache_dtype values and generalize the test logic to handle various FP8 formats. My review found one critical issue where a safety check was removed, which could lead to a runtime error. I've provided a suggestion to fix it.
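
As a rough illustration of what testing across `kv_cache_dtype` values can look like, here is a minimal sketch. It is not the code from this PR: the dtype strings, the tolerance table, and the helpers `simulate_kv_cache_roundtrip` and `check_roundtrip` are hypothetical, and the FP8 path is a toy pure-Python model of E4M3 rounding with saturation rather than a real attention backend. In an actual test suite the loop would more likely be expressed with `pytest.mark.parametrize`.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 (fn) format

# Hypothetical relative tolerances per cache dtype (assumed, not from the PR).
# With 3 mantissa bits, round-to-nearest error is at most 2**-4 = 6.25%.
TOLERANCES = {"auto": 1e-6, "fp8": 0.07, "fp8_e4m3": 0.07}


def simulate_kv_cache_roundtrip(x: float, kv_cache_dtype: str) -> float:
    """Toy model: return x as it would read back from the KV cache."""
    if not kv_cache_dtype.startswith("fp8"):
        return x  # unquantized cache stores the value as-is
    if x == 0.0:
        return 0.0
    mag = min(abs(x), FP8_E4M3_MAX)  # saturate instead of overflowing
    _, e = math.frexp(mag)           # mag = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)        # clamp to the smallest normal exponent
    step = 2.0 ** (exponent - 3)     # spacing between values: 3 mantissa bits
    return math.copysign(round(mag / step) * step, x)


def check_roundtrip(kv_cache_dtype: str, values: list[float]) -> None:
    """Assert every value survives the cache within the dtype's tolerance."""
    rtol = TOLERANCES[kv_cache_dtype]
    for v in values:
        got = simulate_kv_cache_roundtrip(v, kv_cache_dtype)
        assert abs(got - v) <= rtol * abs(v), (kv_cache_dtype, v, got)


# Stand-in for a parametrized test: run the same check for each cache dtype.
for dtype in TOLERANCES:
    check_roundtrip(dtype, [0.3, -1.7, 12.5, 100.0])
```

The design point is that a single test body runs for every cache dtype, with only the expected tolerance changing, so the quantized and unquantized paths exercise the same logic.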

@wzhao18 wzhao18 force-pushed the wzhao/flashinfer-mla-fp8-tests branch from fc4170b to aafb941 February 13, 2026 04:14
Contributor Author

wzhao18 commented Feb 13, 2026

@pavanimajety Could you help review this PR?

Collaborator

@MatthewBonanni MatthewBonanni left a comment

LGTM, thanks for the contribution!

Member

@mgoin mgoin left a comment

LGTM, thanks

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and ci/build labels Feb 17, 2026
Collaborator

@pavanimajety pavanimajety left a comment

Thanks for the contribution @wzhao18, LGTM!

Note for the future: we need to add tests for chunked prefill when it uses the FP8 KV cache with MHA kernels.

Contributor Author

wzhao18 commented Feb 17, 2026

@pavanimajety Please feel free to ping me and Xin for assistance on improving the test coverage.

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 18, 2026
@mergify mergify bot added the cpu (Related to CPU backends) label Feb 18, 2026
@mergify mergify bot added the structured-output and tpu (Related to Google TPUs) labels Feb 18, 2026
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Feb 18, 2026
@wzhao18 wzhao18 marked this pull request as draft February 18, 2026 17:40
@mergify mergify bot added the kv-connector label Feb 18, 2026

mergify bot commented Feb 18, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wzhao18.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 18, 2026
@wzhao18 wzhao18 force-pushed the wzhao/flashinfer-mla-fp8-tests branch from 29aecb2 to e456a25 February 18, 2026 17:46
@mergify mergify bot removed the tpu (Related to Google TPUs) label Feb 18, 2026
@wzhao18 wzhao18 force-pushed the wzhao/flashinfer-mla-fp8-tests branch from e456a25 to 50c031d February 18, 2026 17:46
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
@wzhao18 wzhao18 force-pushed the wzhao/flashinfer-mla-fp8-tests branch from 50c031d to 1622bad February 18, 2026 17:47
@wzhao18 wzhao18 marked this pull request as ready for review February 18, 2026 17:48
@mergify mergify bot removed the needs-rebase label Feb 18, 2026
Member

@yewentao256 yewentao256 left a comment

LGTM, thanks for the work!

Contributor Author

wzhao18 commented Feb 18, 2026

@mgoin Relevant CI tests are passing. I believe the failing ones also fail on main.

Member

@yewentao256 yewentao256 left a comment

One of the issues could be fixed by #34913

Collaborator

@LucasWilkinson LucasWilkinson left a comment

LGTM, thanks

@LucasWilkinson LucasWilkinson enabled auto-merge (squash) February 19, 2026 21:51
@LucasWilkinson LucasWilkinson merged commit f24b2de into vllm-project:main Feb 20, 2026
19 checks passed

Labels

ci/build, cpu (Related to CPU backends), documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), kv-connector, llama (Related to Llama models), multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models), nvidia, performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), structured-output, v1

Projects

Status: Done

6 participants