
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA#30309

Merged
LucasWilkinson merged 1 commit into vllm-project:main from FENP:dcp-interleave-fix
Dec 9, 2025

Conversation

Contributor

@FENP FENP commented Dec 9, 2025

Purpose

#25049 added MTP support for DCP with FA3, but it only works when cp_kv_cache_interleave_size=1.

For the FA3 backend, we should check cp_kv_cache_interleave_size and set supports_dcp_with_varlen accordingly.
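A minimal sketch of the gating logic described above. supports_dcp_with_varlen and cp_kv_cache_interleave_size are the names used in this PR; the surrounding class and its placement are illustrative assumptions, not the literal diff:

```python
# Sketch (assumed structure): only advertise varlen-DCP support when the
# KV cache is interleaved one token at a time across DCP ranks.
class FlashAttnMLABackendSketch:
    def __init__(self, cp_kv_cache_interleave_size: int):
        # With interleave_size > 1, each DCP rank holds contiguous chunks
        # of tokens rather than a strict per-token round-robin layout,
        # which the FA3 varlen decode path does not handle correctly.
        self.supports_dcp_with_varlen = cp_kv_cache_interleave_size == 1
```

With interleave size 1 the flag stays enabled; with interleave size 64 (the configuration tested below) it is disabled, routing such requests through the prefill path instead.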

Test Plan

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat/ --gpu-memory-utilization 0.9 --tensor-parallel-size 4 --decode-context-parallel-size 4 --cp-kv-cache-interleave-size 64
python ./tests/evals/gsm8k/gsm8k_eval.py

Test Result

  • Main
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%| 1319/1319 [00:40<00:00, 32.62it/s]

Results:
Accuracy: 0.028
Invalid responses: 0.007
Total latency: 40.447 s
Questions per second: 32.611
Total output tokens: 189017
Output tokens per second: 4673.216
  • This PR
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%| 1319/1319 [00:45<00:00, 28.81it/s]

Results:
Accuracy: 0.639
Invalid responses: 0.002
Total latency: 45.794 s
Questions per second: 28.803
Total output tokens: 161057
Output tokens per second: 3516.982

cc @LucasWilkinson @minosfuture @pisceskkk


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
@FENP FENP requested a review from pavanimajety as a code owner December 9, 2025 05:47

@FENP FENP changed the title [CI][Bugfix] Fix accuracy issue of DCP when using FLASH_ATTN_MLA [DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA Dec 9, 2025
@mergify mergify bot added the v1 label Dec 9, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an accuracy issue in Decode Context Parallelism (DCP) when used with FlashAttention MLA and a cp_kv_cache_interleave_size greater than 1. The fix correctly disables supports_dcp_with_varlen in this configuration, forcing requests into the prefill path and resolving the bug. The code change is correct and directly targets the issue. My main feedback is to enhance the test coverage to include the specific configuration that was manually tested and shown to be fixed, ensuring this scenario is covered by CI to prevent future regressions.

    CPTestSettings.detailed(dcp_multipliers=[1]),
    CPTestSettings.detailed(
        dcp_multipliers=[0.5],
        cp_kv_cache_interleave_size=64,
Contributor


Severity: high

The PR description shows a manual test with dcp_size == tp_size (i.e., dcp_multiplier=1) and cp_kv_cache_interleave_size=64 to demonstrate the fix. However, this specific test case seems to be missing from the automated tests after the changes. To prevent future regressions, it would be beneficial to include this configuration in the test suite. You can achieve this by adding 1 to the dcp_multipliers list for the test case with cp_kv_cache_interleave_size=64.

Suggested change
dcp_multipliers=[0.5],
dcp_multipliers=[0.5, 1],

Collaborator

@LucasWilkinson LucasWilkinson left a comment


LGTM

@LucasWilkinson LucasWilkinson enabled auto-merge (squash) December 9, 2025 06:05
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 9, 2025
@LucasWilkinson LucasWilkinson merged commit 67475a6 into vllm-project:main Dec 9, 2025
58 of 59 checks passed
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…vllm-project#30309)

Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1



2 participants