[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA#30309
LucasWilkinson merged 1 commit into vllm-project:main
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Code Review
This pull request addresses an accuracy issue in Decode Context Parallelism (DCP) when it is used with FlashAttention MLA and a cp_kv_cache_interleave_size greater than 1. The fix disables supports_dcp_with_varlen in this configuration, which forces the affected requests into the prefill path and resolves the bug. The code change is correct and directly targets the issue. My main feedback is to extend the test coverage to include the specific configuration that was manually tested and shown to be fixed, so that CI covers this scenario and catches future regressions.
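The gating described in this summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual vLLM code: `DcpConfig`, `decode_query_len_threshold`, and `varlen_threshold` are hypothetical names introduced here. Only `cp_kv_cache_interleave_size` and `supports_dcp_with_varlen` come from the PR, and the fallback threshold of 1 is an assumption about how "forcing requests into the prefill path" is realized.

```python
from dataclasses import dataclass


@dataclass
class DcpConfig:
    """Hypothetical stand-in for the relevant parallel-config fields."""
    dcp_world_size: int = 1
    cp_kv_cache_interleave_size: int = 1


def supports_dcp_with_varlen(cfg: DcpConfig) -> bool:
    # Per the PR description, the varlen (multi-token decode, e.g. MTP)
    # path under DCP only works when the interleave size is 1.
    return cfg.cp_kv_cache_interleave_size == 1


def decode_query_len_threshold(cfg: DcpConfig, varlen_threshold: int) -> int:
    # Assumption: requests whose query length exceeds the returned value
    # are routed to the prefill path instead of the decode path.
    if cfg.dcp_world_size > 1 and not supports_dcp_with_varlen(cfg):
        return 1  # only single-token decodes stay on the decode path
    return varlen_threshold
```

Under these assumptions, a configuration with DCP enabled and an interleave size of 64 keeps only single-token queries on the decode path, while an interleave size of 1 leaves the varlen threshold unchanged.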
  CPTestSettings.detailed(dcp_multipliers=[1]),
  CPTestSettings.detailed(
-     dcp_multipliers=[0.5, 1], cp_kv_cache_interleave_size=64
+     dcp_multipliers=[0.5],
The PR description shows a manual test with dcp_size == tp_size (i.e., dcp_multiplier=1) and cp_kv_cache_interleave_size=64 to demonstrate the fix. However, this specific test case seems to be missing from the automated tests after the changes. To prevent future regressions, it would be beneficial to include this configuration in the test suite. You can achieve this by adding 1 to the dcp_multipliers list for the test case with cp_kv_cache_interleave_size=64.
Suggested change:
-     dcp_multipliers=[0.5],
+     dcp_multipliers=[0.5, 1],
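To make the effect of the suggestion concrete, here is a small sketch of how a multiplier list might expand into DCP sizes. The helper `dcp_sizes` is hypothetical and the formula `dcp_size = multiplier * tp_size` is an assumption, chosen to be consistent with the comment above equating dcp_multiplier=1 with dcp_size == tp_size. The real test harness may compute this differently.

```python
def dcp_sizes(tp_size: int, dcp_multipliers: list[float]) -> list[int]:
    # Assumption: each multiplier scales the tensor-parallel size,
    # so a multiplier of 1 yields dcp_size == tp_size.
    return [int(m * tp_size) for m in dcp_multipliers]


# With the list restricted to [0.5], the dcp_size == tp_size case is skipped.
without_suggestion = dcp_sizes(4, [0.5])
# With the suggested [0.5, 1], it is covered again.
with_suggestion = dcp_sizes(4, [0.5, 1])
```

Under this assumption, a tp_size of 4 gives DCP sizes of 2 and 4 with the suggested list, and only 2 without it.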
…vllm-project#30309) Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
#25049 added MTP support for DCP with FA3, but this only works when cp_kv_cache_interleave_size=1.
For the FA3 backend, we should check cp_kv_cache_interleave_size and set supports_dcp_with_varlen accordingly.
Test Plan
Test Result
cc @LucasWilkinson @minosfuture @pisceskkk