[0.13.0][Feat] Integrate FIA operator in mla_cp._forward_decode#6046
wangxiyuan merged 28 commits into vllm-project:releases/v0.13.0
Conversation
Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Bai Yongbin <845473182@qq.com>
Signed-off-by: tongyuzhou <t00886357@china.huawei.com>
Code Review
This pull request integrates the npu_fused_infer_attention_score operator, which appears to be a new or optimized fused NPU attention kernel, into the attention mechanism. The changes adapt the attention logic, particularly the context parallel (CP) and multi-head latent attention (MLA) paths, to use the new operator: parameter preparation, the operator call itself, and output handling are all updated. The _update_out_and_lse method has been removed, with its functionality consolidated into _npu_attn_out_lse_update. Test files have been updated accordingly, mocking the new NPU operators and adjusting the expected input/output shapes and return values, and acl_graph.py now handles the new parameters during graph capture and replay. The changes are consistent across the codebase and appear to be a necessary adaptation to the new NPU operator API.
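For context, the out/lse update that _npu_attn_out_lse_update consolidates is the standard trick for combining partial attention results computed over disjoint KV chunks, as happens under context parallelism: each chunk returns its locally normalized softmax output together with the log-sum-exp (LSE) of its logits, and the partial outputs are then reweighted by the renormalized LSEs. Below is a minimal pure-Python sketch of that math; it is illustrative only (the real code operates on batched NPU tensors, and `chunk_attn`/`merge_out_lse` are hypothetical names, not the project's API):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def chunk_attn(q, keys, values, scale):
    """Attention for a single query over one KV chunk.

    Returns the chunk-local softmax-weighted output and the
    log-sum-exp (LSE) of the chunk's scaled logits.
    """
    logits = [dot(q, k) * scale for k in keys]
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    weights = [math.exp(x - lse) for x in logits]  # sum to 1 within the chunk
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, lse

def merge_out_lse(out_a, lse_a, out_b, lse_b):
    """Combine two partial results as if attention ran over both chunks at once."""
    m = max(lse_a, lse_b)
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    w_a = math.exp(lse_a - lse)  # w_a + w_b == 1
    w_b = math.exp(lse_b - lse)
    merged = [w_a * a + w_b * b for a, b in zip(out_a, out_b)]
    return merged, lse

# Tiny demo: two KV chunks of two entries each, query dim 3, value dim 2.
scale = 1.0 / math.sqrt(3)
q = [0.3, -0.7, 1.1]
keys = [[0.5, 0.1, -0.2], [1.0, -1.0, 0.0], [-0.3, 0.4, 0.8], [0.2, 0.9, -0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [-1.0, 2.0]]
out_a, lse_a = chunk_attn(q, keys[:2], values[:2], scale)
out_b, lse_b = chunk_attn(q, keys[2:], values[2:], scale)
merged, lse = merge_out_lse(out_a, lse_a, out_b, lse_b)
```

The merged result is bit-for-bit the same (up to floating-point error) as running attention over the full KV at once, which is why the update can be fused into a single operator call without changing numerics.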
Merge branch 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend into FIA_v0.13.0, picking up:
- [0.13.0][Bugfix] Add `synced_cudagraph_mode` to limit mixed graph modes in dp ranks (vllm-project#6011)
This reverts commit 5c1f197.
Later merges of 'releases/v0.13.0' from https://github.com/vllm-project/vllm-ascend into FIA_v0.13.0 picked up:
- [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
- [v0.13.0][Bugfix] Fix XliteModelRunner init failure when aclgraph is enabled (vllm-project#5887)
- [0.13.0][Bugfix] Fixed a problem related to embeddings sharing (vllm-project#5972)
- [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6057)
- [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
- [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
- [0.13.0][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads (vllm-project#6056)
- Revert "[0.13.0][cherry-pick][bugfix] fix bug of triton mrope" (vllm-project#6075)
- [0.13.0][Doc] Supplement PD separation parameters of DeepSeek V3.1 (vllm-project#6054)
- [EPLB][Bugfix][v0.13.0] Incorporate the warm-up of the EPLB into the profile run (vllm-project#6099)
- [EPLB][Bugfix] Dispatch Allgather use log2phy if enable eplb (vllm-project#5933) (vllm-project#6016)
- [0.13.0][CI] fix for CI lint (vllm-project#6093)
- [0.13.0][cherry-pick][bugfix] fix the complex and potentially problematic generate_kv_idx (vllm-project#5955)
- [Feature][Cherry Pick] Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe (vllm-project#6081)
- [v0.13.0][BugFix][Cherry Pick] Fix input parameter bug of dispatch_gmm_combine_decode (vllm-project#5931)
- [0.13.0][Bugfix] Fix Triton operator usage for multimodal models based on the `mrope_interleaved` parameter (vllm-project#6074)
- [v0.13.0][CI] Upgrade to CANN 8.5.0 (vllm-project#6101)
- [EPLB] Config Rename wrapper (vllm-project#6111)
- [v0.13.0][Bugfix] Fix the input constraint checks for the mlapo and bmm_transpose operators (vllm-project#5764) (vllm-project#6088)
What this PR does / why we need it?
Replace `npu_multi_head_latent_attention` with the fused infer attention (FIA) operator `npu_fused_infer_attention_score` in `mla_cp._forward_decode`.
Adjust `mla_attn_dpc_pcp` in `acl_graph.py` accordingly.
pick-from: #5641
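The review notes that the updated tests mock the new NPU operators rather than run them on hardware. The pattern can be sketched with `unittest.mock` as below; everything here is illustrative (a stand-in `FakeTorchNpu` namespace and a simplified `forward_decode` stand in for the real module and `_forward_decode`, which take many more parameters):

```python
from unittest import mock

class FakeTorchNpu:
    """Stand-in for the torch_npu module; the real operator needs Ascend hardware."""
    @staticmethod
    def npu_fused_infer_attention_score(query, kv_cache):
        raise RuntimeError("requires an Ascend NPU")

torch_npu = FakeTorchNpu()

def forward_decode(query, kv_cache):
    # Simplified stand-in for mla_cp._forward_decode: call the fused
    # operator and return its attention output and log-sum-exp.
    out, lse = torch_npu.npu_fused_infer_attention_score(query, kv_cache)
    return out, lse

def test_forward_decode_calls_fia():
    # Patch the operator so the test runs without NPU hardware, then
    # check that forward_decode forwards its arguments and unpacks the
    # (output, lse) pair the operator returns.
    with mock.patch.object(
        FakeTorchNpu, "npu_fused_infer_attention_score",
        return_value=("attn_out", "softmax_lse"),
    ) as fia:
        out, lse = forward_decode("q", "kv")
    fia.assert_called_once_with("q", "kv")
    assert (out, lse) == ("attn_out", "softmax_lse")

test_forward_decode_calls_fia()
```

Mocking at this boundary lets the test suite pin down the call contract (argument order and the two-element return) while staying runnable on CPU-only CI.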
Does this PR introduce any user-facing change?
no
How was this patch tested?