[DSK] Implement mla use matrix-absorption #9875

yuanlehome · 2025-02-16T17:29:04Z

Before submitting

Lint code. If there are lint issues, please format the code first.

# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py

Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

Performance optimization

PR changes

Others

Description
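The title refers to matrix absorption for multi-head latent attention (MLA). As a rough illustration of the general idea only, and not the code in this PR, the sketch below compares a naive single-head attention that first expands the compressed KV latent into keys and values with an "absorbed" variant that folds the key up-projection into the query and applies the value up-projection after attention. All names (`w_uk`, `w_uv`, `c_kv`) and shapes are assumptions made for the example; the decoupled RoPE part of MLA, multiple heads, and the KV cache are left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def scale(a, s):
    return [[x * s for x in row] for row in a]


def softmax_rows(a):
    """Numerically stable softmax over each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def naive_attention(q, c_kv, w_uk, w_uv):
    # Expand the latent into full keys and values, then attend as usual.
    k = matmul(c_kv, w_uk)  # (seq_len, head_dim)
    v = matmul(c_kv, w_uv)  # (seq_len, head_dim)
    scores = scale(matmul(q, transpose(k)), 1.0 / math.sqrt(len(q[0])))
    return matmul(softmax_rows(scores), v)


def absorbed_attention(q, c_kv, w_uk, w_uv):
    # q @ (c_kv @ w_uk)^T == (q @ w_uk^T) @ c_kv^T, so project the query
    # into the latent space and score directly against the latent.
    q_latent = matmul(q, transpose(w_uk))  # (1, latent_dim)
    scores = scale(matmul(q_latent, transpose(c_kv)), 1.0 / math.sqrt(len(q[0])))
    # Attend over the latent first, then apply the value up-projection once.
    latent_out = matmul(softmax_rows(scores), c_kv)  # (1, latent_dim)
    return matmul(latent_out, w_uv)  # (1, head_dim)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, latent_dim, head_dim = 5, 4, 8  # illustrative sizes only
q = rand_matrix(1, head_dim, rng)
c_kv = rand_matrix(seq_len, latent_dim, rng)
w_uk = rand_matrix(latent_dim, head_dim, rng)
w_uv = rand_matrix(latent_dim, head_dim, rng)

ref = naive_attention(q, c_kv, w_uk, w_uv)
out = absorbed_attention(q, c_kv, w_uk, w_uv)
max_diff = max(abs(a - b) for ra, rb in zip(ref, out) for a, b in zip(ra, rb))
print(max_diff < 1e-9)
```

Both paths give the same result up to floating-point error. In the absorbed form the per-token keys and values are never materialized, so the decode step only needs to read the compressed latent.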

paddle-bot · 2025-02-16T17:29:08Z

Thanks for your contribution!

codecov · 2025-02-16T18:03:15Z

Codecov Report

Attention: Patch coverage is 0% with 401 lines in your changes missing coverage. Please review.

Project coverage is 51.23%. Comparing base (235c24e) to head (03d6a0b).
Report is 23 commits behind head on develop.

❗ Current head 03d6a0b differs from pull request most recent head 2fb3378

Please upload reports for the commit 2fb3378 to get more accurate results.

Files with missing lines	Patch %	Lines
...erimental/transformers/fused_transformer_layers.py	0.00%	276 Missing ⚠️
.../experimental/transformers/deepseek_v2/modeling.py	0.00%	78 Missing ⚠️
paddlenlp/experimental/transformers/proposers.py	0.00%	9 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py	0.00%	6 Missing ⚠️
...dlenlp/experimental/transformers/bloom/modeling.py	0.00%	5 Missing ⚠️
...p/experimental/transformers/chatglm_v2/modeling.py	0.00%	5 Missing ⚠️
...dlenlp/experimental/transformers/llama/modeling.py	0.00%	5 Missing ⚠️
...enlp/experimental/transformers/mixtral/modeling.py	0.00%	5 Missing ⚠️
...dlenlp/experimental/transformers/qwen2/modeling.py	0.00%	5 Missing ⚠️
...lp/experimental/transformers/qwen2_moe/modeling.py	0.00%	5 Missing ⚠️
... and 1 more

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9875      +/-   ##
===========================================
- Coverage    51.66%   51.23%   -0.43%     
===========================================
  Files          739      745       +6     
  Lines       117426   118834    +1408     
===========================================
+ Hits         60668    60886     +218     
- Misses       56758    57948    +1190
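The percentages in the diff above can be reproduced from the hit and miss counts. The short check below is a sketch that assumes coverage is hits divided by total lines and that the displayed value is truncated (not rounded) to two decimals; that truncation rule is an assumption chosen because it matches the figures shown, not something the report states.

```python
import math


def coverage_pct(hits, misses):
    """Coverage in percent, truncated to two decimals (assumed display rule)."""
    lines = hits + misses
    return math.floor(hits / lines * 10000) / 100


# Counts copied from the coverage diff above.
assert 60668 + 56758 == 117426  # base: hits + misses == lines
assert 60886 + 57948 == 118834  # head: hits + misses == lines

base = coverage_pct(60668, 56758)
head = coverage_pct(60886, 57948)
delta = round(head - base, 2)
print(base, head, delta)  # 51.66 51.23 -0.43
```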

qingqing01 · 2025-02-17T06:01:32Z

csrc/gpu/append_attn/decode_attention_kernel.cu

+//   float in_scale,
+//   bool causal,
+//   cudaStream_t &stream,
+//   paddle::Tensor *out);


Besides MLA decode attention, does this file still support GQA/MHA?

CLAassistant · 2025-02-20T12:50:13Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ lizhenyun01
✅ yuanlehome
❌ lizhenyu04

lizhenyu04 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

yuanlehome force-pushed the deepseek-v3-mla branch from 818cd29 to 5b45565 on February 17, 2025 07:31

mla python part

7ec7f02

yuanlehome force-pushed the deepseek-v3-mla branch from 5b45565 to 7ec7f02 on February 17, 2025 07:36

yuanlehome marked this pull request as draft February 17, 2025 07:38

lizhenyun01 and others added 5 commits February 17, 2025 16:03

absorb mla optimizer

1e05b16

Merge pull request #13 from lizhenyun01/deepseek_mla

3e1fbe5

Deepseek mla

fix c8/c4 dtype in mla

22e4e3d

Merge pull request #14 from lizhenyun01/deepseek_mla

90a158a

fix c8/c4 dtype in mla

add weight_only part 1

1bd78d0

qingqing01 reviewed Feb 17, 2025

yuanlehome and others added 18 commits February 17, 2025 20:59

dy can run

0aab4a5

static can run

1bb8ef8

nothing

2d68bea

refine network

ebf1a76

fix write cache

871f2e6

Merge pull request #15 from lizhenyun01/deepseek_mla

8a1e982

fix write cache

update

549f709

Merge branch 'deepseek-v3-mla' of https://github.com/yuanlehome/PaddleNLP into deepseek-v3-mla

1165609

add pd_throw

7dc8c55

add pd_throw

bbd0051

fix mla_atn

eabd751

Merge branch 'deepseek-v3-mla' into deepseek_mla

820bb38

Merge pull request #16 from lizhenyun01/deepseek_mla

71e4a7d

fix mla_atn

fix mla

d0f40f6

Merge pull request #17 from lizhenyun01/deepseek_mla

657d67d

fix mla precision

update network

58a020b

weight only support group wise

d060c98

fix MLA

a11bb32

Merge pull request #18 from lizhenyun01/deepseek-v3-mla

1ecbb23

fix MLA && trick avoid append_dec

yuanlehome and others added 14 commits February 20, 2025 22:36

update split kv_b

2a96da0

fix

5e44eeb

refine if

4ed7180

half support new absorb

184765e

weight only support new absorb

9e2ea0e

fix

4d90d61

fix bf16

4f1d25c

optimize mla

8673095

Merge pull request #19 from lizhenyun01/deepseek-v3-mla

680ed55

optimize mla

set kv_cache's bsz=1

5fcaf18

Merge pull request #20 from lizhenyun01/deepseek-v3-mla

b425f74

set kv_cache's bsz=1

delete max_batch_size

d824c2a

refine if

3102788

not_need_stop to cpu

2fb3378

yuanlehome marked this pull request as ready for review February 24, 2025 14:11

yuanlehome force-pushed the deepseek-v3-mla branch from 03d6a0b to 2fb3378 on February 24, 2025 17:30
