
Fix handling of abnormal exits in multi-node benchmark tasks #9651

Merged · 3 commits merged into PaddlePaddle:develop on Feb 6, 2025

Conversation

XieYunshen (Contributor)

PR types

PR changes

Description

paddle-bot bot commented on Dec 17, 2024

Thanks for your contribution!

codecov bot commented on Dec 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.21%. Comparing base (da7a7d2) to head (3940dc7).
Report is 94 commits behind head on develop.

❌ Your project status has failed because the head coverage (52.21%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9651      +/-   ##
===========================================
- Coverage    52.81%   52.21%   -0.61%     
===========================================
  Files          710      723      +13     
  Lines       111238   114330    +3092     
===========================================
+ Hits         58749    59695     +946     
- Misses       52489    54635    +2146     

☔ View full report in Codecov by Sentry.

Liujie0926 merged commit fa98268 into PaddlePaddle:develop on Feb 6, 2025 (10 of 12 checks passed)
ckl117 pushed a commit to ckl117/PaddleNLP that referenced this pull request Feb 17, 2025
update 0113

support head_dim=192,256 for append_attn c16

attention run

refine code

add softmax_scale

support weight_only_int8

refine code

support tp

delete test_append_attn

add split fused_moe from ziyuan

add deepseek-v3 class

fix rope for deepseek-v3

fix wint8 precision and refine code

fix wint4, big diff

add e_score_correction_bias

fix head_dim

fix v3 verify

[AutoParallel] open tensor_fusion for benchmark (PaddlePaddle#9749)

* open tensor_fusion for benchmark

fix loraga merge (PaddlePaddle#9765)

* fix loraga merge

* change sign

Fix ernie ci auto trainer error (PaddlePaddle#9758)

* [AutoParallel]: fix ernie auto_trainer error

* Update run_pretrain_auto.py

Update README.md (PaddlePaddle#9766)

* Update README.md

[BugFix] Fix matryoshka norm loss (PaddlePaddle#9774)

* fix matryoshka norm

[Distributed] support fuse optimizer (PaddlePaddle#9519) (PaddlePaddle#9777)

Update register_sequence_parallel_allreduce_hooks (PaddlePaddle#9782)

* fix sequence parallel

* update register_sequence_parallel_allreduce_hooks

* update fuse_sequence_parallel_allreduce

Fix ce error (PaddlePaddle#9783)

* [AutoParallel]:fix ci error

* [AutoParallel]:fix ci error

fix (PaddlePaddle#9779)

[MoE] fix expert parallel (PaddlePaddle#9760)

* fix moe uc

fix dpo pp criterion (PaddlePaddle#9786)

[Infer] Add pir_model path for server infer. (PaddlePaddle#9790)

fix d2s

fix v3 verify

support qk_head_dim != v_head_dim

support fp8 batch gemm on cutlass3.x

upgrade cutlass version for block_wise fp8 gemm

change cutlass commit to ckl117 group_wise branch

support fp8 block gemm, but private cutlass commit, and TODO: update fp8 dual gemm api on cutlass3.x

support auto tune fp8 block gemm code

update cutlass to v3.7.0, todo: support block gemm based on v3.7.0

support block gemm on cutlass v3.7.0 commit

code check

code check

check dynamic_quant

add block builder dir

rename group_quant
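The group/dynamic quant notes above come down to per-group scales. A minimal NumPy sketch of the idea (int8 for simplicity; the repo's actual fp8 kernels and the real `group_quant` signature are not shown here):

```python
import numpy as np

def group_quant(w: np.ndarray, group_size: int = 128):
    # Each group of `group_size` weights gets its own scale.
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(groups / scale), -127, 127).astype(np.int8)
    return q, scale

def group_dequant(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = group_quant(w)
print(np.abs(group_dequant(q, s) - w).max())  # small quantization error
```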

fix wint8 v_head_dim

fix rope

fix qwen2

mla use position_ids only

remove control flow

remove gpu concat

fix norm weight dtype

remove all_reduce in fused_moe

part support fp8

check group_quant and fake fp8

check

support block gemm

[LLM] support flash device on static model (PaddlePaddle#9619) (PaddlePaddle#9787)

* [LLM] support flash device on static model

* [LLM] adapt pdc sdk

[LLM Benchmark]update scripts (PaddlePaddle#9722)

* add no_proxy & del paddlenlp_ops

* update timeout for dpo

* fix sequence_parallel

* add timeout

* add Total_Tokens_per_second_per_gpu

* fix Tokens_per_second_per_gpu

* update Total_Tokens_per_second_per_gpu
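For reference, the throughput metrics named in these commits reduce to a simple normalization; a hypothetical sketch (the script's real field names may differ):

```python
def total_tokens_per_second_per_gpu(total_tokens: int, elapsed_s: float,
                                    num_gpus: int) -> float:
    # Throughput normalized by device count so multi-node runs stay comparable.
    return total_tokens / elapsed_s / num_gpus

# e.g. 1,048,576 tokens in 60 s on 8 GPUs -> ~2184.5 tokens/s/GPU
print(total_tokens_per_second_per_gpu(1_048_576, 60.0, 8))
```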

mergekit gpu 1226 (PaddlePaddle#9702)

* mergekit gpu 1226

* merge model gpu

* merge gpu

* add lora model

* change valueerror

* add lora

* gpu test

[LLM] merge code from fastdeploy (PaddlePaddle#9791)

* [LLM] update llm server dockerfiles

* merge code from fastdeploy

[Inference] Support eagle for llama (PaddlePaddle#9812)

[CI] Fix ci of small models (PaddlePaddle#9633)

[Trainer] Wrap model when lora is ON and only do evaluation. (PaddlePaddle#9803)

[README] Update README.md for documentation (PaddlePaddle#9785)

* Update README.md

* Update README.md

* Update README_en.md

fix static run

wint8 and fake-fp8; todo: handle mismatched data types

support fp8, but ffn1 and moe in wint8

support ffn1 fp8 block gemm

done ffn1 fp8 block gemm

block gemm done

block gemm support batch

refine rope code

compute position_ids use custom op
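For context, the Python-side computation that a position_ids custom op replaces is roughly the following (assuming a packed-batch layout; illustrative only):

```python
import numpy as np

def position_ids_from_seq_lens(seq_lens):
    # One 0..n-1 ramp per sequence, concatenated for a packed batch.
    return np.concatenate([np.arange(n, dtype=np.int64) for n in seq_lens])

print(position_ids_from_seq_lens([3, 5]))  # [0 1 2 0 1 2 3 4]
```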

fix split_param (PaddlePaddle#9817)

[LLM] Update model convert and fix TP for deepseekv3 (PaddlePaddle#9797)

* fix model convert and tp in MoEMLP

* fix tp_action filter

* update convert accoding to num_nextn_predict_layers

* add deepseek-R1

fuse rope

fix macro

fix mixtral

set_state_dict block_wise weight

support fp8 per tensor network, no support scale Tensor for tensor gemm

deepseek-v3 fp8 tensor gemm network, but precision fault

add triton fp8 fused_moe kernel

fix moe triton kernel

add moe triton kernel

fix

fix fp8 block gemm precision

moe triton fp8 network

support moe triton and precision correct, but shared ffn1 ffn2 incorrect

fp8 block network, no check shared ffn1-ffn2 in v2-lite

delete wint8 in fake

delete some useless code and verify the per-tensor net within qkv, out_linear, ffn1, ffn2, but triton moe doesn't match the api

fp8 block quant when load model, and code check

fix tokenizer and qwen

[AutoParallel] add sharding tensor_fusion save load switch (PaddlePaddle#9810)

* support tensor_fusion save load

* apply suggestions from code review

Fix handling of abnormal exits in multi-node benchmark tasks (PaddlePaddle#9651)

* Fix handling of abnormal exits in multi-node benchmark tasks

* fix bug

* update
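As a rough illustration of what handling abnormal exits in a multi-node benchmark driver involves (a minimal sketch only; the PR's actual changes live in the benchmark scripts, and `run_benchmark.sh` is a placeholder name):

```python
import subprocess
import sys

def run_node_task(cmd, timeout_s=3600):
    # Propagate the child's exit status instead of swallowing it, so one
    # failed node marks the whole multi-node job as failed.
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print("benchmark task timed out; treating as abnormal exit", file=sys.stderr)
        return 124
    return proc.returncode

if __name__ == "__main__":
    sys.exit(run_node_task(["bash", "run_benchmark.sh"]))
```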

Fix LLAMA arg parsing bug in pp (PaddlePaddle#9806)

[Readme] Update mixtral.md (PaddlePaddle#9829)

[XPU] Support empty_cache on XPUs (PaddlePaddle#9789)

* [XPU] Support empty_cache on XPUs

* warn if current device doesn't support

[Inference] Fix multibatch inference (PaddlePaddle#9831)

* fix batch infra

* fix deepseekv2 infra

Fix position_ids for infra  (PaddlePaddle#9841)

fix moe diff due to e_score_correction_bias

fix fast tokenizer

[LLM] Add pipeline and flashmask for Qwen2Moe and Deepseek (PaddlePaddle#9827)

* add modeling_pp

* add modeling_pp for qwen2moe

* add flashmask and pp for Qwen2MoE and Deepseek

* remove

* fix fast_tokenizer save

* update for topk_weight of noaux_tc

* fix for flashmask

* add use_expert_parallel for pretrain

* fix tokenizer test
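The e_score_correction_bias and topk_weight-of-noaux_tc items here and above refer, as far as I can tell, to DeepSeek-V3's routing detail: the bias shifts which experts get selected, while the combine weights come from the unbiased scores. A hedged NumPy sketch:

```python
import numpy as np

def noaux_tc_route(scores, e_score_correction_bias, k=2):
    # Selection uses bias-corrected scores; gate weights use the raw scores.
    topk = np.argsort(scores + e_score_correction_bias)[-k:]
    weights = scores[topk] / scores[topk].sum()
    return topk, weights

scores = np.array([0.30, 0.25, 0.20, 0.25])
bias = np.array([0.00, 0.10, 0.00, -0.10])
print(noaux_tc_route(scores, bias))
```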

[Mergekit]update & add LoRA merge (PaddlePaddle#9811)

* add

* fix bug

* fix

* add

* add lora merge

* add

* add

* add

* add

* add

* add
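The LoRA merge added here presumably follows the standard low-rank update; a minimal sketch under the usual LoRA parameterization (tensor names are illustrative, not mergekit's):

```python
import numpy as np

def merge_lora(base_w, lora_a, lora_b, alpha: float, r: int):
    # Standard LoRA merge: W' = W + (alpha / r) * B @ A
    return base_w + (alpha / r) * lora_b @ lora_a

d_out, d_in, r = 16, 32, 4
base_w = np.random.randn(d_out, d_in).astype(np.float32)
lora_a = np.random.randn(r, d_in).astype(np.float32)
lora_b = np.random.randn(d_out, r).astype(np.float32)
print(merge_lora(base_w, lora_a, lora_b, alpha=8.0, r=r).shape)  # (16, 32)
```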

[Unified Checkpoint] Fix expert parallel (PaddlePaddle#9821)

* fix expert parallel

* fix split_param for expert parallel

* add filter_sync_parameters

fix import

[Inference] Flask server compatible with OpenAI api. (PaddlePaddle#9828)

* flask server compatible with OpenAI api.

* fix max_length to max_tokens.

* fix with think model.
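A minimal sketch of what OpenAI-API compatibility plus the max_length to max_tokens fix implies for such a Flask server (the `generate` backend is a placeholder, not the repo's real handler):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt: str, max_tokens: int) -> str:
    # Placeholder; the real server invokes the deployed model here.
    return prompt[:max_tokens]

@app.route("/v1/completions", methods=["POST"])
def completions():
    body = request.get_json(force=True)
    # OpenAI clients send `max_tokens`, not `max_length`.
    max_tokens = int(body.get("max_tokens", 16))
    text = generate(body.get("prompt", ""), max_tokens)
    return jsonify({"object": "text_completion",
                    "choices": [{"index": 0, "text": text}]})
```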

[LLM] fix checkpoint save for non flash mode (PaddlePaddle#9830)

support mla for speculate

[DSK] support deepseek-v3/r1 (mha/fp16/bf16/wint8/wint4) (PaddlePaddle#9769)

* support deepseek-v3

* support head_dim=192,256 for append_attn c16

* update 0113

* attention run

* refine code

* add softmax_scale

* support weight_only_int8

* refine code

* support tp

* delete test_append_attn

* add split fused_moe from ziyuan

* fix rope for deepseek-v3

* add deepseek-v3 class

* fix wint8 precision and refine code

* fix wint4, big diff

* add e_score_correction_bias

* fix head_dim

* fix v3 verify

* fix d2s

* fix v3 verify

* support qk_head_dim != v_head_dim

* fix wint8 v_head_dim

* fix rope

* fix qwen2

* mla use position_ids only

* remove control flow

* remove gpu concat

* fix norm weight dtype

* remove all_reduce in fused_moe

* fix static run

* refine rope code

* compute position_ids use custom op

* fuse rope

* fix macro

* fix mixtral

* support mla for speculate

* fix tokenizer and qwen

* fix moe diff due to e_score_correction_bias

* fix fast tokenizer

* fix import

---------

Co-authored-by: lizhenyun01 <[email protected]>
Co-authored-by: lizhenyun <[email protected]>
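Several bullets above (softmax_scale, qk_head_dim != v_head_dim) describe MLA-style attention where the query/key head dim differs from the value head dim; a NumPy sketch of that shape regime (dimensions illustrative):

```python
import numpy as np

def attention(q, k, v, softmax_scale):
    # q, k: [seq, qk_head_dim]; v: [seq, v_head_dim] -> out: [seq, v_head_dim]
    scores = (q @ k.T) * softmax_scale
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

seq, qk_head_dim, v_head_dim = 8, 192, 128
q = np.random.randn(seq, qk_head_dim).astype(np.float32)
k = np.random.randn(seq, qk_head_dim).astype(np.float32)
v = np.random.randn(seq, v_head_dim).astype(np.float32)
print(attention(q, k, v, softmax_scale=1.0 / np.sqrt(qk_head_dim)).shape)  # (8, 128)
```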

Solve the compatibility problem of type annotation Python version (PaddlePaddle#9853)

mix fp8 and wint8

save extra special tokens (PaddlePaddle#9837)

[Bugfix] Fix dsk rope diff (PaddlePaddle#9859)

* fix dsk diff

* fix

* update

merge develop to check fp8 moe-wint8

fix deepseek v3 fp8 precision

fix deepseek weight quant

[Optimization] Support lower memory cards. (PaddlePaddle#9804)

* support lower memory cards.

* add doc for v100 16G such devices.

* remove debug info.

* add pre-divided factor to overcome the overflow problem for fp16 attention.
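My reading of the pre-divided factor: apply the 1/sqrt(d) scale to q before the q @ k^T matmul so the fp16 logits stay in range, instead of scaling an already-large product. An illustrative sketch:

```python
import numpy as np

d = 128
q = np.random.randn(8, d).astype(np.float16)
k = np.random.randn(8, d).astype(np.float16)
scale = np.float16(1.0 / np.sqrt(d))

logits_pre = (q * scale) @ k.T   # smaller intermediates, less overflow risk
logits_post = (q @ k.T) * scale  # same math, riskier in fp16
print(np.abs(logits_pre.astype(np.float32) - logits_post.astype(np.float32)).max())
```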

Support XPU for auto-parallel LLaMa (PaddlePaddle#9796)

* Support XPU for auto-parallel LLaMa

* Update

* Update

* Update

* Update

* Fix CI errors

* Update

[XPU] Add xpu fused op for deepseek (PaddlePaddle#9854)

[Inference] Update deepseek (PaddlePaddle#9864)

* fix

* fix infra

[PreTrain] Support deepseek mfu for pretraining and fix tflops for pretrain pipe model (PaddlePaddle#9855)

* get flops with pp model.

* Support hardware tflops for deepseek.
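The MFU/hardware-tflops bookkeeping amounts to achieved versus peak FLOPs; a hypothetical sketch (all names and numbers are placeholders):

```python
def hardware_tflops_utilization(model_flops_per_step: float, step_time_s: float,
                                num_gpus: int, peak_tflops_per_gpu: float) -> float:
    # Achieved TFLOP/s across the job divided by the aggregate hardware peak.
    achieved_tflops = model_flops_per_step / step_time_s / 1e12
    return achieved_tflops / (num_gpus * peak_tflops_per_gpu)

# Placeholder numbers: 2e15 FLOPs/step, 2 s/step, 8 GPUs at ~312 bf16 TFLOPs.
print(f"{hardware_tflops_utilization(2.0e15, 2.0, 8, 312.0):.1%}")  # ~40.1%
```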

[Inference]Support mtp with deepseek-v3 (PaddlePaddle#9856)

* support mtp with deepseek_v3 both in static and dygraph mode

* fix speculate tokenizer in unittest

* delete useless code

check code
ckl117 mentioned this pull request on Feb 17, 2025