[Bugfix] [Config] Retune ptpc fmoe deepseek-r1 for MI308 (#1418)
Conversation
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
I fixed a problem with tuning the CK solutions of fmoe in #1405. You can retune the shapes with this procedure:
1. Clean `untuned_fmoe.csv` and add the shapes you want to tune.
2. Run `AITER_REBUILD=1 python3 hsa/gfx942/fmoe_2stages/tune.py --all` at the root of the repository. It will update the tuned shapes in `tuned_fmoe.csv`.
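Step 1 above can be sketched as a small script that resets the untuned CSV to exactly the shapes you want to tune. The column names and shape values here are assumptions for illustration; check the real header of `untuned_fmoe.csv` in the repository before using this.

```python
import csv
from pathlib import Path

# Hypothetical column names -- the real header of untuned_fmoe.csv may differ.
HEADER = ["token", "model_dim", "inter_dim", "expert", "topk", "dtype"]

# Illustrative deepseek-r1-like MoE shapes to retune (values are examples only).
shapes = [
    (1, 7168, 2048, 256, 8, "fp8"),
    (32, 7168, 2048, 256, 8, "fp8"),
]

def reset_untuned(path: str, rows) -> None:
    """Overwrite the untuned CSV so it contains only the shapes to tune."""
    with Path(path).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)

reset_untuned("untuned_fmoe.csv", shapes)
```

After this, running the `tune.py` command from step 2 would pick up only these entries.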
OK, let me try and get back to you.
@yzhou103 I built from a clean build and I still encounter this problem. Did you also encounter issue #1417?
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
@yzhou103 Thank you for sharing the tuning results you obtained. They match my tuning results, so could we proceed with the review and get this merged to fix the accuracy issue? Thank you.
@yzhou103 do I need to do anything to unblock the merge? Thank you. |
Nothing; it seems CI has not been stable recently.
@yzhou103 all the tests passed now. |
* retune ptpc fmoe deepseek-r1
  Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
* fix formatting
  Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
* update tuning config
  Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

---------

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Co-authored-by: yzhou103 <Ying.Zhou2@amd.com>
Motivation
This also addresses issue #1417.
Since commit a7f63e3 only adds preshuffle for mxfp4, and gfx942 does not support fp4, the `bpreshuffle` argument has been set to `False`. The fused MoE has been retuned because the original configuration has an accuracy issue: we were getting an lm_eval score of 0. Please let me know if the kernel usage or tuning procedure is incorrect, as the generated tuning file only has one kernel entry.
Technical Details
Retuning procedure that we have executed:
1. Clean `untuned_fmoe.csv` and `tuned_fmoe.csv`.
2. Add the following entries into `untuned_fmoe.csv`.
3. Run `AITER_REBUILD=1 python3 hsa/gfx942/fmoe_2stages/tune.py` at the root of the repository.

Test Plan
Run the E2E lm_eval test for ptpc deepseek-r1.
Test Result
Submission Checklist