Deepseek_v4 support w4(mxfp4)a16 on hopper #23686
    size_k=K,
    is_k_full=is_k_full,
    use_atomic_add=use_atomic_add,
    use_atomic_add=False,
Protect this with an if branch
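One way to read this suggestion, as a minimal sketch: rather than hard-coding `use_atomic_add=False` for every caller, the override sits behind a condition. The helper name `resolve_use_atomic_add` and the `force_disable` flag are hypothetical; the thread does not state which condition the reviewer had in mind.

```python
def resolve_use_atomic_add(requested: bool, force_disable: bool) -> bool:
    """Return the value to pass as use_atomic_add.

    Hypothetical helper: `force_disable` stands in for whatever condition
    the new code path needs (it is not specified in the review).
    """
    if force_disable:
        # Only the guarded path overrides the caller's choice.
        return False
    # All other paths keep the value the caller asked for.
    return requested
```

With a guard like this, existing callers keep their previous behaviour and only the new path forces the flag off.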
Simple smoke test:
Longbench result on flash model with marlin:
Hey, I tried using your PR on H20. I noticed that you modified part of the Could you share how you temporarily compiled
You need to change 54f99a8af537b3c6eb4819b69907ccbe2b600792 to ffe2b6b97420a9f8c58268ca55755168e6e2f360.
Thx! Is there anything else I should pay attention to, or do I only need to run
Aime25 test:
@zhangxiaolei123456 Thanks for your contribution!
The Hopper w4a16 PR (sgl-project#23686) restructured the FP4 expert weight processing branches in a way that blocks the default deepgemm/auto backend path with a NotImplementedError. This restores the original logic and treats marlin as the special-case addition.
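The branch structure described here can be sketched roughly as follows. This is an illustration only: the function name `select_fp4_weight_path`, the backend strings, and the returned labels are assumptions and do not mirror the actual code in the repository.

```python
def select_fp4_weight_path(backend: str) -> str:
    """Pick a post-processing path for FP4 expert weights (hypothetical)."""
    if backend == "marlin":
        # Special-case addition: the Hopper w4a16 path.
        return "marlin_repack"
    # Everything else, including "auto" and "deepgemm", falls through to
    # the original logic instead of raising NotImplementedError.
    return "original"
```

Treating marlin as the only special case keeps the default path reachable for any backend that is not explicitly handled.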
Motivation
Co-authored-by: shiyu7
Modifications
Accuracy Tests
Flash
GSM8K
MMLU
GPQA
longbench_v2
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci