Conversation
After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?
I am giving it a try...
|
We should also introduce a Triton fused MoE kernel like moe_wna16.
Yes, that is exactly what this PR is for.
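For context on the moe_wna16 path mentioned above: such a kernel fuses group-wise 4-bit (weight-4, activation-16) dequantization into the MoE matmuls instead of materializing full-precision weights. The NumPy sketch below illustrates only the pack/dequantize step under simplified assumptions (sequential nibble order, one scale and zero-point per group); the function names are hypothetical and this is not sglang's actual kernel layout.

```python
import numpy as np

def pack_int4(w: np.ndarray) -> np.ndarray:
    """Pack 8 unsigned 4-bit values along the last axis into one uint32.

    Assumes a simple sequential nibble order; real AWQ layouts may permute it.
    """
    assert w.shape[-1] % 8 == 0
    w = w.reshape(*w.shape[:-1], -1, 8).astype(np.uint32)
    shifts = np.arange(8, dtype=np.uint32) * 4
    return (w << shifts).sum(axis=-1).astype(np.uint32)

def dequant_int4(qw: np.ndarray, scales: np.ndarray, zeros: np.ndarray,
                 group_size: int = 128) -> np.ndarray:
    """Unpack uint32-packed 4-bit weights and apply per-group scale/zero-point.

    A fused MoE kernel would do this inline per tile, right before the matmul,
    rather than producing the full dequantized matrix as this sketch does.
    """
    shifts = np.arange(8, dtype=np.uint32) * 4
    w = ((qw[..., None] >> shifts) & 0xF).reshape(*qw.shape[:-1], -1)
    w = w.astype(np.float32)
    # Broadcast each group's scale and zero-point over group_size columns.
    groups = np.repeat(np.arange(w.shape[-1] // group_size), group_size)
    return (w - zeros[..., groups]) * scales[..., groups]
```

With unit scales and zero zero-points, dequantization simply round-trips the packed integers, which makes the layout easy to sanity-check.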
|
> After this PR is merged.

I still have a problem; I am running this model: cognitivecomputations/DeepSeek-V3-AWQ.
|
What is your launch command?
|
So, does this PR still use the AWQ Marlin kernel?
I replaced the config.json with the AWQ version.
R1 and MLA are not supported for now, due to unknown accuracy issues. You can use V3-AWQ with this command: `python -m sglang.launch_server --model-path cognitivecomputations/DeepSeek-V3-AWQ --tp-size 8 --trust-remote --disable-mla`
I succeeded in deploying the model on 8×A800 by building a Docker image from the fix-dpsk-v3-awq branch.
|
Could you share some benchmarks?
|
How about the benchmark? @chenchunhui97
zhyncs left a comment:
This fix is a bit tricky; I'll merge it first to unblock AWQ usage. Refactoring is on its way.
|
My launch script is on 8×A800 80G. This model has been successfully deployed with vLLM with a smaller context length, but it seems vLLM does not optimize MLA well for now. Error: @chenchunhui97 @zhyncs Any suggestions?
