
Fix deepseek awq v3 #3450

Merged
zhyncs merged 19 commits into main from fix-dpsk-v3-awq on Feb 12, 2025

Conversation

@hnyls2002
Collaborator

@hnyls2002 hnyls2002 commented Feb 10, 2025

python -m sglang.launch_server --model-path cognitivecomputations/DeepSeek-V3-AWQ --tp-size 8 --trust-remote --disable-mla
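
Once the server is up, a quick smoke test against the OpenAI-compatible endpoint (a minimal sketch assuming the default port 30000; adjust if --port is passed):

import requests

# Minimal smoke test for the OpenAI-compatible endpoint sglang exposes.
# Port 30000 is the default; change it if the server was launched with --port.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "cognitivecomputations/DeepSeek-V3-AWQ",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])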

@hnyls2002 hnyls2002 marked this pull request as draft February 10, 2025 04:43
@halexan

halexan commented Feb 10, 2025

After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

@chenchunhui97

> After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

I'm giving it a try...

@Xu-Chen
Contributor

Xu-Chen commented Feb 10, 2025

We should also introduce a Triton fused MoE kernel like moe_wna16.
The AWQ Marlin kernel may only get around 10 tokens/s on 8*A100.
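
For context, "wna16" means n-bit weights with 16-bit activations: the expert weights stay int4 in memory and are dequantized group-wise inside the fused kernel. A minimal PyTorch sketch of that group-wise dequantization (illustrative only, not the actual Triton kernel):

import torch

def dequant_wna16(qweight: torch.Tensor, scales: torch.Tensor,
                  zeros: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Group-wise weight-only dequantization: w = (q - z) * s per group.

    qweight: [in_features, out_features], int4 values already unpacked to int8
    scales, zeros: [in_features // group_size, out_features]
    """
    n_groups = qweight.shape[0] // group_size
    q = qweight.view(n_groups, group_size, -1).to(torch.float16)
    s = scales.unsqueeze(1)                      # broadcast over each group
    z = zeros.unsqueeze(1).to(torch.float16)
    return ((q - z) * s).view(qweight.shape)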

@hnyls2002
Collaborator Author

> After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

Yes, this PR is exactly for that.

@hnyls2002 hnyls2002 marked this pull request as ready for review February 10, 2025 11:47
@hnyls2002 hnyls2002 changed the title Fix deepseek awq v3 [DO NOT MERGE] Fix deepseek awq v3 Feb 10, 2025
@hnyls2002 hnyls2002 changed the title [DO NOT MERGE] Fix deepseek awq v3 Fix deepseek awq v3 Feb 10, 2025
@pachinko

> > After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?
>
> Yes, this PR is exactly for that.

I still have a problem. I am running cognitivecomputations/DeepSeek-V3-AWQ:

[2025-02-11 14:42:20 TP6] Scheduler hit an exception: Traceback (most recent call last):
  File "/WORK/sglang/python/sglang/srt/managers/scheduler.py", line 1816, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/managers/scheduler.py", line 240, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/model_executor/model_runner.py", line 186, in __init__
    self.load_model()
  File "/WORK/sglang/python/sglang/srt/model_executor/model_runner.py", line 307, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/model_loader/loader.py", line 362, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/WORK/sglang/python/sglang/srt/models/deepseek_v2.py", line 924, in load_weights
    param = params_dict[name]
            ~~~~~~~~~~~^^^^^^
KeyError: 'model.layers.6.mlp.experts.w2_weight'

[2025-02-11 14:42:20] Received sigquit from a child proces. It usually means the child failed.
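
For context, load_weights remaps per-expert checkpoint tensors (e.g. experts.6.down_proj) onto the stacked FusedMoE parameters and then looks the remapped name up in params_dict; a KeyError on experts.w2_weight suggests the AWQ-quantized MoE module registers its parameters under different names (e.g. w2_qweight / w2_scales) than the unquantized mapping produces. A simplified sketch of that remapping pattern (names are illustrative, not the exact deepseek_v2.py code):

# Simplified sketch of the fused-MoE expert-weight remapping in load_weights.
# Names are illustrative; the real logic lives in sglang's deepseek_v2.py.
def load_expert_weights(model, weights, num_experts):
    params_dict = dict(model.named_parameters())
    # Map per-expert checkpoint names onto the stacked fused parameter.
    expert_mapping = [
        (f"experts.{i}.down_proj.weight", "experts.w2_weight", i)
        for i in range(num_experts)
    ]
    for name, loaded_weight in weights:
        for ckpt_name, fused_name, expert_id in expert_mapping:
            if ckpt_name not in name:
                continue
            fused = name.replace(ckpt_name, fused_name)
            # This lookup is what raises KeyError when the quantized module
            # registered its parameters under different names.
            param = params_dict[fused]
            param.weight_loader(param, loaded_weight, expert_id=expert_id)
            break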

@halexan

halexan commented Feb 11, 2025

@pachinko

What is your launch command?

@pachinko

@halexan

python3 -m sglang.launch_server \
    --model-path /home/model/DeepSeek-R1 \
    --tp 8 \
    --dist-init-addr 10.10.0.1:6000 \
    --nnodes 1 \
    --node-rank 0 \
    --trust-remote-code \
    --disable-radix-cache  \
    --disable-outlines-disk-cache \
    --host 0.0.0.0 \
    --port 40000

@halexan

halexan commented Feb 11, 2025

> We should also introduce a Triton fused MoE kernel like moe_wna16. The AWQ Marlin kernel may only get around 10 tokens/s on 8*A100.

So, does this PR still use the AWQ Marlin kernel?

@pachinko

@halexan

python3 -m sglang.launch_server \
    --model-path /home/model/DeepSeek-R1 \
    --tp 8 \
    --dist-init-addr 10.10.0.1:6000 \
    --nnodes 1 \
    --node-rank 0 \
    --trust-remote-code \
    --disable-radix-cache  \
    --disable-outlines-disk-cache \
    --host 0.0.0.0 \
    --port 40000

I replaced the config.json with the awq version.

@hnyls2002
Collaborator Author

hnyls2002 commented Feb 11, 2025

> @halexan
>
> python3 -m sglang.launch_server \
>     --model-path /home/model/DeepSeek-R1 \
>     --tp 8 \
>     --dist-init-addr 10.10.0.1:6000 \
>     --nnodes 1 \
>     --node-rank 0 \
>     --trust-remote-code \
>     --disable-radix-cache \
>     --disable-outlines-disk-cache \
>     --host 0.0.0.0 \
>     --port 40000
>
> I replaced the config.json with the awq version.

R1 and MLA are not supported for now, due to some unknown accuracy issues. You can use V3-AWQ with this command:

python -m sglang.launch_server --model-path cognitivecomputations/DeepSeek-V3-AWQ --tp-size 8 --trust-remote --disable-mla

@chenchunhui97

> After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

I succeeded in deploying the model on 8*A800 by building a Docker image from the fix-dpsk-v3-awq branch.

@Xu-Chen
Contributor

Xu-Chen commented Feb 12, 2025

Could you share some benchmarks?

@Zachary-ai-engineer

We tested V3 AWQ on the latest code and found that metrics such as TPOT (time per output token) were relatively poor. How should we solve this problem?
[benchmark screenshot]

@halexan

halexan commented Feb 12, 2025

How about the benchmarks, @chenchunhui97?

Collaborator

@zhyncs zhyncs left a comment


This fix is a bit tricky; I'll merge it first to unblock AWQ usage. Refactoring is on its way.

@zhyncs zhyncs merged commit 8616357 into main Feb 12, 2025
21 checks passed
@zhyncs zhyncs deleted the fix-dpsk-v3-awq branch February 12, 2025 14:09
@luweizheng

My launch script on 8*A800 80G. This model has been successfully deployed with vLLM with a smaller context length, but it seems vLLM does not optimize MLA well yet.

python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-awq/DeepSeek-R1-awq --tp 8 --host 0.0.0.0 --port 11434 --trust-remote-code

Error:

File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 362, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/sglang/srt/models/deepseek_v2.py", line 962, in load_weights
    w = ops.awq_dequantize(
        ^^^^^^^^^^^^^^^^^^^
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/vllm/_custom_ops.py", line 222, in awq_dequantize
    return torch.ops._C.awq_dequantize(qweight, scales, zeros, split_k_iters,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/torch/_ops.py", line 1116, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected scalar type Half but found BFloat16

@chenchunhui97 @zhyncs Any suggestions?
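
One likely cause: the traceback shows vLLM's awq_dequantize op expects float16 ("Half") inputs, while the R1 checkpoint is loaded in bfloat16. Two possible workarounds, assuming nothing downstream requires bfloat16: launch with --dtype float16, or round-trip the tensors through half at the call site (illustrative sketch, untested):

import torch
from vllm import _custom_ops as ops

def awq_dequantize_any_dtype(qweight, scales, qzeros, out_dtype=torch.bfloat16):
    # The custom op only accepts float16 scales, so cast down for the call
    # and cast the dequantized weight back to the model dtype afterwards.
    w = ops.awq_dequantize(qweight, scales.to(torch.float16), qzeros, 0, 0, 0)
    return w.to(out_dtype)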

@zjp-shadow zjp-shadow mentioned this pull request Feb 22, 2025