
[Model] DeepSeek-V3 Enhancements #11539

Open
10 tasks
simon-mo opened this issue Dec 27, 2024 · 11 comments
Labels: new model (Requests to new models), performance (Performance-related issues)

Comments

@simon-mo
Collaborator

simon-mo commented Dec 27, 2024

This issue tracks follow-up enhancements after initial support for the DeepSeek-V3 model. Please feel free to chime in and contribute!

simon-mo added the misc, performance (Performance-related issues), and new model (Requests to new models) labels and removed the misc label on Dec 27, 2024
simon-mo changed the title from [Model] Deepseek V3 Enhancements to [Model] DeepSeek-V3 Enhancements on Dec 27, 2024
@july8023

If I want to deploy the DeepSeek 600B model with vLLM on RTX 4090s, are there any restrictions? How many RTX 4090s would I need at minimum?

@fsaudm

fsaudm commented Dec 31, 2024

Is inference with A100s supported? How about quantization??

@mphilippnv

DeepSeek-V3 doesn't appear to support pipeline parallelism. I get this error when attempting to deploy to two 8x H100 nodes:

NotImplementedError: Pipeline parallelism is only supported for the following  architectures: ['AquilaForCausalLM', 'AquilaModel', 'DeepseekV2ForCausalLM', 'GPT2LMHeadModel', 'InternLM2ForCausalLM', 'InternLMForCausalLM', 'InternVLChatModel', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'NemotronForCausalLM', 'Phi3ForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'QWenLMHeadModel', 'Qwen2VLForConditionalGeneration'].

I'm using --tensor-parallel-size 8 --pipeline-parallel-size 2
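
For reference, a rough sketch of the launch command those flags imply; the vllm serve entry point and the model name are assumptions here, and pipeline parallelism across two nodes also requires a Ray cluster spanning both machines:

# Sketch of the two-node launch; model name is assumed
vllm serve deepseek-ai/DeepSeek-V3 \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --trust-remote-code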

@simon-mo
Collaborator Author

@july8023 It should work on 4090s. The model takes about 600 GB of memory, and you'll want roughly 100-300 GB more for the KV cache, so feel free to plan around that.
@fsaudm A100s are not supported because this model requires FP8 tensor cores.
@mphilippnv Which version of vLLM are you using? You might need to update to v0.6.6 or higher.
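
A rough back-of-envelope sketch (assuming ~24 GB per RTX 4090 and the 600 GB of weights plus 100-300 GB of KV cache above; it ignores per-GPU overhead and parallelism constraints):

# Ceiling division: GPUs needed = ceil(total GB / 24 GB per 4090)
echo $(( (600 + 100 + 23) / 24 ))   # 30 GPUs at the low KV-cache estimate
echo $(( (600 + 300 + 23) / 24 ))   # 38 GPUs at the high KV-cache estimate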

@fsaudm

fsaudm commented Dec 31, 2024

@simon-mo Right, A100s don't support fp8. Would the arg --dtype bfloat16 suffice? If not, I found a bf16 version on Hugging Face; any insights on whether that would work?

@simon-mo
Collaborator Author

The model currently does not support --dtype bfloat16 because it is natively trained in fp8. Can you point me to the bf16 version?

@fsaudm

fsaudm commented Dec 31, 2024

@simon-mo On HF: https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16/tree/main

The official repo provides a script to cast the fp8 weights to bf16, but of course you can't run it on A100s... my guess is that a good soul did the conversion and uploaded it to HF. See section 6 in the repo:

https://github.com/deepseek-ai/DeepSeek-V3
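
If memory serves, the conversion command from that README is roughly the following; the script name and flags are from memory, so treat them as assumptions and double-check against the repo:

# Run from the repo's inference/ directory on a machine with enough host memory
cd inference
python fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-V3 \
    --output-bf16-hf-path /path/to/DeepSeek-V3-bf16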

@simon-mo
Collaborator Author

vLLM does support this bf16 model on A100. It looks like the config.json properly removed quantization_config, so it should already work.
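
A minimal sketch of serving that bf16 checkpoint on A100s; the parallel sizes here are assumptions (the bf16 weights alone are roughly 1.3 TB, so something on the order of four 8x A100-80GB nodes is needed):

# Hypothetical layout: 4 nodes x 8 A100-80GB (32 GPUs, ~2.5 TB of HBM total)
vllm serve opensourcerelease/DeepSeek-V3-bf16 \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 4 \
    --dtype bfloat16 \
    --trust-remote-code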

@mphilippnv

mphilippnv commented Dec 31, 2024

> @july8023 It should work on 4090s. The model takes about 600 GB of memory, and you'll want roughly 100-300 GB more for the KV cache, so feel free to plan around that. @fsaudm A100s are not supported because this model requires FP8 tensor cores. @mphilippnv Which version of vLLM are you using? You might need to update to v0.6.6 or higher.

Using v0.6.6

EDIT: Apologies, I was using 0.6.2. Redeploying helm chart with 0.6.6.post1. Will see how it goes.

@fsaudm

fsaudm commented Dec 31, 2024

Does anyone know of a working example of serving DeepSeek-V3 on A100s with vLLM? I'll try it later, but any hints or help would be very much appreciated.

@JamesBVMNetwork

Hi everyone,
I’m encountering the following error when trying to run the image vllm/vllm-openai:v0.6.6.post1 on a node equipped with 8x H100 SXM GPUs:

ValueError: Error in model execution (input dumped to /tmp/err_execute_model_input_20250102-072212.pkl): functional_call got multiple values for keys ['mlp.experts.e_score_correction_bias', 'mlp.gate.e_score_correction_bias'], which are tied. Consider using tie_weights=False

Here’s the command I used:

--model deepseek-ai/DeepSeek-V3-Base \
--tensor-parallel-size 8 \
--disable_log_requests \
--uvicorn_log_level error \
--max-model-len 16384 \
--cpu-offload-gb 400 \
--max_num_seqs 1 \
--trust-remote-code \
--gpu-memory-utilization 0.95 \
--enforce-eager
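
For reproduction, a sketch of how those arguments attach to the container; the docker-level flags (GPU runtime, port mapping, cache mount, IPC setting) are assumptions taken from the standard vLLM container example, not from the report:

# Docker-level flags are assumed; the model arguments are as listed above
docker run --runtime nvidia --gpus all \
    -p 8000:8000 --ipc=host \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:v0.6.6.post1 \
    --model deepseek-ai/DeepSeek-V3-Base \
    --tensor-parallel-size 8 \
    --disable_log_requests \
    --uvicorn_log_level error \
    --max-model-len 16384 \
    --cpu-offload-gb 400 \
    --max_num_seqs 1 \
    --trust-remote-code \
    --gpu-memory-utilization 0.95 \
    --enforce-eager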

Does anyone have suggestions or solutions for resolving this issue?

Thanks in advance!
