
[Bug] deepseek v3 deployment on h200 #3049

Open · zhyncs opened this issue Jan 18, 2025 · 10 comments

zhyncs (Collaborator) commented Jan 18, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

pip3 install lmdeploy==0.7.0
lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --tp 8 --backend pytorch
LICENSE-MODEL: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 13.8k/13.8k [00:00<00:00, 48.1MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 4.03MB/s]
README_WEIGHTS.md: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3.65k/3.65k [00:00<00:00, 16.8MB/s]
LICENSE-CODE: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06k/1.06k [00:00<00:00, 4.70MB/s]
inference/configs/config_16B.json: 100%|███████████████████████████████████████████████████████████████████████████████████████| 417/417 [00:00<00:00, 1.70MB/s]
README.md: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 22.6k/22.6k [00:00<00:00, 12.8MB/s]
figures/benchmark.png: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 184k/184k [00:00<00:00, 1.62MB/s]
figures/niah.png: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 108k/108k [00:00<00:00, 1.32MB/s]
inference/configs/config_236B.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 455/455 [00:00<00:00, 2.02MB/s]
inference/configs/config_671B.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 503/503 [00:00<00:00, 2.26MB/s]
inference/convert.py: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 3.25k/3.25k [00:00<00:00, 14.4MB/s]
inference/fp8_cast_bf16.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 3.24k/3.24k [00:00<00:00, 14.3MB/s]
inference/generate.py: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 5.39k/5.39k [00:00<00:00, 21.6MB/s]
inference/kernel.py: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.33k/4.33k [00:00<00:00, 18.2MB/s]
inference/requirements.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 66.0/66.0 [00:00<00:00, 293kB/s]
inference/model.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 17.6k/17.6k [00:00<00:00, 50.4MB/s]
modeling_deepseek.py: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 75.8k/75.8k [00:00<00:00, 684kB/s]
Fetching 185 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 185/185 [00:01<00:00, 184.46it/s]

The server is stuck here after fetching the files.
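
One way to separate download time from actual weight-loading time is to pre-fetch the checkpoint and launch the server from a local path. A minimal sketch, assuming the huggingface_hub CLI is installed; the local directory is illustrative:

pip3 install -U "huggingface_hub[cli]"
huggingface-cli download deepseek-ai/DeepSeek-V3 --local-dir /data/DeepSeek-V3
# Serving from the local copy removes download time from the measurement
lmdeploy serve api_server /data/DeepSeek-V3 --tp 8 --backend pytorch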

Reproduction

As mentioned above: install lmdeploy 0.7.0 and launch the api_server command shown in the bug description.

Environment

sys.platform: linux
Python: 3.10.16 (main, Dec  4 2024, 08:53:37) [GCC 9.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA H200
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.5.1+cu124
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.4
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.20.1+cu124
LMDeploy: 0.7.0+c2f212d
transformers: 4.48.0
gradio: Not Found
fastapi: 0.115.6
pydantic: 2.10.5
triton: 3.1.0
NVIDIA Topology:
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	NIC1	NIC2	NIC3	NIC4	NIC5	NIC6	NIC7	NIC8	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NV18	NODE	NODE	PIX	NODE	SYS	SYS	SYS	SYS	SYS	0-95	0		N/A
GPU1	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NODE	NODE	NODE	PIX	SYS	SYS	SYS	SYS	SYS	0-95	0		N/A
GPU2	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NODE	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	0-95	0		N/A
GPU3	NV18	NV18	NV18	 X 	NV18	NV18	NV18	NV18	PIX	NODE	NODE	NODE	SYS	SYS	SYS	SYS	SYS	0-95	0		N/A
GPU4	NV18	NV18	NV18	NV18	 X 	NV18	NV18	NV18	SYS	SYS	SYS	SYS	NODE	NODE	PIX	NODE	NODE	96-191	1		N/A
GPU5	NV18	NV18	NV18	NV18	NV18	 X 	NV18	NV18	SYS	SYS	SYS	SYS	NODE	NODE	NODE	PIX	NODE	96-191	1		N/A
GPU6	NV18	NV18	NV18	NV18	NV18	NV18	 X 	NV18	SYS	SYS	SYS	SYS	NODE	PIX	NODE	NODE	PHB	96-191	1		N/A
GPU7	NV18	NV18	NV18	NV18	NV18	NV18	NV18	 X 	SYS	SYS	SYS	SYS	PIX	NODE	NODE	NODE	NODE	96-191	1		N/A
NIC0	NODE	NODE	NODE	PIX	SYS	SYS	SYS	SYS	 X 	NODE	NODE	NODE	SYS	SYS	SYS	SYS	SYS
NIC1	NODE	NODE	PIX	NODE	SYS	SYS	SYS	SYS	NODE	 X 	NODE	NODE	SYS	SYS	SYS	SYS	SYS
NIC2	PIX	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	 X 	NODE	SYS	SYS	SYS	SYS	SYS
NIC3	NODE	PIX	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	NODE	 X 	SYS	SYS	SYS	SYS	SYS
NIC4	SYS	SYS	SYS	SYS	NODE	NODE	NODE	PIX	SYS	SYS	SYS	SYS	 X 	NODE	NODE	NODE	NODE
NIC5	SYS	SYS	SYS	SYS	NODE	NODE	PIX	NODE	SYS	SYS	SYS	SYS	NODE	 X 	NODE	NODE	PHB
NIC6	SYS	SYS	SYS	SYS	PIX	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	 X 	NODE	NODE
NIC7	SYS	SYS	SYS	SYS	NODE	PIX	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	NODE	 X 	NODE
NIC8	SYS	SYS	SYS	SYS	NODE	NODE	PHB	NODE	SYS	SYS	SYS	SYS	NODE	PHB	NODE	NODE	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_8
  NIC7: mlx5_9
  NIC8: mlx5_bond_0

Error traceback

(none provided)
zhyncs (Collaborator, Author) commented Jan 18, 2025

lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --tp 8 --backend pytorch --log-level INFO

zhyncs (Collaborator, Author) commented Jan 18, 2025

[screenshot: server console still showing the weight-loading progress]

15 minutes have passed and it still shows loading.

grimoire (Collaborator) commented

Errr, it is just desperately slow. We have not performed much optimization on weight loading yet.
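
To see how much of that time is raw disk I/O rather than lmdeploy itself, one rough check is to time a plain read of the downloaded shards. A sketch, assuming the default Hugging Face cache location:

# DeepSeek-V3 is roughly 700 GB of safetensors, so even at ~1 GB/s
# a cold read of the shards alone takes on the order of 10 minutes.
cd ~/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-V3/snapshots/*/
time cat *.safetensors > /dev/null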

grimoire linked a pull request Jan 20, 2025 that will close this issue
RunningLeon (Collaborator) commented

Hi @zhyncs, can you try this PR: #2886?

zhyncs (Collaborator, Author) commented Jan 20, 2025

Hi @zhyncs, can you try this PR: #2886?

I'll try it today. Thanks!

zhyncs (Collaborator, Author) commented Jan 21, 2025

I did a test, and it still takes over 20 minutes to finish loading. Is this within expectations?

RunningLeon (Collaborator) commented

I did a test, and it still takes over 20 minutes to finish loading. Is this within expectations?

Hi @zhyncs, we don't have DeepSeek-V3, so we tested on DeepSeek-V2-Chat with tp=8; loading time dropped from 15 min to 8 min.

zhyncs (Collaborator, Author) commented Jan 22, 2025

Hi @RunningLeon, I've granted @grimoire access to the H200. Could you @grimoire please help verify? Thanks!

grimoire (Collaborator) commented

Is the model placed on NFS? The uncached first-time load could be slow.
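
A rough way to check both points: confirm which filesystem the weights live on, then compare a cold read against a warm one served from the page cache. Paths are illustrative:

df -hT ~/.cache/huggingface                           # filesystem type: nfs vs. local ext4/xfs
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop the page cache (needs root)
time cat model-00001-of-*.safetensors > /dev/null     # cold read, straight from disk
time cat model-00001-of-*.safetensors > /dev/null     # warm read, from page cache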

zhyncs (Collaborator, Author) commented Jan 27, 2025

Is the model placed on NFS? The uncached first-time load could be slow.

@grimoire I think it's placed on SSD.
