Service fails to start on 910B #2945

Open
3 tasks done
SefaZeng opened this issue Dec 24, 2024 · 6 comments
SefaZeng commented Dec 24, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Starting the LLM service on the 910B fails.

[W NPUCachingAllocator.cpp:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
/usr/local/python3.10.14/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:299: ImportWarning: 
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************
    
  warnings.warn(msg, ImportWarning)
/usr/local/python3.10.14/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
  warnings.warn(msg, RuntimeWarning)
2024-12-24 06:54:05,028 - lmdeploy - ERROR - __init__.py:17 - OSError: /usr/local/dlinfer/dlinfer/vendor/ascend/ascend_extension.so: undefined symbol: _ZN3c1010TensorImpl29compute_channels_last_2d_dim5ENS0_8identityINS_7SymBoolEEE
2024-12-24 06:54:05,028 - lmdeploy - ERROR - __init__.py:18 - <PyTorch> test failed!
Please ensure it has been installed correctly.

Reproduction

lmdeploy serve api_server --backend pytorch --device ascend /models/Qwen2.5-7B-Instruct --server-port 8000

Environment

[W NPUCachingAllocator.cpp:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
sys.platform: linux
Python: 3.10.14 (main, Aug  8 2024, 03:54:57) [GCC 11.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1+cpu
LMDeploy: 0.6.4+
transformers: 4.46.2
gradio: 5.9.1
fastapi: 0.115.6
pydantic: 2.8.2
triton: 3.0.0

Error traceback

No response

@lvhan028 (Collaborator) commented:

We highly recommend that users build a Docker image for a streamlined environment setup. You may refer to https://lmdeploy.readthedocs.io/en/latest/get_started/ascend/get_started.html
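
As a rough sketch of that workflow (the Dockerfile path and tag here are illustrative; check the linked guide for the exact file shipped with your LMDeploy version):

git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy
docker build -t lmdeploy-ascend:latest -f docker/Dockerfile_aarch64_ascend .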

@SefaZeng (Author) commented Dec 24, 2024:

I am using CANN RC3.0, provided by the Huawei team. I saw that the version in the linked guide is RC2.0; does it also support RC3.0? My understanding is that the version constraints on Ascend are very strict.

@lvhan028 (Collaborator) commented:

Hi @jinminxi104, could you take a look at this?

@jinminxi104 (Collaborator) commented:

RC3 works in our environment. Can you provide more details about yours? Did the Huawei team provide you a Docker image? Is the machine an 800I A2 or an 800T A2? What is the driver version, and is the CPU Intel or Kunpeng? Did you `pip install dlinfer-ascend`, or did you compile dlinfer yourself?
Also, you can try eager mode, but you will need to downgrade torch-npu from 2.3.1.post2 to 2.3.1.

@SefaZeng (Author) commented:

Yes, the Docker image was provided by the Huawei team. I'm not quite sure what "800I A2" means; I'm currently using a 910B2C. The driver version is 8.0_RC3.0, and the CPU is from Intel. I installed dlinfer from source.

I tried the script below, with eager mode I think, and it raises the same error:

from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig
if __name__ == "__main__":
    pipe = pipeline("internlm/internlm2_5-7b-chat",
                    backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)

@jinminxi104 (Collaborator) commented:

Please try downgrading torch-npu to 2.3.1.
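
The undefined symbol in ascend_extension.so is consistent with an ABI mismatch between the installed torch-npu and the build dlinfer was compiled against, which is why pinning the version can help. A minimal sketch of the downgrade, assuming pip manages torch-npu inside the image:

pip install torch-npu==2.3.1

If dlinfer was compiled from source, reinstalling or rebuilding it against the pinned torch-npu may also be necessary.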
