Service fails to start on 910B #2945

Open
3 tasks done
SefaZeng opened this issue Dec 24, 2024 · 6 comments
SefaZeng commented Dec 24, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Starting the LLM service on the 910B fails.

[W NPUCachingAllocator.cpp:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
/usr/local/python3.10.14/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:299: ImportWarning: 
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************
    
  warnings.warn(msg, ImportWarning)
/usr/local/python3.10.14/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
  warnings.warn(msg, RuntimeWarning)
2024-12-24 06:54:05,028 - lmdeploy - ERROR - __init__.py:17 - OSError: /usr/local/dlinfer/dlinfer/vendor/ascend/ascend_extension.so: undefined symbol: _ZN3c1010TensorImpl29compute_channels_last_2d_dim5ENS0_8identityINS_7SymBoolEEE
2024-12-24 06:54:05,028 - lmdeploy - ERROR - __init__.py:18 - <PyTorch> test failed!
Please ensure it has been installed correctly.

Reproduction

lmdeploy serve api_server --backend pytorch --device ascend /models/Qwen2.5-7B-Instruct --server-port 8000

Environment

[W NPUCachingAllocator.cpp:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
sys.platform: linux
Python: 3.10.14 (main, Aug  8 2024, 03:54:57) [GCC 11.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1+cpu
LMDeploy: 0.6.4+
transformers: 4.46.2
gradio: 5.9.1
fastapi: 0.115.6
pydantic: 2.8.2
triton: 3.0.0

Error traceback

No response

@lvhan028 (Collaborator) commented:

We highly recommend that users build a Docker image for a streamlined environment setup. You may refer to https://lmdeploy.readthedocs.io/en/latest/get_started/ascend/get_started.html
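
As a rough sketch of that workflow (the Dockerfile path and tag here are illustrative; check the linked guide for the exact file shipped with your LMDeploy version):

git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy
docker build -t lmdeploy-ascend:latest -f docker/Dockerfile_aarch64_ascend .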

@SefaZeng (Author) commented Dec 24, 2024:

I am using CANN RC3.0, provided by the Huawei team. I saw that the version in the linked guide is RC2.0; does it also support RC3.0? My understanding is that the version constraints on Ascend are very strict.

@lvhan028 (Collaborator) commented:

Hi @jinminxi104, could you take a look at this?

@jinminxi104 (Collaborator) commented:

RC3 works in our environment. Can you provide more details about yours? Did the Huawei team provide you a Docker image? Is the machine an 800I A2 or an 800T A2? What is the driver version, and is the CPU Intel or Kunpeng? Did you `pip install dlinfer-ascend`, or did you compile dlinfer yourself?
Also, you can try eager mode, but you will need to downgrade torch-npu from 2.3.1.post2 to 2.3.1.

@SefaZeng (Author) commented:

Yes, the Docker image was provided by the Huawei team. I'm not quite sure what "800I A2" means; I'm currently using a 910B2C. The driver version is 8.0_RC3.0, and the CPU is from Intel. I installed dlinfer from source.

I tried the script below, with eager mode I think, and it raises the same error:

from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig
if __name__ == "__main__":
    pipe = pipeline("internlm/internlm2_5-7b-chat",
                    backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)

@jinminxi104 (Collaborator) commented:

Please try downgrading torch-npu to 2.3.1.
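
The undefined symbol in ascend_extension.so is consistent with an ABI mismatch between the installed torch-npu and the build dlinfer was compiled against, which is why pinning the version can help. A minimal sketch of the downgrade, assuming pip manages torch-npu inside the image:

pip install torch-npu==2.3.1

If dlinfer was compiled from source, reinstalling or rebuilding it against the pinned torch-npu may also be necessary.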
