[Bug] Argument-count error when loading a Qwen AWQ model on Ascend 310P #3124

cccccya · 2025-02-09T12:18:02Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

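dlinfer currently supports loading w4a16 (eager) quantized Qwen2(.5)-7B models on the Huawei Ascend platform.
However, loading an AWQ model on an Ascend 310P fails with an unexpected argument-count error:
TypeError: QKVAwqLinear._update_all_out_features() takes 4 positional arguments but 5 were given
The error occurs with the support_310P branches of dlinfer and lmdeploy.
Reading the source shows that LMDeploy's QKVAwqLinear._update_all_out_features accepts only three parameters: all_out_features, w_bit, and group_size.

dlinfer, however, calls it with four: all_out_features, w_bit, group_size, and replicate.

The same problem is still present in the latest main branch.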
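
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the actual LMDeploy or dlinfer code: the class name QKVAwqLinearSketch, the helper call_like_dlinfer, and the argument values are hypothetical stand-ins. It only assumes what the report states, namely that the method accepts three parameters besides self while the caller passes four.

```python
class QKVAwqLinearSketch:
    """Hypothetical stand-in for the LMDeploy-side class."""

    # Three parameters besides self, as described in the report.
    def _update_all_out_features(self, all_out_features, w_bit, group_size):
        return all_out_features


def call_like_dlinfer(layer):
    """Hypothetical stand-in for the dlinfer-side call site."""
    # A fourth argument (replicate) is passed, which the method does not accept.
    # The concrete values here are illustrative only.
    return layer._update_all_out_features([512, 64, 64], 4, 128, None)


try:
    call_like_dlinfer(QKVAwqLinearSketch())
    message = ""
except TypeError as exc:
    # self counts as one positional argument, hence "4 ... but 5 were given".
    message = str(exc)

print(message)
```

Because self is counted, three declared parameters appear as "takes 4 positional arguments" and four passed arguments appear as "5 were given", which matches the numbers in the reported TypeError.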

Reproduction

lmdeploy chat /mnt/data/llm/Qwen2.5-7B-Instruct-AWQ --device ascend --dtype float16 --eager-mode --model-format awq

Environment

sys.platform: linux
Python: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (GCC) 10.3.1
PyTorch: 2.3.1+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.18.1+cpu
LMDeploy: 0.7.0+3295475
transformers: 4.48.0
gradio: Not Found
fastapi: 0.115.6
pydantic: 2.10.5
triton: Not Found

Error traceback

Traceback (most recent call last):
  File "/home/cya/.conda/envs/lmdeploy/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/cli/entrypoint.py", line 39, in run
    args.run(args)
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/cli/cli.py", line 243, in chat
    run_chat(args.model_path, engine_config, chat_template_config=chat_template_config)
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/chat.py", line 67, in run_chat
    tm_model = Engine.from_pretrained(model_path, engine_config=engine_config, trust_remote_code=trust_remote_code)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 196, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 145, in __init__
    self.model_agent = build_model_agent(model_path,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 703, in build_model_agent
    model_agent = BaseModelAgent(model_path,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 208, in __init__
    self.patched_model = self._build_model(model_path, adapters, device=device)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 229, in _build_model
    patched_model = build_patched_model(self.model_config, device=device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/.conda/envs/lmdeploy/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/patch.py", line 195, in build_patched_model
    return build_model_from_hf_config(model_config, dtype=dtype, device=device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/patch.py", line 186, in build_model_from_hf_config
    model = model_cls(model_config, ctx_mgr, dtype=dtype, device=device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 306, in __init__
    self.model = Qwen2Model(config, dtype=dtype, device=device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 218, in __init__
    self.layers = nn.ModuleList([
                                ^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 219, in <listcomp>
    Qwen2DecoderLayer(config, layer_idx, dtype=dtype, device=device)
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 153, in __init__
    self.self_attn = Qwen2Attention(config, dtype=dtype, device=device)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 30, in __init__
    self.qkv_proj = build_qkv_proj(hidden_size,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 1471, in build_qkv_proj
    return QKVAwqLinear(in_features=in_features,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 637, in __init__
    super().__init__(in_features,
  File "/home/cya/lmdeploy/dlinfer/dlinfer/framework/lmdeploy_ext/quants/ascend_awq.py", line 138, in AscendMergedAwqLinear__init__
    all_out_features = self._update_all_out_features(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: QKVAwqLinear._update_all_out_features() takes 4 positional arguments but 5 were given
[ERROR] 2025-02-09-12:30:18 (PID:64253, Device:0, RankID:-1) ERR99999 UNKNOWN application exception


jinminxi104 · 2025-02-10T06:18:08Z

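We will fix this issue.
However, eager mode on the 310P cannot run GQA models, so you will need to wait for graph mode on the 310P.
Also, the support list in dlinfer's table applies to the 910 series.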
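
For readers unsure whether a given model falls under the GQA limitation mentioned above, here is a minimal sketch of a check based on head counts. The function name uses_gqa is hypothetical, and the head counts are assumed from the published Qwen2.5-7B-Instruct config.json; verify them against the local model directory.

```python
def uses_gqa(num_attention_heads: int, num_key_value_heads: int) -> bool:
    """Return True when query heads share key/value heads.

    Fewer key/value heads than attention heads means grouped-query
    attention; a single key/value head (MQA) is the extreme case.
    """
    return num_key_value_heads < num_attention_heads


# Assumed values; read them from the model's config.json in practice.
qwen_like_config = {"num_attention_heads": 28, "num_key_value_heads": 4}
is_gqa = uses_gqa(**qwen_like_config)
print(is_gqa)
```

If the check returns True, the model would hit the eager-mode limitation on the 310P described in the comment, independent of the argument-count bug.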

jinminxi104 self-assigned this Feb 10, 2025
