
[Bug] Argument-passing error when loading a Qwen AWQ model on Ascend 310P #3124

Open

cccccya opened this issue Feb 9, 2025 · 1 comment
cccccya commented Feb 9, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

dlinfer currently supports loading Qwen2(.5)-7B w4a16 (eager mode) quantized models on the Huawei Ascend platform.
However, when I load an AWQ model on Ascend 310P, I hit a strange argument-count error:
TypeError: QKVAwqLinear._update_all_out_features() takes 4 positional arguments but 5 were given
This happens when using the support_310P branches of both dlinfer and lmdeploy.
After reading the source code, I found that LMDeploy's QKVAwqLinear._update_all_out_features only accepts the parameters all_out_features, w_bit, and group_size.

[screenshot of the LMDeploy source]

In dlinfer, however, the method is expected to take four parameters: all_out_features, w_bit, group_size, and replicate.

[screenshot of the dlinfer source]

The problem is also present in the latest main branch.
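
To illustrate the mismatch, here is a minimal, self-contained sketch. The class and method names come from the traceback above, but the simplified signatures and bodies are assumptions for demonstration, not the real implementations:

```python
# Minimal reproduction of the signature mismatch (hypothetical, simplified
# stand-ins for lmdeploy/pytorch/nn/linear.py and
# dlinfer/framework/lmdeploy_ext/quants/ascend_awq.py).


class MergedAwqLinear:
    def __init__(self, all_out_features, w_bit, group_size, replicate=None):
        # dlinfer's patched __init__ forwards an extra `replicate` argument
        # positionally to the subclass hook:
        self.all_out_features = self._update_all_out_features(
            all_out_features, w_bit, group_size, replicate)

    def _update_all_out_features(self, all_out_features, w_bit, group_size,
                                 replicate):
        return all_out_features


class QKVAwqLinear(MergedAwqLinear):
    # LMDeploy's override keeps the original three-parameter signature, so
    # the patched base __init__ passes one positional argument too many.
    def _update_all_out_features(self, all_out_features, w_bit, group_size):
        return all_out_features


QKVAwqLinear([896, 128, 128], w_bit=4, group_size=128)
# TypeError: QKVAwqLinear._update_all_out_features() takes 4 positional
# arguments but 5 were given
```

A plausible fix (an assumption on my part, not a confirmed patch) would be to add a defaulted replicate parameter to the QKVAwqLinear override, or to have dlinfer pass replicate as a keyword argument that overrides without it can safely ignore.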

Reproduction

lmdeploy chat /mnt/data/llm/Qwen2.5-7B-Instruct-AWQ --device ascend --dtype float16 --eager-mode --model-format awq

Environment

sys.platform: linux
Python: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (GCC) 10.3.1
PyTorch: 2.3.1+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.18.1+cpu
LMDeploy: 0.7.0+3295475
transformers: 4.48.0
gradio: Not Found
fastapi: 0.115.6
pydantic: 2.10.5
triton: Not Found

Error traceback

Traceback (most recent call last):
  File "/home/cya/.conda/envs/lmdeploy/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/cli/entrypoint.py", line 39, in run
    args.run(args)
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/cli/cli.py", line 243, in chat
    run_chat(args.model_path, engine_config, chat_template_config=chat_template_config)
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/chat.py", line 67, in run_chat
    tm_model = Engine.from_pretrained(model_path, engine_config=engine_config, trust_remote_code=trust_remote_code)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 196, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 145, in __init__
    self.model_agent = build_model_agent(model_path,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 703, in build_model_agent
    model_agent = BaseModelAgent(model_path,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 208, in __init__
    self.patched_model = self._build_model(model_path, adapters, device=device)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 229, in _build_model
    patched_model = build_patched_model(self.model_config, device=device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/.conda/envs/lmdeploy/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/patch.py", line 195, in build_patched_model
    return build_model_from_hf_config(model_config, dtype=dtype, device=device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/patch.py", line 186, in build_model_from_hf_config
    model = model_cls(model_config, ctx_mgr, dtype=dtype, device=device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 306, in __init__
    self.model = Qwen2Model(config, dtype=dtype, device=device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 218, in __init__
    self.layers = nn.ModuleList([
                                ^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 219, in <listcomp>
    Qwen2DecoderLayer(config, layer_idx, dtype=dtype, device=device)
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 153, in __init__
    self.self_attn = Qwen2Attention(config, dtype=dtype, device=device)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 30, in __init__
    self.qkv_proj = build_qkv_proj(hidden_size,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 1471, in build_qkv_proj
    return QKVAwqLinear(in_features=in_features,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cya/lmdeploy/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 637, in __init__
    super().__init__(in_features,
  File "/home/cya/lmdeploy/dlinfer/dlinfer/framework/lmdeploy_ext/quants/ascend_awq.py", line 138, in AscendMergedAwqLinear__init__
    all_out_features = self._update_all_out_features(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: QKVAwqLinear._update_all_out_features() takes 4 positional arguments but 5 were given
[ERROR] 2025-02-09-12:30:18 (PID:64253, Device:0, RankID:-1) ERR99999 UNKNOWN application exception
jinminxi104 self-assigned this Feb 10, 2025
jinminxi104 (Collaborator) commented Feb 10, 2025

We will fix this problem.
However, eager mode on the 310P cannot run GQA models, so you will need to wait a bit longer for graph mode support on the 310P.
Also, the support matrix in dlinfer's table applies to the 910 series.
