[Bug] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib when running Qwen-1_8B-Chat-q4f16_1-MLC #1728

@tlopex

🐛 Bug

When I try to load the Qwen model after compiling it, I hit this error: FAILED to read metadata section in legacy model lib

To Reproduce

Steps to reproduce the behavior:

1. python -m mlc_chat convert_weight ./dist/models/Qwen-1_8B-Chat/ --quantization q4f16_1 -o dist/Qwen-1_8B-Chat-q4f16_1-MLC
2. python -m mlc_chat gen_config ./dist/models/Qwen-1_8B-Chat/ --quantization q4f16_1 --conv-template qwen -o dist/Qwen-1_8B-Chat-q4f16_1-MLC/
3. python -m mlc_chat compile ./dist/Qwen-1_8B-Chat-q4f16_1-MLC/mlc-chat-config.json --device cuda -o dist/libs/Qwen-1_8B-Chat-q4f16_1-cuda.so
4. Start python and create a ChatModule:

>>> from mlc_chat import ChatModule
>>> cm = ChatModule(model="./dist/Qwen-1_8B-Chat-q4f16_1-MLC",model_lib_path="./dist/libs/Qwen-1_8B-Chat-q4f16_1-cuda.so")
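For anyone trying to reproduce this, the three command-line steps above can be collected into one script. This is only a sketch: the helper names `build_commands` and `run_all` are made up for illustration, while the subcommands, flags, and paths are copied from the steps above. It uses `sys.executable` in place of `python`, so the same interpreter runs every step.

```python
import subprocess
import sys

MODEL_SRC = "./dist/models/Qwen-1_8B-Chat/"
MODEL_OUT = "dist/Qwen-1_8B-Chat-q4f16_1-MLC"
MODEL_LIB = "dist/libs/Qwen-1_8B-Chat-q4f16_1-cuda.so"
QUANT = "q4f16_1"


def build_commands(python=sys.executable):
    """Return the argv lists for steps 1-3, in order (hypothetical helper)."""
    base = [python, "-m", "mlc_chat"]
    return [
        # Step 1: convert the weights.
        base + ["convert_weight", MODEL_SRC, "--quantization", QUANT, "-o", MODEL_OUT],
        # Step 2: generate mlc-chat-config.json.
        base + ["gen_config", MODEL_SRC, "--quantization", QUANT,
                "--conv-template", "qwen", "-o", MODEL_OUT + "/"],
        # Step 3: compile the model library for CUDA.
        base + ["compile", "./" + MODEL_OUT + "/mlc-chat-config.json",
                "--device", "cuda", "-o", MODEL_LIB],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for argv in commands:
        print("+", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_all(build_commands())` from the mlc-llm checkout should run the same three steps. Step 4 (constructing `ChatModule`) is left out here because that is the call that fails.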

Traceback

>>> cm = ChatModule(model="./dist/Qwen-1_8B-Chat-q4f16_1-MLC",model_lib_path="./dist/libs/Qwen-1_8B-Chat-q4f16_1-cuda.so")
[2024-02-08 14:40:19] INFO auto_device.py:76: Found device: cuda:0
[2024-02-08 14:40:19] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-08 14:40:19] INFO auto_device.py:85: Not found device: metal:0
[2024-02-08 14:40:20] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-08 14:40:20] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-08 14:40:20] INFO auto_device.py:33: Using device: cuda:0
[2024-02-08 14:40:20] INFO chat_module.py:370: Using model folder: /home/tlopex/mlc-llm/dist/Qwen-1_8B-Chat-q4f16_1-MLC
[2024-02-08 14:40:20] INFO chat_module.py:371: Using mlc chat config: /home/tlopex/mlc-llm/dist/Qwen-1_8B-Chat-q4f16_1-MLC/mlc-chat-config.json
[2024-02-08 14:40:20] INFO chat_module.py:513: Using library model: ./dist/libs/Qwen-1_8B-Chat-q4f16_1-cuda.so
[2024-02-08 14:40:21] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
  File "/home/tlopex/mlc-llm/python/mlc_chat/cli/model_metadata.py", line 160, in main
    metadata = _extract_metadata(parsed.model_lib)
  File "/home/tlopex/mlc-llm/python/mlc_chat/cli/model_metadata.py", line 26, in _extract_metadata
    return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
  File "/home/tlopex/relax/python/tvm/runtime/relax_vm.py", line 97, in __init__
    self._setup_device(device, memory_cfg)
  File "/home/tlopex/relax/python/tvm/runtime/relax_vm.py", line 133, in _setup_device
    self.module["vm_initialization"](*init_args)
  File "/home/tlopex/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/home/tlopex/relax/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  2: tvm::runtime::relax_vm::VirtualMachineImpl::_Init(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  1: tvm::runtime::relax_vm::VirtualMachineImpl::Init(std::vector<DLDevice, std::allocator<DLDevice> > const&, std::vector<tvm::runtime::memory::AllocatorType, std::allocator<tvm::runtime::memory::AllocatorType> > const&)
  0: tvm::runtime::relax_vm::VirtualMachineImpl::InitFuncPool()
  File "/home/tlopex/relax/src/runtime/relax_vm/vm.cc", line 676
InternalError: Check failed: (func.defined()) is false: Error: Cannot find PackedFunc flashinfer.single_prefill in either Relax VM kernel library, or in TVM runtime PackedFunc registry, or in global Relax functions of the VM executable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tlopex/mlc-llm/python/mlc_chat/chat_module.py", line 780, in __init__
    self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
  File "/home/tlopex/mlc-llm/python/mlc_chat/chat_module.py", line 1008, in _reload
    self._reload_func(lib, model_path, app_config_json, kv_cache_config.asjson())
  File "/home/tlopex/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/home/tlopex/relax/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/home/tlopex/mlc-llm/cpp/llm_chat.cc", line 1613, in mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
    chat_->Reload(args[0], args[1], args[2], args[3]);
  File "/home/tlopex/mlc-llm/cpp/llm_chat.cc", line 594, in mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String, tvm::runtime::String)
    this->ft_.Init(reload_lib, device_, model_config);
  File "/home/tlopex/mlc-llm/cpp/llm_chat.cc", line 171, in Init
    this->local_vm->GetFunction("vm_initialization")(
tvm.error.InternalError: Traceback (most recent call last):
  5: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/tlopex/mlc-llm/cpp/llm_chat.cc:1613
  4: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String, tvm::runtime::String)
        at /home/tlopex/mlc-llm/cpp/llm_chat.cc:594
  3: Init
        at /home/tlopex/mlc-llm/cpp/llm_chat.cc:171
  2: tvm::runtime::relax_vm::VirtualMachineImpl::_Init(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  1: tvm::runtime::relax_vm::VirtualMachineImpl::Init(std::vector<DLDevice, std::allocator<DLDevice> > const&, std::vector<tvm::runtime::memory::AllocatorType, std::allocator<tvm::runtime::memory::AllocatorType> > const&)
  0: tvm::runtime::relax_vm::VirtualMachineImpl::InitFuncPool()
  File "/home/tlopex/relax/src/runtime/relax_vm/vm.cc", line 676
InternalError: Check failed: (func.defined()) is false: Error: Cannot find PackedFunc flashinfer.single_prefill in either Relax VM kernel library, or in TVM runtime PackedFunc registry, or in global Relax functions of the VM executable
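
Both tracebacks end with the same message: the Relax VM cannot find the PackedFunc `flashinfer.single_prefill`. Below is a small diagnostic sketch that extracts the missing function name from a log like the one above and checks whether it is registered in the current process. The helper names (`missing_packed_funcs`, `unregistered`, `tvm_lookup`) are hypothetical. The only TVM call used is `tvm.get_global_func(name, allow_missing=True)`, which returns `None` when the name is not registered. Note that this checks only the global PackedFunc registry, which is one of the three places listed in the error message, so a `None` result is a hint and not a full diagnosis.

```python
import re

# Matches the wording of the error shown in the traceback above.
_MISSING = re.compile(r"Cannot find PackedFunc (\S+) in")


def missing_packed_funcs(log_text):
    """Return distinct PackedFunc names reported missing, in order of first appearance."""
    seen = []
    for name in _MISSING.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def unregistered(names, lookup):
    """Return the names for which lookup(name) gives None."""
    return [name for name in names if lookup(name) is None]


def tvm_lookup(name):
    """Look a name up in TVM's global registry; None if it is absent."""
    import tvm  # imported lazily so the helpers above work without TVM installed
    return tvm.get_global_func(name, allow_missing=True)
```

With TVM importable, `unregistered(missing_packed_funcs(log_text), tvm_lookup)` lists the names from the log that this TVM build does not register. If `flashinfer.single_prefill` shows up there, one possible reading is that the model library was compiled expecting FlashInfer kernels that the installed TVM runtime does not provide. That is an assumption to verify against the build configuration, not something the log confirms on its own.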

Environment

  • Platform: CUDA
  • Operating system: Ubuntu 20.04
  • Device: RTX 3080 Ti
  • How you installed MLC-LLM: source
  • How you installed TVM-Unity: source
  • Python version: 3.8.10
  • GPU driver version (if applicable): 535.154.05
  • CUDA/cuDNN version (if applicable): 12.2


Labels: bug (confirmed bugs)
