
[Bug] AttributeError: Module has no function 'vm_load_executable' encountered in Step 4 of the "Bring Your Own Model Library" tutorial docs/deploy/ios.html#bring-your-own-model-library  #2212

@nobuhiroYamakado

Description


🐛 Bug

I encountered an error while following the step-by-step tutorial at https://llm.mlc.ai/docs/deploy/ios.html#bring-your-own-model-library. The issue occurred during Step 4 of the tutorial.

To Reproduce

Steps to reproduce the behavior:

(Follow https://llm.mlc.ai/docs/deploy/ios.html#bring-your-own-model-library)

  1. (Step 0) Install dependencies and build mlc_llm from source
# update conda
conda update --yes -n base -c defaults conda
# install `conda-libmamba-solver`
conda install --yes -n base conda-libmamba-solver
# set it as the default solver
conda config --set solver libmamba
conda env remove -n mlc-chat-venv
sudo chmod 775 ~/.conda/environments.txt 
conda create -n mlc-chat-venv -c conda-forge \
    "cmake>=3.24" \
    rust \
    git \
    python=3.11
conda activate mlc-chat-venv
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
python -c "import tvm; print(tvm.__file__)"
# /Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/__init__.py
python -c "import tvm; print(tvm._ffi.base._LIB)"
# <CDLL '/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/libtvm.dylib', handle 9dad0790 at 0x105ff40d0>
python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
# GIT_COMMIT_HASH: d694451c580a931116a2c93571f21f7d791c7fa0
# HIDE_PRIVATE_SYMBOLS: ON
# USE_LLVM: llvm-config --link-static
# LLVM_VERSION: 15.0.7
# USE_VULKAN: OFF
# USE_CUDA: OFF
# CUDA_VERSION: NOT-FOUND
# USE_OPENCL: OFF
# USE_METAL: ON
# USE_ROCM: OFF
# ...
python -c "import tvm; print(tvm.metal().exist)"
# True
python -c "import tvm; print(tvm.cuda().exist)"
# False
python -c "import tvm; print(tvm.vulkan().exist)"
# False
# clone from GitHub
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm/
# create build directory
mkdir -p build && cd build
# generate build configuration
python3 ../cmake/gen_cmake_config.py
##Enter TVM_HOME in absolute path. If not specified, 3rdparty/tvm will be used by default: 
## ->/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm
cmake .. && cmake --build . --parallel $(nproc) && cd ..
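
As an extra sanity check after the build (not part of the tutorial), the mlc_llm Python package can be located the same way as tvm above; the path is assumed to point into the repo checkout or the installed package:
# optional sanity check: confirm the mlc_llm Python package resolves
python -c "import mlc_llm; print(mlc_llm.__file__)"
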
  2. (Step 1) Clone from HF and convert_weight
mkdir -p dist/models && cd dist/models
# Clone HF weights
git lfs install
git clone https://huggingface.co/microsoft/phi-2
cd ../..
# Convert weight
mlc_llm convert_weight ./dist/models/phi-2/ \
    --quantization q4f16_1 \
    -o dist/phi-2-q4f16_1-MLC
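
For reference, a quick check that the conversion produced the sharded weights could look like this (a sketch; the top-level "records" key is my reading of the ndarray cache format, not something the tutorial states):
# optional: count weight shard files and the entries recorded in the cache index
ls dist/phi-2-q4f16_1-MLC/params_shard_*.bin | wc -l
python -c "import json; print(len(json.load(open('dist/phi-2-q4f16_1-MLC/ndarray-cache.json'))['records']))"
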
  3. (Step 2) Generate mlc-chat-config and compile
# 1. gen_config: generate mlc-chat-config.json and process tokenizers
mlc_llm gen_config ./dist/models/phi-2/ \
    --quantization q4f16_1 --conv-template phi-2 \
    -o dist/phi-2-q4f16_1-MLC/
mkdir dist/libs
# 2. compile: compile model library with specification in mlc-chat-config.json
mlc_llm compile ./dist/phi-2-q4f16_1-MLC/mlc-chat-config.json \
    --device iphone -o dist/libs/phi-2-q4f16_1-iphone.tar
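
For reference, the compiled archive can be inspected directly (a sketch; for an --device iphone build I would expect a tar of static object files rather than a host-loadable dylib, but the member names are not verified output):
# optional: list the members of the compiled model library archive
tar -tf dist/libs/phi-2-q4f16_1-iphone.tar
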
  4. (Step 3) Distribute model library and model weights (I skipped the upload part)
ls -la dist/libs
# total 888
# drwxr-xr-x  3 my_name  staff      96  4 25 03:19 .
# drwxr-xr-x  5 my_name  staff     160  4 25 03:19 ..
# -rw-r--r--  1 my_name  staff  453277  4 25 03:19 phi-2-q4f16_1-iphone.tar

ls dist/phi-2-q4f16_1-MLC
# added_tokens.json       params_shard_13.bin     params_shard_21.bin     params_shard_3.bin      params_shard_38.bin     params_shard_46.bin     params_shard_9.bin
# merges.txt              params_shard_14.bin     params_shard_22.bin     params_shard_30.bin     params_shard_39.bin     params_shard_47.bin     tokenizer.json
# mlc-chat-config.json    params_shard_15.bin     params_shard_23.bin     params_shard_31.bin     params_shard_4.bin      params_shard_48.bin     tokenizer_config.json
# ndarray-cache.json      params_shard_16.bin     params_shard_24.bin     params_shard_32.bin     params_shard_40.bin     params_shard_49.bin     vocab.json
# params_shard_0.bin      params_shard_17.bin     params_shard_25.bin     params_shard_33.bin     params_shard_41.bin     params_shard_5.bin
# params_shard_1.bin      params_shard_18.bin     params_shard_26.bin     params_shard_34.bin     params_shard_42.bin     params_shard_50.bin
# params_shard_10.bin     params_shard_19.bin     params_shard_27.bin     params_shard_35.bin     params_shard_43.bin     params_shard_6.bin
# params_shard_11.bin     params_shard_2.bin      params_shard_28.bin     params_shard_36.bin     params_shard_44.bin     params_shard_7.bin
# params_shard_12.bin     params_shard_20.bin     params_shard_29.bin     params_shard_37.bin     params_shard_45.bin     params_shard_8.bin
  5. (Step 4) Calculate estimated VRAM usage
python -m mlc_llm.cli.model_metadata ./dist/libs/phi-2-q4f16_1-iphone.tar  --memory-only --mlc-chat-config ./dist/phi-2-q4f16_1-MLC/mlc-chat-config.json

Then I got the following error:

[2024-04-25 03:21:26] ERROR model_metadata.py:172: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
  File "/Users/my_name/Dev/mlc-llm/python/mlc_llm/cli/model_metadata.py", line 170, in main
    metadata = _extract_metadata(parsed.model_lib)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/my_name/Dev/mlc-llm/python/mlc_llm/cli/model_metadata.py", line 27, in _extract_metadata
    return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 87, in __init__
    self.module = rt_mod[load_exec]()
                  ~~~~~~^^^^^^^^^^^
  File "/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/runtime/module.py", line 192, in __getitem__
    return self.get_function(name)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/runtime/module.py", line 176, in get_function
    raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function 'vm_load_executable'
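
The same failure should be reproducible with just the call path the traceback shows (a minimal sketch of what model_metadata.py does, following the imports the traceback implies; I did not run this separately):
python -c "
from tvm.runtime import load_module, device
from tvm.runtime.relax_vm import VirtualMachine
mod = load_module('./dist/libs/phi-2-q4f16_1-iphone.tar')
VirtualMachine(mod, device('cpu'))  # raises: Module has no function 'vm_load_executable'
"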

Expected behavior

As the docs show, the expected output is:

  INFO model_metadata.py:90: Total memory usage: 3042.96 MB (Parameters: 1492.45 MB. KVCache: 640.00 MB. Temporary buffer: 910.51 MB)
  INFO model_metadata.py:99: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): iOS
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): MacOS
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Apple MacBook Pro/ 2021 14inch/ M1 Pro /32GB mem
  • How you installed MLC-LLM (conda, source): source
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.11
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU: 
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: 
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: d694451c580a931116a2c93571f21f7d791c7fa0
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-04-18 10:05:07 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER: 
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION: 
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
  • Any other relevant information:

Additional context

I suspect that the environment setup may have failed because the file size of the generated phi-2-q4f16_1-iphone.tar differs significantly from the file size of binary-mlc-llm-libs/phi-2/phi-2-q4f16_1-iphone.tar.
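
For a concrete comparison, the prebuilt library can be fetched and placed side by side (a sketch; the repo URL and layout are inferred from the binary-mlc-llm-libs path above, and git lfs may be required if the libraries are stored with LFS):
# optional: compare the locally compiled library against the prebuilt one
git clone --depth 1 https://github.com/mlc-ai/binary-mlc-llm-libs.git /tmp/binary-mlc-llm-libs
ls -la dist/libs/phi-2-q4f16_1-iphone.tar /tmp/binary-mlc-llm-libs/phi-2/phi-2-q4f16_1-iphone.tar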
