🐛 Bug
I encountered an error while following the step-by-step tutorial at https://llm.mlc.ai/docs/deploy/ios.html#bring-your-own-model-library. The issue occurred during Step 4 of the tutorial.
To Reproduce
Steps to reproduce the behavior:
(Follow https://llm.mlc.ai/docs/deploy/ios.html#bring-your-own-model-library)
- (step 0.) Install dependencies and build mlc_llm from source
# update conda
conda update --yes -n base -c defaults conda
# install `conda-libmamba-solver`
conda install --yes -n base conda-libmamba-solver
# set it as the default solver
conda config --set solver libmamba
conda env remove -n mlc-chat-venv
sudo chmod 775 ~/.conda/environments.txt
conda create -n mlc-chat-venv -c conda-forge \
"cmake>=3.24" \
rust \
git \
python=3.11
conda activate mlc-chat-venv
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
python -c "import tvm; print(tvm.__file__)"
# /Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/__init__.py
python -c "import tvm; print(tvm._ffi.base._LIB)"
# <CDLL '/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/libtvm.dylib', handle 9dad0790 at 0x105ff40d0>
python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
# GIT_COMMIT_HASH: d694451c580a931116a2c93571f21f7d791c7fa0
# HIDE_PRIVATE_SYMBOLS: ON
# USE_LLVM: llvm-config --link-static
# LLVM_VERSION: 15.0.7
# USE_VULKAN: OFF
# USE_CUDA: OFF
# CUDA_VERSION: NOT-FOUND
# USE_OPENCL: OFF
# USE_METAL: ON
# USE_ROCM: OFF
# ...
python -c "import tvm; print(tvm.metal().exist)"
# True
python -c "import tvm; print(tvm.cuda().exist)"
# False
python -c "import tvm; print(tvm.vulkan().exist)"
# False
# clone from GitHub
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm/
# create build directory
mkdir -p build && cd build
# generate build configuration
python3 ../cmake/gen_cmake_config.py
##Enter TVM_HOME in absolute path. If not specified, 3rdparty/tvm will be used by default:
## ->/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm
cmake .. && cmake --build . --parallel $(nproc) && cd ..
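(Not part of the tutorial, just a sanity check: confirm which mlc_llm and tvm packages the environment resolves; the comments below describe the expected locations rather than captured output.)
python -c "import mlc_llm; print(mlc_llm.__file__)"
# expected to resolve to the mlc-llm source checkout, i.e. .../mlc-llm/python/mlc_llm/__init__.py
python -c "import tvm; print(tvm.__file__)"
# expected to resolve to the mlc-ai-nightly wheel inside the mlc-chat-venv site-packages, as shown above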
- (step 1.) Clone from HF and convert_weight
mkdir -p dist/models && cd dist/models
# Clone HF weights
git lfs install
git clone https://huggingface.co/microsoft/phi-2
cd ../..
# Convert weight
mlc_llm convert_weight ./dist/models/phi-2/ \
--quantization q4f16_1 \
-o dist/phi-2-q4f16_1-MLC
- (step 2.) Generate mlc-chat-config and compile
# 1. gen_config: generate mlc-chat-config.json and process tokenizers
mlc_llm gen_config ./dist/models/phi-2/ \
--quantization q4f16_1 --conv-template phi-2 \
-o dist/phi-2-q4f16_1-MLC/
mkdir dist/libs
# 2. compile: compile model library with specification in mlc-chat-config.json
mlc_llm compile ./dist/phi-2-q4f16_1-MLC/mlc-chat-config.json \
--device iphone -o dist/libs/phi-2-q4f16_1-iphone.tar
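(Also not part of the tutorial: as far as I understand, the iphone model lib is a plain tar archive of object files, so its members can be listed to confirm what the compile step produced; this is an assumption on my side, not something the docs state.)
python -c "import tarfile; print(tarfile.open('dist/libs/phi-2-q4f16_1-iphone.tar').getnames())"
# expected: a list of the object files packed into the model library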
- (step 3.) Distribute model library and model weights (I skipped the upload part)
ls -la dist/libs
# total 888
# drwxr-xr-x 3 my_name staff 96 4 25 03:19 .
# drwxr-xr-x 5 my_name staff 160 4 25 03:19 ..
# -rw-r--r-- 1 my_name staff 453277 4 25 03:19 phi-2-q4f16_1-iphone.tar
ls dist/phi-2-q4f16_1-MLC
# added_tokens.json params_shard_13.bin params_shard_21.bin params_shard_3.bin params_shard_38.bin params_shard_46.bin params_shard_9.bin
# merges.txt params_shard_14.bin params_shard_22.bin params_shard_30.bin params_shard_39.bin params_shard_47.bin tokenizer.json
# mlc-chat-config.json params_shard_15.bin params_shard_23.bin params_shard_31.bin params_shard_4.bin params_shard_48.bin tokenizer_config.json
# ndarray-cache.json params_shard_16.bin params_shard_24.bin params_shard_32.bin params_shard_40.bin params_shard_49.bin vocab.json
# params_shard_0.bin params_shard_17.bin params_shard_25.bin params_shard_33.bin params_shard_41.bin params_shard_5.bin
# params_shard_1.bin params_shard_18.bin params_shard_26.bin params_shard_34.bin params_shard_42.bin params_shard_50.bin
# params_shard_10.bin params_shard_19.bin params_shard_27.bin params_shard_35.bin params_shard_43.bin params_shard_6.bin
# params_shard_11.bin params_shard_2.bin params_shard_28.bin params_shard_36.bin params_shard_44.bin params_shard_7.bin
# params_shard_12.bin params_shard_20.bin params_shard_29.bin params_shard_37.bin params_shard_45.bin params_shard_8.bin
- (step 4.) Calculate estimated VRAM usage
python -m mlc_llm.cli.model_metadata ./dist/libs/phi-2-q4f16_1-iphone.tar --memory-only --mlc-chat-config ./dist/phi-2-q4f16_1-MLC/mlc-chat-config.json
Then I got the following error:
[2024-04-25 03:21:26] ERROR model_metadata.py:172: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
File "/Users/my_name/Dev/mlc-llm/python/mlc_llm/cli/model_metadata.py", line 170, in main
metadata = _extract_metadata(parsed.model_lib)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/my_name/Dev/mlc-llm/python/mlc_llm/cli/model_metadata.py", line 27, in _extract_metadata
return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 87, in __init__
self.module = rt_mod[load_exec]()
~~~~~~^^^^^^^^^^^
File "/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/runtime/module.py", line 192, in __getitem__
return self.get_function(name)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/my_name/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/runtime/module.py", line 176, in get_function
raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function 'vm_load_executable'
Expected behavior
As described in the docs, the expected output is:
INFO model_metadata.py:90: Total memory usage: 3042.96 MB (Parameters: 1492.45 MB. KVCache: 640.00 MB. Temporary buffer: 910.51 MB)
INFO model_metadata.py:99: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): iOS
- Operating system (e.g. Ubuntu/Windows/MacOS/...): MacOS
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Apple MacBook Pro 2021 14-inch, M1 Pro, 32 GB memory
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): pip
- Python version (e.g. 3.10): 3.11
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: d694451c580a931116a2c93571f21f7d791c7fa0
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-04-18 10:05:07 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER:
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
- Any other relevant information:
Additional context
I suspect that the environment setup may have failed because the file size of the generated phi-2-q4f16_1-iphone.tar differs significantly from the file size of binary-mlc-llm-libs/phi-2/phi-2-q4f16_1-iphone.tar.
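For reference, a quick way to compare the two sizes (the second path is hypothetical and assumes the prebuilt lib from binary-mlc-llm-libs/phi-2/ has been downloaded into the current directory):
python -c "import os; print(os.path.getsize('dist/libs/phi-2-q4f16_1-iphone.tar'))"
# 453277 (matches the ls -la output in step 3)
python -c "import os; print(os.path.getsize('phi-2-q4f16_1-iphone.tar'))"
# hypothetical local copy of binary-mlc-llm-libs/phi-2/phi-2-q4f16_1-iphone.tar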