Description
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
- python build.py --hf-path=databricks/dolly-v2-3b (this succeeds)
- python evaluate.py --artifact-path dist --model dolly-v2-3b --quantization q3f16_0
Expected behavior
When running inference, I got this error:
Tokenizing...
Running inference...
Traceback (most recent call last):
  File "/mlc-llm/evaluate.py", line 178, in <module>
    deploy_to_pipeline(ARGS)
  File "/mlc-llm/evaluate.py", line 136, in deploy_to_pipeline
    first_k_cache = fcache_view(kv_caches[0], ShapeTuple([7, 32, 128]))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/mlc-llm/lib/python3.11/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
2: TVMFuncCall
1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
0: tvm::runtime::relax_vm::AttentionKVCacheObj::View(tvm::runtime::ShapeTuple const&)
File "/workspace/tvm/src/runtime/relax_vm/lm_support.cc", line 78
TVMError: Check failed: shape[0] == fill_count (7 vs. 6) : Requested shape do not match the filled count
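For context, the check that fails in lm_support.cc compares the first dimension of the requested view shape against the KV cache's fill count (how many token positions have actually been appended). The sketch below is a toy Python model of that invariant, not the real TVM API; the class and method names are illustrative assumptions:

```python
class ToyKVCache:
    """Illustrative stand-in for TVM's AttentionKVCache fill-count check."""

    def __init__(self, num_heads: int, head_dim: int):
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.fill_count = 0  # token positions appended so far

    def append(self, num_tokens: int) -> None:
        # Each prefilled/decoded token advances the fill count.
        self.fill_count += num_tokens

    def view(self, shape: tuple) -> tuple:
        # Mirrors the failing check: the requested sequence length
        # (shape[0]) must equal the number of filled positions.
        if shape[0] != self.fill_count:
            raise ValueError(
                f"Check failed: shape[0] == fill_count "
                f"({shape[0]} vs. {self.fill_count}): "
                "Requested shape do not match the filled count"
            )
        return shape  # the real runtime returns an NDArray view here


cache = ToyKVCache(num_heads=32, head_dim=128)
cache.append(6)  # only 6 positions filled
try:
    cache.view((7, 32, 128))  # requesting 7, as in the report above
except ValueError as err:
    print(err)
```

In the reported run the evaluate script requests a view of 7 positions while the cache was only filled with 6, which is the same mismatch this sketch reproduces.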
Environment
- Platform (e.g. Intel):
- Operating system (e.g. Ubuntu):
- Device ( PC+RTX 3090, ...)
- How you installed MLC-LLM (source):
- How you installed TVM-Unity (pip):
- Python version (e.g. 3.11):
- GPU driver version (if applicable):
- CUDA/cuDNN version (if applicable):
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
- Any other relevant information:
