Description
❓ General Questions
Hello.
I'm trying to build, but I get an error.
Please help me figure out how to fix it :(
command : python build.py --hf-path=databricks/dolly-v2-3b
error : RuntimeError: #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 530)
\mlc-llm$ python build.py --hf-path=databricks/dolly-v2-3b
Weights exist at dist/models/dolly-v2-3b, skipping download.
Using path "dist/models/dolly-v2-3b" for model "dolly-v2-3b"
Database paths: ['log_db/vicuna-v1-7b', 'log_db/rwkv-raven-3b', 'log_db/rwkv-raven-1b5', 'log_db/redpajama-3b-q4f16', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/redpajama-3b-q4f32']
Target configured: cuda -keys=cuda,gpu -arch=sm_61 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Loading HuggingFace model into memory: dist/models/dolly-v2-3b
Note: This may take a while depending on the model size and your RAM availability, and could be particularly slow if it uses swap memory.
Loading done.
Dumping independent weights dumped to: dist/dolly-v2-3b-q3f16_0/raw_params
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 388/388 [00:06<00:00, 61.00it/s]
Loading model weights from: dist/dolly-v2-3b-q3f16_0/raw_params
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 388/388 [00:02<00:00, 154.51it/s]
Loading done
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_61 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Traceback (most recent call last):
File "/home/gihun/IdeaProjects/230530_LLM/mlc-llm/build.py", line 409, in
main()
File "/home/gihun/IdeaProjects/230530_LLM/mlc-llm/build.py", line 387, in main
mod = mod_transform_before_build(mod, params, ARGS)
File "/home/gihun/IdeaProjects/230530_LLM/mlc-llm/build.py", line 270, in mod_transform_before_build
new_params = utils.transform_params(mod_transform, model_params)
File "/home/gihun/IdeaProjects/230530_LLM/mlc-llm/mlc_llm/utils.py", line 155, in transform_params
ex = relax.build(mod_transform, target=target)
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/relax/vm_build.py", line 338, in build
return _vmlink(builder, target, tir_mod, ext_libs, params, system_lib=system_lib)
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/relax/vm_build.py", line 242, in _vmlink
lib = tvm.build(
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/driver/build_module.py", line 281, in build
rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
7: TVMFuncCall
6: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::__mk_TVM22::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}>(tvm::__mk_TVM22::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
5: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
4: tvm::codegen::Build(tvm::IRModule, tvm::Target)
3: _ZN3tvm7runtime13PackedFuncObj
2: tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::runtime::Module ()(tvm::IRModule, tvm::Target)>(tvm::runtime::Module ()(tvm::IRModule, tvm::Target), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
1: tvm::codegen::BuildCUDA(tvm::IRModule, tvm::Target)
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) [clone .cold]
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/contrib/nvcc.py", line 189, in tvm_callback_cuda_compile
ptx = compile_cuda(code, target_format="fatbin")
File "/home/gihun/IdeaProjects/230530_LLM/relax/python/tvm/contrib/nvcc.py", line 113, in compile_cuda
raise RuntimeError(msg)
RuntimeError: #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 530)
...
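For context on the guard in the error message: `__CUDA_ARCH__` is the macro nvcc defines from the target compute capability (e.g. `sm_61` → 610), and `>= 530` gates half-precision (fp16) code paths, which require compute capability 5.3 or newer. A quick sanity check of that mapping (the helper name is mine, for illustration only):

```python
def cuda_arch_value(sm_target: str) -> int:
    """Map a target string like 'sm_61' to the __CUDA_ARCH__ value nvcc defines (610)."""
    return int(sm_target.split("_")[1]) * 10

# sm_61 (Pascal, e.g. GTX 10-series) maps to __CUDA_ARCH__ == 610,
# which satisfies the fp16 guard (>= 530) shown in the error.
print(cuda_arch_value("sm_61"))  # 610
```

So the configured target itself meets the fp16 requirement; the `RuntimeError` is nvcc's compile failure being surfaced, and the full nvcc output (elided above as `...`) would show the actual reason.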
Thank you!