
[Bug] Llama-2-7b-chat-hf-q4f16_1 model conversion reported an error: Vulkan target does not support Float16 capability.  #1606

@smileR-T

Description


Running the model conversion:

mlc_chat convert_weight ./dist/models/Llama2-7B-Chat-q4f16_1 --quantization q4f16_1 -o dist/Llama2-7B-Chat-q4f16_1-MLC

[2024-01-15 16:31:03] INFO auto_config.py:115: Found model configuration: dist/models/Llama2-7B-Chat-q4f16_1/config.json
[2024-01-15 16:31:03] INFO auto_device.py:85: Not found device: cuda:0
[2024-01-15 16:31:04] INFO auto_device.py:85: Not found device: rocm:0
[2024-01-15 16:31:04] INFO auto_device.py:85: Not found device: metal:0
[2024-01-15 16:31:04] INFO auto_device.py:76: Found device: vulkan:0
[2024-01-15 16:31:05] INFO auto_device.py:85: Not found device: opencl:0
[2024-01-15 16:31:05] INFO auto_device.py:33: Using device: vulkan:0
[2024-01-15 16:31:05] INFO auto_weight.py:70: Finding weights in: dist/models/Llama2-7B-Chat-q4f16_1
[2024-01-15 16:31:05] INFO auto_weight.py:120: Found source weight format: huggingface-torch. Source configuration: dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model.bin.index.json
[2024-01-15 16:31:05] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: dist/models/Llama2-7B-Chat-q4f16_1/model.safetensors.index.json
[2024-01-15 16:31:05] INFO auto_weight.py:106: Using source weight configuration: dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model.bin.index.json. Use --source to override.
[2024-01-15 16:31:05] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use --source-format to override.
[2024-01-15 16:31:05] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
Weight conversion with arguments:
--config dist/models/Llama2-7B-Chat-q4f16_1/config.json
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
--model-type llama
--device vulkan:0
--source dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model.bin.index.json
--source-format huggingface-torch
--output dist/Llama2-7B-Chat-q4f16_1-MLC
[2024-01-15 16:31:05] INFO llama_model.py:51: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2024-01-15 16:31:05] INFO llama_model.py:71: prefill_chunk_size defaults to context_window_size (2048)
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
[2024-01-15 16:31:10] INFO huggingface_loader.py:169: Loading HF parameters from: dist/models/Llama2-7B-Chat-q4f16_1/pytorch_model-00002-of-00002.bin
[2024-01-15 16:31:13] INFO group_quantization.py:227: Compiling quantize function for key: ((32000, 4096), float16, vulkan, axis=1, output_transpose=False)
0%| | 0/195 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/home/kylin/miniconda3/bin/mlc_chat", line 8, in
sys.exit(main())
^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/main.py", line 28, in main
cli.main(sys.argv[2:])
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/cli/convert_weight.py", line 87, in main
convert_weight(
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/interface/convert_weight.py", line 156, in convert_weight
_convert_args(args)
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/interface/convert_weight.py", line 107, in _convert_args
for name, param in LOADER[args.source_format](
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/loader/huggingface_loader.py", line 118, in load
q_params = self.quantize_param_map.map_func[mlc_name](param)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/quantization/group_quantization.py", line 228, in quantize_weight
quantize_func = _compile_quantize_func(_create_quantize_func())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/mlc_chat/quantization/group_quantization.py", line 217, in _compile_quantize_func
ex = relax.build(mod, target=target)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 341, in build
return _vmlink(
^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 247, in _vmlink
lib = tvm.build(
^^^^^^^^^^
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/driver/build_module.py", line 294, in build
rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.call
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/home/kylin/miniconda3/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm.error.InternalError: Traceback (most recent call last):
10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}>(tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
9: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
8: tvm::codegen::Build(tvm::IRModule, tvm::Target)
7: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}>(tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
6: tvm::codegen::BuildSPIRV(tvm::IRModule, tvm::Target)
5: tvm::codegen::LowerToSPIRV[abi:cxx11](tvm::IRModule, tvm::Target)
4: tvm::codegen::CodeGenSPIRV::BuildFunction(tvm::tir::PrimFunc const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
3: tvm::codegen::spirv::IRBuilder::GetSType(tvm::runtime::DataType const&, unsigned int, unsigned int)
2: tvm::codegen::spirv::IRBuilder::DeclareType(tvm::runtime::DataType const&, unsigned int, unsigned int)
1: tvm::codegen::spirv::IRBuilder::AddCapabilityFor(tvm::runtime::DataType const&)
0: ZN3tvm7runtime6deta
File "/workspace/tvm/src/target/spirv/ir_builder.cc", line 566
InternalError: Check failed: (spirv_support_.supports_float16) is false: Vulkan target does not support Float16 capability. If your device supports 16-bit float operations, please either add -supports_float16=1 to the target, or query all device parameters by adding -from_device=0.
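
For context on why 16-bit floats are involved at all: the log above shows the quantize kernel being compiled for float16 weights on the Vulkan device ("((32000, 4096), float16, vulkan, ...)"), because the q4f16_1 scheme keeps the model dtype in float16. The printed GroupQuantize parameters fit together as sketched below; this is a minimal illustration derived only from the configuration echoed in the log, not from the mlc_chat source, and the variable names are mine.

```python
# Sketch of how the printed q4f16_1 GroupQuantize parameters relate to each other.
# Derived from the config echoed in the log above, not from mlc_chat internals.
quantize_bits = 4                                            # quantize_dtype='int4'
storage_bits = 32                                            # storage_dtype='uint32'
num_elem_per_storage = storage_bits // quantize_bits         # 8 int4 weights per uint32
group_size = 32
num_storage_per_group = group_size // num_elem_per_storage   # 4 uint32 words per group
max_int_value = 2 ** (quantize_bits - 1) - 1                 # 7, the symmetric int4 range

print(num_elem_per_storage, num_storage_per_group, max_int_value)  # -> 8 4 7
```

Because the weights being quantized (and the resulting scales) are float16, the generated kernel requires the Vulkan target to declare the Float16 capability, which is exactly the check that fails here.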

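The error message itself suggests two ways forward when the device genuinely supports 16-bit floats: declare it on the target with -supports_float16=1, or let TVM query the device with -from_device=0. Below is a minimal sketch of what those targets look like at the TVM level; the option strings come straight from the error text, and whether or how mlc_chat lets you pass a custom target to convert_weight is not confirmed by this log.

```python
import tvm

# Option 1 (from the error message): explicitly declare fp16 support on the target.
# Only appropriate if the GPU/driver really supports 16-bit float operations.
target_forced = tvm.target.Target("vulkan -supports_float16=1")

# Option 2 (from the error message): query all device parameters from device 0,
# so supports_float16 reflects what the driver actually reports.
target_queried = tvm.target.Target("vulkan -from_device=0")
print(target_queried.attrs)  # inspect the queried capabilities, incl. supports_float16
```

Note the earlier WARNING that the conversion is running on lavapipe, a software Vulkan implementation intended for testing only. If that is the only Vulkan device present, forcing -supports_float16=1 would most likely just move the failure to a later stage; running on a real GPU driver, or choosing a quantization that does not use float16 (e.g. q4f32_1, if available in your build), would be the alternative.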