New IQ4_KT trellis implementation #505
The new trellis generates int8_t values via sum_as_uint8_t[(ka * idx + kb) & 0x3f3f3f3f] - 126. CUDA dequantize works. The AVX2 Ny > 32 case works, and we get 273 t/s for L3-8B. PPL is on par with, or even slightly lower than, the original QTIP trellis.
We get 13.6 t/s vs 8.4 t/s with the f16 trellis and f32 arithmetic. Still somewhat slower than other quants, but no longer pathetic.
We get very respectable PP-512 = 120 t/s. TG-128 is pathetic at 5.3 t/s, so 20+% slower than the f16 variant.
We are now at 9.4 t/s, up from 6.6 t/s for the f16 trellis.
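For reference, a minimal sketch of the integer generator described above; ka and kb stand in for the actual constants used in the PR, and the decode simply follows the quoted formula (mask every byte to 6 bits, sum the four bytes as uint8_t, subtract 126):

#include <cstdint>

// Sketch of the int8 trellis decode (placeholder ka/kb, not the PR's constants).
static inline int8_t trellis_decode(uint32_t idx, uint32_t ka, uint32_t kb) {
    uint32_t x = (ka * idx + kb) & 0x3f3f3f3f;    // each byte stays in [0, 63]
    int sum = (x & 0xff) + ((x >> 8) & 0xff)
            + ((x >> 16) & 0xff) + (x >> 24);     // sum of the 4 bytes, at most 252
    return (int8_t)(sum - 126);                   // center to [-126, 126]
}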
This looks interesting, so I was thinking to test it. I got it to compile CPU-only, e.g.:

cmake -B build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build build --config Release -j $(nproc)

But I'm not having luck getting it to compile with CUDA, e.g. variations of:

#cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_F16=ON
#cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1
rm -rf ./build/
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_CCACHE=OFF
cmake --build ./build --config Release -j $(nproc)

There is a warning about a switch/case fall-through, and then the build fails at link time.

👈 Logs

# the warning
[ 45%] Building CXX object ggml/src/CMakeFiles/ggml.dir/iqk/iqk_quantize.cpp.o
[ 45%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu: In function ‘void ggml_cuda_op_mul_mat_vec_q_impl(ggml_backend_cuda_context&, ggml_type, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, const char*, const char*, float*, const char*, int64_t, int64_t, int64_t, int64_t, cudaStream_t)’:
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu:528:30: warning: this statement may fall through [-Wimplicit-fallthrough=]
528 |                 mul_mat_vec_iq4_kss_q8_1_cuda(src0_dd_i, src1_ddq_i, dst_dd_i, ids_data, ne00, row_diff, src1_padded_row_size, src1_ncols, nrows_dst, ne2, nb02, nb12, nb2, ids_nb0, stream);
    |                 ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu:529:1: note: here
529 | case GGML_TYPE_IQ4_KT:
| ^
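In other words, the compiler is flagging that the IQ4_KSS case body reaches the IQ4_KT case label without a break (or a [[fallthrough]] annotation). A tiny self-contained illustration of what -Wimplicit-fallthrough is about; the names are made up, not the actual mmvq.cu code:

#include <cstdio>

enum demo_type { DEMO_IQ4_KSS, DEMO_IQ4_KT };

static void dispatch(demo_type t) {
    switch (t) {
        case DEMO_IQ4_KSS:
            std::puts("mul_mat_vec for IQ4_KSS");
            break;  // omitting this break is exactly what -Wimplicit-fallthrough
                    // warns about: IQ4_KSS would also run the IQ4_KT branch
        case DEMO_IQ4_KT:
            std::puts("mul_mat_vec for IQ4_KT");
            break;
    }
}

int main() { dispatch(DEMO_IQ4_KSS); return 0; }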
# the error
[ 48%] Building CXX object src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 48%] Linking CXX executable ../../bin/llama-gguf
/usr/bin/ld: ../../ggml/src/libggml.so: undefined reference to `void mul_mat_q_case<(ggml_type)155>(ggml_backend_cuda_context&, mmq_args const&, CUstream_st*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [examples/gguf/CMakeFiles/llama-gguf.dir/build.make:98: bin/llama-gguf] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2643: examples/gguf/CMakeFiles/llama-gguf.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[ 48%] Linking CXX executable ../../bin/llama-gguf-hash
/usr/bin/ld: ../../ggml/src/libggml.so: undefined reference to `void mul_mat_q_case<(ggml_type)155>(ggml_backend_cuda_context&, mmq_args const&, CUstream_st*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/build.make:104: bin/llama-gguf-hash] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2510: examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/all] Error 2
[ 49%] Linking CXX shared library libllama.so
[ 49%] Built target llama
gmake: *** [Makefile:146: all] Error 2

For fun I tried compiling an earlier commit. I see the same error on both the remote 24-core Threadripper Pro and my local Arch Linux box:
Maybe a file is missing from the commit here? Ahh yes... Hrmm, I don't know how to run
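That guess about a missing file matches what the undefined reference usually means: mul_mat_q_case is a template that is explicitly instantiated per quant type in its own translation unit, so if the file holding the instantiation for the new type (enum value 155 here) isn't part of the build, no object file defines the symbol and linking fails. A rough sketch of the pattern under that assumption, with generic names standing in for the actual ggml-cuda files:

// mmq_case.hpp (sketch): callers only see the declaration, so they depend on
// an explicit instantiation existing in some object file at link time.
template <int type>
void mul_mat_q_case();   // the real one is templated on ggml_type and takes
                         // the context/args shown in the error message

// mmq-instance-iq4_kt.cpp (sketch): one file per quant type performs the
// explicit instantiation. If this file isn't listed in the build (e.g. newly
// added and not yet committed/globbed), nothing emits mul_mat_q_case<155>,
// producing exactly:
//   undefined reference to `void mul_mat_q_case<(ggml_type)155>(...)'
// #include "mmq_case_impl.hpp"          // template definition lives here
// template void mul_mat_q_case<155>();  // explicit instantiation for IQ4_KT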
Now that it seems to compile okay, I'm giving quantizing a try. My first attempt threw an Oops:

[ 4/ 808] blk.0.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points: 0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points: 1 2 1 0
cluster_points: Oops. Cluster 124 has no points: 0 3 3 1
cluster_points: Oops. Cluster 624 has no points: 0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size = 220.50 MiB -> 55.21 MiB
[ 5/ 808] blk.0.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB

Not sure what that means, so I'm making a new imatrix, using some extra stuff from exllamav3 on top of my usual, to see if it still throws the Oops. Will update this with results...

EDIT: Okay, it finished (see ik's comment below). It finished cooking, so gonna give it a test and then update one more time.

👈 Secret Recipe and Logs

#!/usr/bin/env bash
# This script is a bit sloppy, as it's left over from earlier experiments.
# It's mostly un-needed, as you can just pass the quant level in a simple command.
# I don't recall why I was making attn_v.weight=q4_0 before,
# but it seems to quantize to iq4_kt without any complaints...
custom=" 17:58:22 [4/1961]
#####
# Token embedding
token_embd\.weight=q8_0
#####
# Prioritize attn Layers by Cosine Similarity Scores
#blk.0.attn_k.weight, torch.bfloat16 --> BF16, shape = {5376, 2048}
#blk.0.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 5376}
#blk.0.attn_q.weight, torch.bfloat16 --> BF16, shape = {5376, 4096}
#blk.0.attn_v.weight, torch.bfloat16 --> BF16, shape = {5376, 2048}
#blk.[0-9].attn_v.weight=q4_0
#blk.[1-6][0-9].attn_v.weight=q4_0
blk.[0-9].attn_v.weight=iq4_kt
blk.[1-6][0-9].attn_v.weight=iq4_kt
#####
# Prioritize ffn Layers by Cosine Similarity Scores
#blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {21504, 5376}
#blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {5376, 21504}
#blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {5376, 21504}
"
custom=$(
echo "$custom" | grep -v '^#' | \
sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)
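# At this point $custom has been collapsed into a single comma-separated string:
#   token_embd\.weight=q8_0,blk.[0-9].attn_v.weight=iq4_kt,blk.[1-6][0-9].attn_v.weight=iq4_kt
# (the grep drops the comment lines, the sed turns newlines into commas),
# which is the format --custom-q expects below.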
#--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat \
#--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-ubergarm-calibration-corpus-v02.dat \
./build/bin/llama-quantize \
--token-embedding-type q8_0 \
--custom-q "$custom" \
--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat \
/mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf \
/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf \
IQ4_KT \
24
main: build = 3748 (846c7b89)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: quantizing '/mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf' to '/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf' as IQ4_KT using 24 threads
llama_model_loader: additional 1 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 43 key-value pairs and 808 tensors from /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Gemma 3 27b It Qat Q4_0 Unquantized
llama_model_loader: - kv 3: general.finetune str = it-qat-unquantized
llama_model_loader: - kv 4: general.basename str = gemma-3
llama_model_loader: - kv 5: general.size_label str = 27B
llama_model_loader: - kv 6: general.license str = gemma
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = Gemma 3 27b It
llama_model_loader: - kv 9: general.base_model.0.organization str = Google
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/google/gemma-3...
llama_model_loader: - kv 11: general.tags arr[str,4] = ["gemma3", "gemma", "google", "image-...
llama_model_loader: - kv 12: gemma3.context_length u32 = 131072
llama_model_loader: - kv 13: gemma3.embedding_length u32 = 5376
llama_model_loader: - kv 14: gemma3.block_count u32 = 62
llama_model_loader: - kv 15: gemma3.feed_forward_length u32 = 21504
llama_model_loader: - kv 16: gemma3.attention.head_count u32 = 32
llama_model_loader: - kv 17: gemma3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 18: gemma3.attention.key_length u32 = 128
llama_model_loader: - kv 19: gemma3.attention.value_length u32 = 128
llama_model_loader: - kv 20: general.file_type u32 = 32
llama_model_loader: - kv 21: gemma3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 22: gemma3.attention.sliding_window u32 = 1024
llama_model_loader: - kv 23: gemma3.attention.head_count_kv u32 = 16
llama_model_loader: - kv 24: gemma3.rope.scaling.type str = linear
llama_model_loader: - kv 25: gemma3.rope.scaling.factor f32 = 8.000000
llama_model_loader: - kv 26: tokenizer.ggml.model str = llama
llama_model_loader: - kv 27: tokenizer.ggml.pre str = default
llama_model_loader: - kv 28: tokenizer.ggml.tokens arr[str,262208] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 29: tokenizer.ggml.scores arr[f32,262208] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,262208] = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 32: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 33: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 34: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 35: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 36: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 37: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv 38: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 39: general.quantization_version u32 = 2
llama_model_loader: - kv 40: split.no u16 = 0
llama_model_loader: - kv 41: split.count u16 = 2
llama_model_loader: - kv 42: split.tensors.count i32 = 808
llama_model_loader: - type f32: 373 tensors
llama_model_loader: - type bf16: 435 tensors
================================ Have weights data with 434 entries
[ 1/ 808] token_embd.weight - [ 5376, 262208, 1, 1], type = bf16, Using custom type q8_0 for tensor token_embd.weight
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to q8_0 .. Adding custom rule token_embd\.weight -> q8_0
Adding custom rule blk.[0-9].attn_v.weight -> iq4_kt
Adding custom rule blk.[1-6][0-9].attn_v.weight -> iq4_kt
load_imatrix: imatrix dataset='calibration_data_v5_rc.txt'
load_imatrix: loaded 434 importance matrix entries from /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat computed on 221 chunks
prepare_imatrix: have 434 importance matrix entries
size = 2688.66 MiB -> 1428.35 MiB
[ 2/ 808] blk.0.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 3/ 808] blk.0.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 4/ 808] blk.0.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points: 0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points: 1 2 1 0
cluster_points: Oops. Cluster 124 has no points: 0 3 3 1
cluster_points: Oops. Cluster 624 has no points: 0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size = 220.50 MiB -> 55.21 MiB
[ 5/ 808] blk.0.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 6/ 808] blk.0.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 7/ 808] blk.0.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 8/ 808] blk.0.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 9/ 808] blk.0.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 10/ 808] blk.0.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 11/ 808] blk.0.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 12/ 808] blk.0.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 13/ 808] blk.0.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 14/ 808] blk.0.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.0.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 15/ 808] blk.1.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 16/ 808] blk.1.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 17/ 808] blk.1.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 18/ 808] blk.1.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 19/ 808] blk.1.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 20/ 808] blk.1.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 21/ 808] blk.1.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.1.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 22/ 808] blk.1.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 23/ 808] blk.1.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 24/ 808] blk.1.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 25/ 808] blk.1.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 26/ 808] blk.1.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 27/ 808] blk.1.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 28/ 808] blk.2.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 29/ 808] blk.2.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 30/ 808] blk.2.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 31/ 808] blk.2.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 32/ 808] blk.2.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 33/ 808] blk.2.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 34/ 808] blk.2.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 35/ 808] blk.2.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 36/ 808] blk.2.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 37/ 808] blk.2.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 38/ 808] blk.2.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 39/ 808] blk.2.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 40/ 808] blk.2.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.2.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 41/ 808] blk.3.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 42/ 808] blk.3.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 43/ 808] blk.3.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 44/ 808] blk.3.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 45/ 808] blk.3.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 46/ 808] blk.3.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 47/ 808] blk.3.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 48/ 808] blk.3.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 49/ 808] blk.3.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 50/ 808] blk.3.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 51/ 808] blk.3.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 52/ 808] blk.3.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 53/ 808] blk.3.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.3.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 54/ 808] blk.4.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 55/ 808] blk.4.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 56/ 808] blk.4.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 57/ 808] blk.4.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 58/ 808] blk.4.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 59/ 808] blk.4.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 60/ 808] blk.4.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 61/ 808] blk.4.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 62/ 808] blk.4.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 63/ 808] blk.4.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 64/ 808] blk.4.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 65/ 808] blk.4.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 66/ 808] blk.4.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.4.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 67/ 808] blk.5.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 68/ 808] blk.5.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 69/ 808] blk.5.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 70/ 808] blk.5.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 71/ 808] blk.5.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 72/ 808] blk.5.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 73/ 808] blk.5.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 74/ 808] blk.5.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 75/ 808] blk.5.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 76/ 808] blk.5.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 77/ 808] blk.5.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 78/ 808] blk.5.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 79/ 808] blk.5.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.5.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 80/ 808] blk.6.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 81/ 808] blk.6.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 82/ 808] blk.6.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 83/ 808] blk.6.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 84/ 808] blk.6.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 85/ 808] blk.6.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 86/ 808] blk.6.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 87/ 808] blk.6.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 88/ 808] blk.6.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 89/ 808] blk.6.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 90/ 808] blk.6.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 91/ 808] blk.6.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 92/ 808] blk.6.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.6.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 93/ 808] blk.7.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 94/ 808] blk.7.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 95/ 808] blk.7.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 96/ 808] blk.7.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 97/ 808] blk.7.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 98/ 808] blk.7.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 99/ 808] blk.7.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.7.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 100/ 808] blk.10.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 101/ 808] blk.10.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 102/ 808] blk.10.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 103/ 808] blk.10.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 104/ 808] blk.10.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 105/ 808] blk.10.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 106/ 808] blk.10.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 107/ 808] blk.10.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 108/ 808] blk.10.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 109/ 808] blk.10.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 110/ 808] blk.10.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 111/ 808] blk.10.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 112/ 808] blk.10.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.10.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 113/ 808] blk.11.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 114/ 808] blk.11.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 115/ 808] blk.11.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 116/ 808] blk.11.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 117/ 808] blk.11.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 118/ 808] blk.11.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 119/ 808] blk.11.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 120/ 808] blk.11.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 121/ 808] blk.11.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 122/ 808] blk.11.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 123/ 808] blk.11.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 124/ 808] blk.11.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 125/ 808] blk.11.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.11.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 126/ 808] blk.12.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 127/ 808] blk.12.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 128/ 808] blk.12.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 129/ 808] blk.12.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 130/ 808] blk.12.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 131/ 808] blk.12.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 132/ 808] blk.12.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 133/ 808] blk.12.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 134/ 808] blk.12.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 135/ 808] blk.12.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 136/ 808] blk.12.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 137/ 808] blk.12.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 138/ 808] blk.12.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.12.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 139/ 808] blk.13.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 140/ 808] blk.13.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 141/ 808] blk.13.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 142/ 808] blk.13.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 143/ 808] blk.13.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 144/ 808] blk.13.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 145/ 808] blk.13.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.13.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 146/ 808] blk.7.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 147/ 808] blk.7.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 148/ 808] blk.7.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 149/ 808] blk.7.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 150/ 808] blk.7.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 151/ 808] blk.7.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 152/ 808] blk.8.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 153/ 808] blk.8.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 154/ 808] blk.8.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 155/ 808] blk.8.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 156/ 808] blk.8.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 157/ 808] blk.8.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 158/ 808] blk.8.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 159/ 808] blk.8.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 160/ 808] blk.8.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 161/ 808] blk.8.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 162/ 808] blk.8.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 163/ 808] blk.8.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 164/ 808] blk.8.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.8.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 165/ 808] blk.9.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 166/ 808] blk.9.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 167/ 808] blk.9.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 168/ 808] blk.9.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 169/ 808] blk.9.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 170/ 808] blk.9.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 171/ 808] blk.9.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 172/ 808] blk.9.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 173/ 808] blk.9.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 174/ 808] blk.9.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 175/ 808] blk.9.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 176/ 808] blk.9.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 177/ 808] blk.9.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.9.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 178/ 808] blk.13.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 179/ 808] blk.13.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 180/ 808] blk.13.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 181/ 808] blk.13.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 182/ 808] blk.13.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 183/ 808] blk.13.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 184/ 808] blk.14.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 185/ 808] blk.14.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 186/ 808] blk.14.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 187/ 808] blk.14.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 188/ 808] blk.14.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 189/ 808] blk.14.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 190/ 808] blk.14.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 191/ 808] blk.14.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 192/ 808] blk.14.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 193/ 808] blk.14.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 194/ 808] blk.14.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 195/ 808] blk.14.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 196/ 808] blk.14.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.14.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 197/ 808] blk.15.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 198/ 808] blk.15.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 199/ 808] blk.15.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 200/ 808] blk.15.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 201/ 808] blk.15.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 202/ 808] blk.15.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 203/ 808] blk.15.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 204/ 808] blk.15.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 205/ 808] blk.15.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 206/ 808] blk.15.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 207/ 808] blk.15.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 208/ 808] blk.15.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 209/ 808] blk.15.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.15.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 210/ 808] blk.16.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 211/ 808] blk.16.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 212/ 808] blk.16.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 213/ 808] blk.16.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 214/ 808] blk.16.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 215/ 808] blk.16.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 216/ 808] blk.16.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 217/ 808] blk.16.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 218/ 808] blk.16.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 219/ 808] blk.16.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 220/ 808] blk.16.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 221/ 808] blk.16.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 222/ 808] blk.16.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.16.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 223/ 808] blk.17.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 224/ 808] blk.17.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 225/ 808] blk.17.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 226/ 808] blk.17.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 227/ 808] blk.17.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 228/ 808] blk.17.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 229/ 808] blk.17.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 230/ 808] blk.17.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 231/ 808] blk.17.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 232/ 808] blk.17.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 233/ 808] blk.17.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 234/ 808] blk.17.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 235/ 808] blk.17.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.17.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 236/ 808] blk.18.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 237/ 808] blk.18.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 238/ 808] blk.18.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 239/ 808] blk.18.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 240/ 808] blk.18.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 241/ 808] blk.18.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 242/ 808] blk.18.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 243/ 808] blk.18.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 244/ 808] blk.18.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 245/ 808] blk.18.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 246/ 808] blk.18.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 247/ 808] blk.18.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 248/ 808] blk.18.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.18.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 249/ 808] blk.19.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 250/ 808] blk.19.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 251/ 808] blk.19.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 252/ 808] blk.19.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 253/ 808] blk.19.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 254/ 808] blk.19.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 255/ 808] blk.19.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.19.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 256/ 808] blk.19.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 257/ 808] blk.19.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 258/ 808] blk.19.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 259/ 808] blk.19.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 260/ 808] blk.19.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 261/ 808] blk.19.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 262/ 808] blk.20.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 263/ 808] blk.20.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 264/ 808] blk.20.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 265/ 808] blk.20.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 266/ 808] blk.20.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 267/ 808] blk.20.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 268/ 808] blk.20.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 269/ 808] blk.20.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 270/ 808] blk.20.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 271/ 808] blk.20.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 272/ 808] blk.20.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 273/ 808] blk.20.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 274/ 808] blk.20.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.20.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 275/ 808] blk.21.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 276/ 808] blk.21.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 277/ 808] blk.21.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 278/ 808] blk.21.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 279/ 808] blk.21.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 280/ 808] blk.21.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 281/ 808] blk.21.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 282/ 808] blk.21.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 283/ 808] blk.21.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 284/ 808] blk.21.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 285/ 808] blk.21.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 286/ 808] blk.21.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 287/ 808] blk.21.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.21.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 288/ 808] blk.22.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 289/ 808] blk.22.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 290/ 808] blk.22.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 291/ 808] blk.22.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 292/ 808] blk.22.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 293/ 808] blk.22.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 294/ 808] blk.22.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 295/ 808] blk.22.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 296/ 808] blk.22.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 297/ 808] blk.22.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 298/ 808] blk.22.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 299/ 808] blk.22.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 300/ 808] blk.22.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.22.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 301/ 808] blk.23.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 302/ 808] blk.23.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 303/ 808] blk.23.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 304/ 808] blk.23.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 305/ 808] blk.23.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 306/ 808] blk.23.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 307/ 808] blk.23.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 308/ 808] blk.23.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 309/ 808] blk.23.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 310/ 808] blk.23.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 311/ 808] blk.23.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 312/ 808] blk.23.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 313/ 808] blk.23.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.23.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 314/ 808] blk.24.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 315/ 808] blk.24.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 316/ 808] blk.24.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 317/ 808] blk.24.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 318/ 808] blk.24.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 319/ 808] blk.24.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 320/ 808] blk.24.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 321/ 808] blk.24.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 322/ 808] blk.24.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 323/ 808] blk.24.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 324/ 808] blk.24.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 325/ 808] blk.24.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 326/ 808] blk.24.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.24.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 327/ 808] blk.25.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 328/ 808] blk.25.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 329/ 808] blk.25.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 330/ 808] blk.25.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 331/ 808] blk.25.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 332/ 808] blk.25.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 333/ 808] blk.25.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.25.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 334/ 808] blk.25.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 335/ 808] blk.25.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 336/ 808] blk.25.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 337/ 808] blk.25.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 338/ 808] blk.25.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 339/ 808] blk.25.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 340/ 808] blk.26.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 341/ 808] blk.26.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 342/ 808] blk.26.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 343/ 808] blk.26.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 344/ 808] blk.26.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 345/ 808] blk.26.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 346/ 808] blk.26.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 347/ 808] blk.26.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 348/ 808] blk.26.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 349/ 808] blk.26.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 350/ 808] blk.26.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 351/ 808] blk.26.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 352/ 808] blk.26.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.26.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 353/ 808] blk.27.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 354/ 808] blk.27.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 355/ 808] blk.27.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 356/ 808] blk.27.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 357/ 808] blk.27.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 358/ 808] blk.27.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 359/ 808] blk.27.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 360/ 808] blk.27.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 361/ 808] blk.27.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 362/ 808] blk.27.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 363/ 808] blk.27.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 364/ 808] blk.27.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 365/ 808] blk.27.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.27.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 366/ 808] blk.28.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 367/ 808] blk.28.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 368/ 808] blk.28.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 369/ 808] blk.28.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 370/ 808] blk.28.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 371/ 808] blk.28.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 372/ 808] blk.28.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 373/ 808] blk.28.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 374/ 808] blk.28.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 375/ 808] blk.28.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 376/ 808] blk.28.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 377/ 808] blk.28.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 378/ 808] blk.28.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.28.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 379/ 808] blk.29.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 380/ 808] blk.29.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 381/ 808] blk.29.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 382/ 808] blk.29.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 383/ 808] blk.29.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 384/ 808] blk.29.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 385/ 808] blk.29.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 386/ 808] blk.29.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 387/ 808] blk.29.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 388/ 808] blk.29.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 389/ 808] blk.29.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 390/ 808] blk.29.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 391/ 808] blk.29.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.29.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 392/ 808] blk.30.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 393/ 808] blk.30.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 394/ 808] blk.30.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 395/ 808] blk.30.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 396/ 808] blk.30.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 397/ 808] blk.30.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 398/ 808] blk.30.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 399/ 808] blk.30.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 400/ 808] blk.30.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 401/ 808] blk.30.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 402/ 808] blk.30.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 403/ 808] blk.30.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 404/ 808] blk.30.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.30.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 405/ 808] blk.31.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 406/ 808] blk.31.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 407/ 808] blk.31.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 408/ 808] blk.31.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 409/ 808] blk.31.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 410/ 808] blk.31.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 411/ 808] blk.31.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.31.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 412/ 808] blk.31.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 413/ 808] blk.31.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 414/ 808] blk.31.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 415/ 808] blk.31.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 416/ 808] blk.31.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 417/ 808] blk.31.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 418/ 808] blk.32.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 419/ 808] blk.32.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 420/ 808] blk.32.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 421/ 808] blk.32.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 422/ 808] blk.32.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 423/ 808] blk.32.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 424/ 808] blk.32.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 425/ 808] blk.32.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 426/ 808] blk.32.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 427/ 808] blk.32.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 428/ 808] blk.32.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 429/ 808] blk.32.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 430/ 808] blk.32.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.32.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 431/ 808] blk.33.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 432/ 808] blk.33.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 433/ 808] blk.33.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 434/ 808] blk.33.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 435/ 808] blk.33.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 436/ 808] blk.33.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 437/ 808] blk.33.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 438/ 808] blk.33.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 439/ 808] blk.33.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 440/ 808] blk.33.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 441/ 808] blk.33.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 442/ 808] blk.33.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 443/ 808] blk.33.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.33.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 444/ 808] blk.34.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 445/ 808] blk.34.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 446/ 808] blk.34.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 447/ 808] blk.34.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 448/ 808] blk.34.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 449/ 808] blk.34.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 450/ 808] blk.34.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 451/ 808] blk.34.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 452/ 808] blk.34.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 453/ 808] blk.34.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 454/ 808] blk.34.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 455/ 808] blk.34.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 456/ 808] blk.34.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.34.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 457/ 808] blk.35.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 458/ 808] blk.35.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 459/ 808] blk.35.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 460/ 808] blk.35.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 461/ 808] blk.35.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 462/ 808] blk.35.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 463/ 808] blk.35.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 464/ 808] blk.35.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 465/ 808] blk.35.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 466/ 808] blk.35.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 467/ 808] blk.35.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 468/ 808] blk.35.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 469/ 808] blk.35.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.35.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 470/ 808] blk.36.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 471/ 808] blk.36.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 472/ 808] blk.36.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 473/ 808] blk.36.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 474/ 808] blk.36.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 475/ 808] blk.36.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 476/ 808] blk.36.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 477/ 808] blk.36.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 478/ 808] blk.36.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 479/ 808] blk.36.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 480/ 808] blk.36.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 481/ 808] blk.36.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 482/ 808] blk.36.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.36.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 483/ 808] blk.37.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 484/ 808] blk.37.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 485/ 808] blk.37.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 486/ 808] blk.37.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 487/ 808] blk.37.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 488/ 808] blk.37.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 489/ 808] blk.37.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.37.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 490/ 808] blk.37.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 491/ 808] blk.37.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 492/ 808] blk.37.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 493/ 808] blk.37.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 494/ 808] blk.37.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 495/ 808] blk.37.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 496/ 808] blk.38.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 497/ 808] blk.38.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 498/ 808] blk.38.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 499/ 808] blk.38.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 500/ 808] blk.38.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 501/ 808] blk.38.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 502/ 808] blk.38.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 503/ 808] blk.38.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 504/ 808] blk.38.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 505/ 808] blk.38.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 506/ 808] blk.38.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 507/ 808] blk.38.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 508/ 808] blk.38.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.38.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 509/ 808] blk.39.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 510/ 808] blk.39.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 511/ 808] blk.39.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 512/ 808] blk.39.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 513/ 808] blk.39.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 514/ 808] blk.39.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 515/ 808] blk.39.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 516/ 808] blk.39.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 517/ 808] blk.39.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 518/ 808] blk.39.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 519/ 808] blk.39.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 520/ 808] blk.39.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 521/ 808] blk.39.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.39.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 522/ 808] blk.40.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 523/ 808] blk.40.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 524/ 808] blk.40.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 525/ 808] blk.40.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 526/ 808] blk.40.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 527/ 808] blk.40.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 528/ 808] blk.40.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 529/ 808] blk.40.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 530/ 808] blk.40.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 531/ 808] blk.40.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 532/ 808] blk.40.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 533/ 808] blk.40.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 534/ 808] blk.40.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.40.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 535/ 808] blk.41.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 536/ 808] blk.41.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 537/ 808] blk.41.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 538/ 808] blk.41.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 539/ 808] blk.41.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 540/ 808] blk.41.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 541/ 808] blk.41.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 542/ 808] blk.41.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 543/ 808] blk.41.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 544/ 808] blk.41.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 545/ 808] blk.41.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 546/ 808] blk.41.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 547/ 808] blk.41.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.41.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 548/ 808] blk.42.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 549/ 808] blk.42.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 550/ 808] blk.42.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 551/ 808] blk.42.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 552/ 808] blk.42.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 553/ 808] blk.42.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 554/ 808] blk.42.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 555/ 808] blk.42.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 556/ 808] blk.42.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 557/ 808] blk.42.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 558/ 808] blk.42.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 559/ 808] blk.42.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 560/ 808] blk.42.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.42.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 561/ 808] blk.43.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 562/ 808] blk.43.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 563/ 808] blk.43.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 564/ 808] blk.43.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 565/ 808] blk.43.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 566/ 808] blk.43.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 567/ 808] blk.43.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.43.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 568/ 808] blk.43.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 569/ 808] blk.43.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 570/ 808] blk.43.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 571/ 808] blk.43.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 572/ 808] blk.43.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 573/ 808] blk.43.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 574/ 808] blk.44.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 575/ 808] blk.44.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 576/ 808] blk.44.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 577/ 808] blk.44.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 578/ 808] blk.44.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 579/ 808] blk.44.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 580/ 808] blk.44.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 581/ 808] blk.44.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 582/ 808] blk.44.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 583/ 808] blk.44.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 584/ 808] blk.44.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 585/ 808] blk.44.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 586/ 808] blk.44.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.44.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 587/ 808] blk.45.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 588/ 808] blk.45.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 589/ 808] blk.45.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 590/ 808] blk.45.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 591/ 808] blk.45.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 592/ 808] blk.45.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 593/ 808] blk.45.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 594/ 808] blk.45.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 595/ 808] blk.45.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 596/ 808] blk.45.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 597/ 808] blk.45.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 598/ 808] blk.45.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 599/ 808] blk.45.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.45.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 600/ 808] blk.46.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 601/ 808] blk.46.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 602/ 808] blk.46.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 603/ 808] blk.46.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 604/ 808] blk.46.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 605/ 808] blk.46.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 606/ 808] blk.46.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 607/ 808] blk.46.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 608/ 808] blk.46.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 609/ 808] blk.46.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 610/ 808] blk.46.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 611/ 808] blk.46.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 612/ 808] blk.46.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.46.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 613/ 808] blk.47.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 614/ 808] blk.47.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 615/ 808] blk.47.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 616/ 808] blk.47.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 617/ 808] blk.47.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 618/ 808] blk.47.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 619/ 808] blk.47.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 620/ 808] blk.47.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 621/ 808] blk.47.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 622/ 808] blk.47.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 623/ 808] blk.47.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 624/ 808] blk.47.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 625/ 808] blk.47.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.47.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 626/ 808] blk.48.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 627/ 808] blk.48.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 628/ 808] blk.48.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 629/ 808] blk.48.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 630/ 808] blk.48.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 631/ 808] blk.48.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 632/ 808] blk.48.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 633/ 808] blk.48.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 634/ 808] blk.48.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 635/ 808] blk.48.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 636/ 808] blk.48.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 637/ 808] blk.48.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 638/ 808] blk.48.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.48.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 639/ 808] blk.49.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 640/ 808] blk.49.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 641/ 808] blk.49.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 642/ 808] blk.49.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 643/ 808] blk.49.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 644/ 808] blk.49.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 645/ 808] blk.49.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.49.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 646/ 808] blk.49.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 647/ 808] blk.49.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 648/ 808] blk.49.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 649/ 808] blk.49.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 650/ 808] blk.49.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 651/ 808] blk.49.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 652/ 808] blk.50.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 653/ 808] blk.50.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 654/ 808] blk.50.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 655/ 808] blk.50.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 656/ 808] blk.50.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 657/ 808] blk.50.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 658/ 808] blk.50.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 659/ 808] blk.50.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 660/ 808] blk.50.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 661/ 808] blk.50.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 662/ 808] blk.50.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 663/ 808] blk.50.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 664/ 808] blk.50.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.50.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 665/ 808] blk.51.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 666/ 808] blk.51.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 667/ 808] blk.51.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 668/ 808] blk.51.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 669/ 808] blk.51.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 670/ 808] blk.51.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 671/ 808] blk.51.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 672/ 808] blk.51.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 673/ 808] blk.51.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 674/ 808] blk.51.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 675/ 808] blk.51.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 676/ 808] blk.51.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 677/ 808] blk.51.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.51.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 678/ 808] blk.52.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 679/ 808] blk.52.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 680/ 808] blk.52.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 681/ 808] blk.52.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 682/ 808] blk.52.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 683/ 808] blk.52.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 684/ 808] blk.52.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 685/ 808] blk.52.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 686/ 808] blk.52.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 687/ 808] blk.52.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 688/ 808] blk.52.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 689/ 808] blk.52.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 690/ 808] blk.52.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.52.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 691/ 808] blk.53.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 692/ 808] blk.53.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 693/ 808] blk.53.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 694/ 808] blk.53.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 695/ 808] blk.53.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 696/ 808] blk.53.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 697/ 808] blk.53.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 698/ 808] blk.53.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 699/ 808] blk.53.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 700/ 808] blk.53.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 701/ 808] blk.53.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 702/ 808] blk.53.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 703/ 808] blk.53.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.53.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 704/ 808] blk.54.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 705/ 808] blk.54.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 706/ 808] blk.54.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 707/ 808] blk.54.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 708/ 808] blk.54.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 709/ 808] blk.54.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 710/ 808] blk.54.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 711/ 808] blk.54.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 712/ 808] blk.54.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 713/ 808] blk.54.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 714/ 808] blk.54.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 715/ 808] blk.54.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 716/ 808] blk.54.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.54.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 717/ 808] blk.55.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 718/ 808] blk.55.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 719/ 808] blk.55.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 720/ 808] blk.55.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 721/ 808] blk.55.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 722/ 808] blk.55.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 723/ 808] blk.55.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.55.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 724/ 808] blk.55.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 725/ 808] blk.55.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 726/ 808] blk.55.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 727/ 808] blk.55.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 728/ 808] blk.55.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 729/ 808] blk.55.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 730/ 808] blk.56.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 731/ 808] blk.56.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 732/ 808] blk.56.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 733/ 808] blk.56.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 734/ 808] blk.56.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 735/ 808] blk.56.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 736/ 808] blk.56.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 737/ 808] blk.56.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 738/ 808] blk.56.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 739/ 808] blk.56.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 740/ 808] blk.56.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 741/ 808] blk.56.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 742/ 808] blk.56.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.56.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 743/ 808] blk.57.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 744/ 808] blk.57.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 745/ 808] blk.57.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 746/ 808] blk.57.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 747/ 808] blk.57.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 748/ 808] blk.57.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 749/ 808] blk.57.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 750/ 808] blk.57.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 751/ 808] blk.57.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 752/ 808] blk.57.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 753/ 808] blk.57.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 754/ 808] blk.57.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 755/ 808] blk.57.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.57.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 756/ 808] blk.58.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 757/ 808] blk.58.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 758/ 808] blk.58.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 759/ 808] blk.58.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 760/ 808] blk.58.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 761/ 808] blk.58.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 762/ 808] blk.58.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 763/ 808] blk.58.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 764/ 808] blk.58.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 765/ 808] blk.58.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 766/ 808] blk.58.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 767/ 808] blk.58.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 768/ 808] blk.58.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.58.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 769/ 808] blk.59.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 770/ 808] blk.59.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 771/ 808] blk.59.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 772/ 808] blk.59.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 773/ 808] blk.59.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 774/ 808] blk.59.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 775/ 808] blk.59.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 776/ 808] blk.59.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 777/ 808] blk.59.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 778/ 808] blk.59.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 779/ 808] blk.59.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 780/ 808] blk.59.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 781/ 808] blk.59.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.59.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 782/ 808] blk.60.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 783/ 808] blk.60.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 784/ 808] blk.60.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 785/ 808] blk.60.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 786/ 808] blk.60.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 787/ 808] blk.60.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 788/ 808] blk.60.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 789/ 808] blk.60.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 790/ 808] blk.60.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 791/ 808] blk.60.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 792/ 808] blk.60.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 793/ 808] blk.60.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 794/ 808] blk.60.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.60.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 795/ 808] blk.61.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 796/ 808] blk.61.attn_k_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 797/ 808] blk.61.attn_k.weight - [ 5376, 2048, 1, 1], type = bf16, converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 798/ 808] blk.61.attn_output.weight - [ 4096, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 799/ 808] blk.61.attn_q_norm.weight - [ 128, 1, 1, 1], type = f32, size = 0.000 MB
[ 800/ 808] blk.61.attn_q.weight - [ 5376, 4096, 1, 1], type = bf16, converting to iq4_kt .. size = 42.00 MiB -> 10.52 MiB
[ 801/ 808] blk.61.attn_v.weight - [ 5376, 2048, 1, 1], type = bf16, Using custom type iq4_kt for tensor blk.61.attn_v.weight
converting to iq4_kt .. size = 21.00 MiB -> 5.26 MiB
[ 802/ 808] blk.61.attn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 803/ 808] blk.61.ffn_down.weight - [21504, 5376, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.15 MiB
[ 804/ 808] blk.61.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 MiB -> 55.21 MiB
[ 805/ 808] blk.61.post_attention_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 806/ 808] blk.61.post_ffw_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 807/ 808] blk.61.ffn_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
[ 808/ 808] output_norm.weight - [ 5376, 1, 1, 1], type = f32, size = 0.021 MB
llama_model_quantize_internal: model size = 51518.82 MB
llama_model_quantize_internal: quant size = 13654.42 MB
main: quantize time = 1143720.04 ms
main: total time = 1143720.04 ms
👈 Perplexity Command
Perplexity
I ended up using the same imatrix.dat file for both.
This probably isn't the best comparison given gemma-3-27B-it-qat behaves unlike most "normal" non-QAT quants. But llama-perplexity runs clean with no
Lastly I'll do some quick sweep benches.
👈 sweep-bench results
#model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_ks.gguf
model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf
#model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-q4_0.gguf
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-sweep-bench \
--model "$model" \
-c 32768 \
-fa \
-ngl 99 \
--warmup-batch \
    --threads 1
PR505 iq4_ks
PR505 iq4_kt
PR505 q4_0
Very nice this new
I'm holding off on releasing any of my experimental
If you get the
The Oops are harmless, just something I forgot to remove.
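For readers wondering what the warning actually means: it is the usual k-means-style situation where, after assigning points to their nearest codebook centroid, some centroid ends up with no members. The sketch below is only a generic illustration of that situation under a plain nearest-centroid assignment; it is not the actual `cluster_points` code from this PR.

```cpp
#include <cstdio>
#include <cmath>
#include <vector>

// Generic nearest-centroid assignment. A centroid can legitimately end up
// with zero assigned points; a codebook builder typically just reports it
// and skips or re-seeds that centroid rather than treating it as an error.
int main() {
    std::vector<float> points    = {0.1f, 0.2f, 0.9f, 1.1f};
    std::vector<float> centroids = {0.15f, 1.0f, 5.0f};   // centroid 2 is far from every point
    std::vector<int>   counts(centroids.size(), 0);
    for (float p : points) {
        int   best   = 0;
        float best_d = std::fabs(p - centroids[0]);
        for (size_t c = 1; c < centroids.size(); ++c) {
            float d = std::fabs(p - centroids[c]);
            if (d < best_d) { best_d = d; best = int(c); }
        }
        counts[best]++;
    }
    for (size_t c = 0; c < centroids.size(); ++c)
        if (counts[c] == 0) std::printf("Oops. Cluster %zu has no points\n", c);
    return 0;
}
```

In a codebook search this is usually benign: the empty cluster is simply reported and then skipped or re-seeded, consistent with the "harmless" assessment above.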
…On Sun, 8 Jun 2025 at 23:34, ubergarm ***@***.***> wrote:
Now that it seems to compile okay, giving it a try quantizing
gemma-3-27B-it-qat-iq4_kt
My first attempt threw an Oops Cluster N has no points but seems to keep
going okay:
[ 4/ 808] blk.0.ffn_gate.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points: 0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points: 1 2 1 0
cluster_points: Oops. Cluster 124 has no points: 0 3 3 1
cluster_points: Oops. Cluster 624 has no points: 0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size = 220.50 MiB -> 55.21 MiB
[ 5/ 808] blk.0.ffn_up.weight - [ 5376, 21504, 1, 1], type = bf16, converting to iq4_kt .. size = 220.50 M
iB -> 55.21 MiB
Not sure what that means, so I'm making a new imatrix using some extra
stuff from exllamav3 on top of my usual to see if it still throws the Oops
knowing it might be completely unrelated.
Will update this with results...
Yes, sorry I forgot to add the
Concerning PPL: yes,
Closing in favor of #529


This PR adds a new version of `IQ4_KT` based on a new trellis.

The new trellis generates `int8_t` values in `[-126...126]` instead of the original "3INST" version taken from QTIP, which produces `fp16` values. The Gaussian distribution generated by the new trellis is much better than the original QTIP trellis. Sadly, this does not result in a lower quantization error. For `IQ4_KT`, the quantization error as measured by PPL is on par with, or perhaps slightly lower than, the existing implementation on the main branch. But for `IQ2_KT` I consistently get a higher PPL, so for now this PR only changes the implementation to the new trellis for `IQ4_KT`.

The main advantage of the new trellis is not a lower quantization error but massively better performance, especially on the CPU. In addition, it allows for a quantized GEMM and GEMV implementation on the GPU, which avoids numerical issues with DeepSeek models when dequantizing to `fp16`, along with significantly better GEMM performance.

Here are some performance examples for LLaMA-3.1-8B.

What is the trick? If $v$ is an unsigned 32-bit integer and $A, B$ are unsigned 32-bit integer magic constants, in both cases we use $v \to A v + B$ to generate the next trellis value. The difference comes from the conversion of $v$ to an actual value to be used as a model weight:
- The original QTIP trellis computes `s = (v & M_1) ^ M_2`, where `M_1, M_2` are magic constants, interprets the bits of `s` as two `fp16` values, and uses their sum.
- The new trellis computes `s = v & M` with `M = 0x3f3f3f3f`, interprets the four bytes of `s` as `int8_t` values, and takes their sum minus 126 as the result, which can be computed very efficiently without requiring native `fp16` arithmetic support:
  - On CUDA this is a single `__dp4a(s, 0x01010101, -126)`.
  - On Zen4 one can use `_mm256_dpbusd_epi32` to compute 8 values with a single instruction.
  - Similar on NEON, where one gets 4 values in a single instruction via `vdotq_s32`.
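To make the decode concrete, here is a minimal scalar sketch of the new trellis step. It is an illustration only, not the actual ik_llama.cpp kernel: the function name and the values of the magic constants `A` and `B` are placeholders, and the byte sum is written as a scalar loop where the real code would use `__dp4a`, `_mm256_dpbusd_epi32`, or `vdotq_s32`.

```cpp
#include <cstdint>

// Illustration only, not the actual ik_llama.cpp kernel.
// v -> A*v + B advances the trellis state; masking with 0x3f3f3f3f keeps
// the low 6 bits of each byte, so the byte sum lies in [0, 252] and the
// final value in [-126, 126].
static inline int8_t next_trellis_value(uint32_t & v) {
    constexpr uint32_t A = 0xCBAC1FEDu; // placeholder magic multiplier
    constexpr uint32_t B = 0x9E3779B9u; // placeholder magic increment
    v = A * v + B;
    const uint32_t s = v & 0x3f3f3f3fu;
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += (s >> (8 * i)) & 0xff;   // scalar stand-in for the __dp4a byte sum
    }
    return static_cast<int8_t>(sum - 126);
}
```

Because each masked byte is roughly uniform in [0, 63], the sum of the four bytes is already approximately Gaussian, and the whole decode stays in integer arithmetic, which is what enables the fast `int8_t` GEMM/GEMV paths described above.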