Conversation

@ikawrakow
Owner

This PR adds a new version of IQ4_KT based on a new trellis.

The new trellis generates int8_t values in [-126...126] instead of the fp16 values produced by the original "3INST" trellis taken from QTIP. The Gaussian distribution generated by the new trellis is much better than that of the original QTIP trellis. Sadly, this does not result in a lower quantization error. For IQ4_KT, the quantization error as measured by PPL is on par with, or perhaps slightly lower than, the existing implementation on the main branch. But for IQ2_KT I consistently get a higher PPL, so for now this PR only changes the implementation to the new trellis for IQ4_KT.

The main advantage of the new trellis is not a lower quantization error but massively better performance, especially on the CPU. In addition, it allows for quantized GEMM and GEMV implementations on the GPU, which avoids the numerical issues observed with DeepSeek models when dequantizing to fp16, along with significantly better GEMM performance.

Here are some performance examples for LLaMA-3.1-8B:

  • Ryzen-7950X CPU: PP-512 = 273 t/s vs 133 t/s on main. TG-128 = 13.6 t/s vs 8.4 t/s on main
  • M2-Max CPU: PP-512 = 121 t/s vs 75 t/s on main. TG-128 = 9.4 t/s vs 6.6 t/s on main
  • RTX-4080 GPU: PP-512 = 8000 t/s vs 5800 t/s on main. TG-128 = 134 t/s vs 128 t/s on main.

What is the trick? If $v$ is an unsigned 32-bit integer and $A, B$ are unsigned 32-bit integer magic constants, in both cases we use $v \to A v + B$ to generate the next trellis value. The difference comes from the conversion of $v$ to an actual value to be used as a model weight (a sketch follows the list below):

  • In the original QTIP trellis we have s = (v & M_1) ^ M_2, where $M_1$ and $M_2$ are suitable masks, and $s$ is another 32-bit unsigned integer. The value to be used is generated by viewing $s$ as two fp16 values and taking their sum
  • In the new trellis we have s = v & M, $s$ is viewed as 4 int8_t values, and the result is their sum minus 126 for M = 0x3f3f3f3f, which can be computed very efficiently without requiring native fp16 arithmetic support:
    • On CUDA one can use __dp4a(s, 0x01010101, -126)
    • On Zen4 one can use _mm256_dpbusd_epi32 to compute 8 values with a single instruction
    • Same on NEON, where one gets 4 values in a single instruction via vdotq_s32
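
To make the arithmetic concrete, here is a minimal sketch of the new decoding step. This is not the PR's actual code: ka and kb stand in for the magic constants $A$ and $B$ (the values shown are the 3INST constants and are an assumption here), and the scalar path and the CUDA __dp4a path should produce identical results.

#include <cstdint>

constexpr uint32_t ka = 89226354u;   // assumed multiplier A (3INST value)
constexpr uint32_t kb = 64248484u;   // assumed increment B (3INST value)
constexpr uint32_t km = 0x3f3f3f3fu; // mask M: keeps the low 6 bits of each byte

// Scalar reference: step the LCG, mask each byte into [0, 63], then sum
// the four bytes and shift the result into [-126, 126].
inline int8_t next_value_scalar(uint32_t &v) {
    v = ka * v + kb;
    uint32_t s = v & km;
    int sum = (s & 0xff) + ((s >> 8) & 0xff) + ((s >> 16) & 0xff) + ((s >> 24) & 0xff);
    return (int8_t)(sum - 126);
}

#ifdef __CUDACC__
// On CUDA the byte sum collapses into a single dp4a against an all-ones
// vector, so no fp16 arithmetic is needed at all.
__device__ __forceinline__ int8_t next_value_dp4a(uint32_t &v) {
    v = ka * v + kb;
    return (int8_t)__dp4a((int)(v & km), 0x01010101, -126);
}
#endif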

Iwan Kawrakow added 16 commits June 7, 2025 12:30
The new trellis generates int8_t values via
sum_as_uint8_t[(ka * idx + kb) & 0x3f3f3f3f] - 126.
CUDA dequantize works.
AVX2 case Ny > 32 works, and we get 273 t/s for L3-8B.
PPL is on par or even slightly lower than original QTIP trellis.
We get 13.6 t/s vs 8.4 t/s with the f16 trellis and f32 arithmetic.
Still somewhat slower than other quants, but no longer pathetic.
We get very respectable PP-512 = 120 t/s.
TG-128 is pathetic at 5.3 t/s, so 20+% slower than the f16 variant.
We are now at 9.4 t/s, up from 6.6 t/s for the f16 trellis.
@ikawrakow
Owner Author

Here is a plot of the pdf generated by the new trellis (black dots) and a Gaussian fit (red line):

[plot: pdf of trellis values (black dots) with Gaussian fit (red line)]

One would get an even better Gaussian by summing the bytes of two trellis values (so, 8 int8_t values). But this only increases computation time without improving quantization quality.
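
A quick way to reproduce such a plot is to run the recurrence for many steps and histogram the outputs; a minimal sketch, reusing the assumed ka/kb constants from the sketch above:

#include <cstdint>
#include <cstdio>

int main() {
    constexpr uint32_t ka = 89226354u, kb = 64248484u, km = 0x3f3f3f3fu; // assumed
    uint32_t v = 12345u;        // arbitrary seed
    long long hist[253] = {0};  // outputs fall in [-126, 126]
    const long long n = 100000000;
    for (long long i = 0; i < n; ++i) {
        v = ka * v + kb;
        uint32_t s = v & km;
        int sum = (s & 0xff) + ((s >> 8) & 0xff) + ((s >> 16) & 0xff) + ((s >> 24) & 0xff);
        ++hist[sum];            // sum is the output value shifted by +126
    }
    for (int i = 0; i < 253; ++i)
        printf("%d %.6e\n", i - 126, (double)hist[i] / n);
    return 0;
}

The sum of four masked bytes is what makes the pdf close to Gaussian; summing eight bytes would get closer still, which is the trade-off noted above.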

@ubergarm
Contributor

ubergarm commented Jun 8, 2025

This looks interesting; I was thinking of testing this new iq4_kt implementation against my ubergarm/gemma-3-27B-it-qat-iq4_ks, which is supposedly pretty good according to the linked discussion comment.

I got it to compile CPU-only, e.g.

cmake -B build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build build --config Release -j $(nproc)

But I'm not having luck getting it to compile with CUDA, e.g. with variations of:

#cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_F16=ON
#cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1

rm -rf ./build/
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_CCACHE=OFF
cmake --build ./build --config Release -j $(nproc)

There is a warning about a switch/case fall-through in mmvq.cu and a linker error about mul_mat_q_case<(ggml_type)155> ...

👈 Logs
# the warning
[ 45%] Building CXX object ggml/src/CMakeFiles/ggml.dir/iqk/iqk_quantize.cpp.o
[ 45%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu: In function ‘void ggml_cuda_op_mul_mat_vec_q_impl(ggml_backend_cuda_context&, ggml_type, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, const char*, const char*, float*, const char*, int64_t, int64_t, int64_t, int64_t, cudaStream_t)’:
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu:528:30: warning: this statement may fall through [-Wimplicit-fallthrough=]
  528 |             mul_mat_vec_iq4_kss_q8_1_cuda(src0_dd_i, src1_ddq_i, dst_dd_i, ids_data, ne00, row_diff, src1_padded_row_size, src1_ncols, nrows_dst, ne2, nb02, nb12, nb2, ids_nb0, stream);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu:529:1: note: here
  529 |         case GGML_TYPE_IQ4_KT:
      | ^

# the error
[ 48%] Building CXX object src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 48%] Linking CXX executable ../../bin/llama-gguf
/usr/bin/ld: ../../ggml/src/libggml.so: undefined reference to `void mul_mat_q_case<(ggml_type)155>(ggml_backend_cuda_context&, mmq_args const&, CUstream_st*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [examples/gguf/CMakeFiles/llama-gguf.dir/build.make:98: bin/llama-gguf] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2643: examples/gguf/CMakeFiles/llama-gguf.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[ 48%] Linking CXX executable ../../bin/llama-gguf-hash
/usr/bin/ld: ../../ggml/src/libggml.so: undefined reference to `void mul_mat_q_case<(ggml_type)155>(ggml_backend_cuda_context&, mmq_args const&, CUstream_st*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/build.make:104: bin/llama-gguf-hash] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2510: examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/all] Error 2
[ 49%] Linking CXX shared library libllama.so
[ 49%] Built target llama
gmake: *** [Makefile:146: all] Error 2

For fun I tried compiling an earlier commit fb776ab, closer to the CUDA implementation, but got the same error. I tried moving the duplicated break; which didn't affect the error. I tried rebasing it on top of main, which has the IQ2_M_R4 functionality, but same error.

I see both IQ4_KT = 155 and GGML_TYPE_IQ4_KT 155 but don't know enough about C++ templates to figure out what I'm missing.

Same error on the remote 24-core Threadripper Pro and my local Arch Linux box:

  • gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
  • gcc version 14.2.1 20250128 (GCC)

Maybe a file is missing here?

$ find . -name mmq-instance-iq4_k*
./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_ks.cu
./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_k.cu
./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_ks_r4.cu
# no mmq-instance-iq4_kt.cu ?

Ahh yes.. Hrmm. I don't know how to run python ./ggml/src/ggml-cuda/template-instances/generate_cu_files.py so I just did the dirty thing and made the following file:

./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_kt.cu

// This file has been autogenerated by generate_cu_files.py, do not edit manually.

#include "../mmq.cuh"

DECL_MMQ_CASE(GGML_TYPE_IQ4_KT);

@ubergarm
Contributor

ubergarm commented Jun 8, 2025

Now that it seems to compile okay, I'm giving it a try quantizing gemma-3-27B-it-qat-iq4_kt.

My first attempt threw an "Oops. Cluster N has no points" but seems to keep going okay:

[   4/ 808]                blk.0.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points:  0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points:  1 2 1 0
cluster_points: Oops. Cluster 124 has no points:  0 3 3 1
cluster_points: Oops. Cluster 624 has no points:  0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size =   220.50 MiB ->    55.21 MiB
[   5/ 808]                  blk.0.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB

Not sure what that means, so I'm making a new imatrix using some extra stuff from exllamav3 on top of my usual, to see if it still throws the Oops, knowing it might be completely unrelated.

Will update this with results...

EDIT

Okay, it finished. And as ik mentioned below, the Oops is harmless and was still there with my new imatrix in a quick test.

It finished cooking so gonna give it a test then update one more time.

👈 Secret Recipe and Logs
#!/usr/bin/env bash

# this script is a bit sloppy as left over from earlier experiments.
# it's mostly un-needed as one can just pass the quant level in a simple command.
# i don't recall why i was making attn_v.weight=q4_0 before?
# but it seems to quantize to iq4_kt without any complaints...

custom="                                                                                                                             17:58:22 [4/1961]
#####
# Token embedding
token_embd\.weight=q8_0

#####
# Prioritize attn Layers by Cosine Similarity Scores
#blk.0.attn_k.weight,               torch.bfloat16 --> BF16, shape = {5376, 2048}
#blk.0.attn_output.weight,          torch.bfloat16 --> BF16, shape = {4096, 5376}
#blk.0.attn_q.weight,               torch.bfloat16 --> BF16, shape = {5376, 4096}
#blk.0.attn_v.weight,               torch.bfloat16 --> BF16, shape = {5376, 2048}

#blk.[0-9].attn_v.weight=q4_0
#blk.[1-6][0-9].attn_v.weight=q4_0

blk.[0-9].attn_v.weight=iq4_kt
blk.[1-6][0-9].attn_v.weight=iq4_kt

#####
# Prioritize ffn Layers by Cosine Similarity Scores
#blk.0.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {21504, 5376}
#blk.0.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {5376, 21504}
#blk.0.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {5376, 21504}
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

    #--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat \
    #--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-ubergarm-calibration-corpus-v02.dat \
./build/bin/llama-quantize \
    --token-embedding-type q8_0 \
    --custom-q "$custom" \
    --imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat \
    /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf \
    /mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf \
    IQ4_KT \
    24


main: build = 3748 (846c7b89)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: quantizing '/mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf' to '/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf' as IQ4_KT using 24 threads
llama_model_loader: additional 1 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 43 key-value pairs and 808 tensors from /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gemma 3 27b It Qat Q4_0 Unquantized
llama_model_loader: - kv   3:                           general.finetune str              = it-qat-unquantized
llama_model_loader: - kv   4:                           general.basename str              = gemma-3
llama_model_loader: - kv   5:                         general.size_label str              = 27B
llama_model_loader: - kv   6:                            general.license str              = gemma
llama_model_loader: - kv   7:                   general.base_model.count u32              = 1
llama_model_loader: - kv   8:                  general.base_model.0.name str              = Gemma 3 27b It
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Google
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/google/gemma-3...
llama_model_loader: - kv  11:                               general.tags arr[str,4]       = ["gemma3", "gemma", "google", "image-...
llama_model_loader: - kv  12:                      gemma3.context_length u32              = 131072
llama_model_loader: - kv  13:                    gemma3.embedding_length u32              = 5376
llama_model_loader: - kv  14:                         gemma3.block_count u32              = 62
llama_model_loader: - kv  15:                 gemma3.feed_forward_length u32              = 21504
llama_model_loader: - kv  16:                gemma3.attention.head_count u32              = 32
llama_model_loader: - kv  17:    gemma3.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  18:                gemma3.attention.key_length u32              = 128
llama_model_loader: - kv  19:              gemma3.attention.value_length u32              = 128
llama_model_loader: - kv  20:                          general.file_type u32              = 32
llama_model_loader: - kv  21:                      gemma3.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  22:            gemma3.attention.sliding_window u32              = 1024
llama_model_loader: - kv  23:             gemma3.attention.head_count_kv u32              = 16
llama_model_loader: - kv  24:                   gemma3.rope.scaling.type str              = linear
llama_model_loader: - kv  25:                 gemma3.rope.scaling.factor f32              = 8.000000
llama_model_loader: - kv  26:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  27:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  28:                      tokenizer.ggml.tokens arr[str,262208]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  29:                      tokenizer.ggml.scores arr[f32,262208]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,262208]  = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  32:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  33:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  34:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  35:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  36:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv  38:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  39:               general.quantization_version u32              = 2
llama_model_loader: - kv  40:                                   split.no u16              = 0
llama_model_loader: - kv  41:                                split.count u16              = 2
llama_model_loader: - kv  42:                        split.tensors.count i32              = 808
llama_model_loader: - type  f32:  373 tensors
llama_model_loader: - type bf16:  435 tensors
================================ Have weights data with 434 entries
[   1/ 808]                    token_embd.weight - [ 5376, 262208,     1,     1], type =   bf16, Using custom type q8_0 for tensor token_embd.weight

====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to q8_0 .. Adding custom rule token_embd\.weight -> q8_0
Adding custom rule blk.[0-9].attn_v.weight -> iq4_kt
Adding custom rule blk.[1-6][0-9].attn_v.weight -> iq4_kt
load_imatrix: imatrix dataset='calibration_data_v5_rc.txt'
load_imatrix: loaded 434 importance matrix entries from /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat computed on 221 chunks
prepare_imatrix: have 434 importance matrix entries
size =  2688.66 MiB ->  1428.35 MiB
[   2/ 808]               blk.0.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   3/ 808]                blk.0.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[   4/ 808]                blk.0.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points:  0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points:  1 2 1 0
cluster_points: Oops. Cluster 124 has no points:  0 3 3 1
cluster_points: Oops. Cluster 624 has no points:  0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size =   220.50 MiB ->    55.21 MiB
[   5/ 808]                  blk.0.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[   6/ 808]     blk.0.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   7/ 808]           blk.0.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   8/ 808]                blk.0.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   9/ 808]             blk.0.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  10/ 808]                  blk.0.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  11/ 808]             blk.0.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  12/ 808]             blk.0.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  13/ 808]                  blk.0.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  14/ 808]                  blk.0.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.0.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  15/ 808]                blk.1.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  16/ 808]             blk.1.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  17/ 808]                  blk.1.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  18/ 808]             blk.1.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  19/ 808]             blk.1.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  20/ 808]                  blk.1.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  21/ 808]                  blk.1.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.1.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  22/ 808]               blk.1.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  23/ 808]                blk.1.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  24/ 808]                  blk.1.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  25/ 808]     blk.1.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  26/ 808]           blk.1.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  27/ 808]                blk.1.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  28/ 808]               blk.2.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  29/ 808]                blk.2.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  30/ 808]                blk.2.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  31/ 808]                  blk.2.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  32/ 808]     blk.2.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  33/ 808]           blk.2.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  34/ 808]                blk.2.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  35/ 808]             blk.2.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  36/ 808]                  blk.2.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  37/ 808]             blk.2.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  38/ 808]             blk.2.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  39/ 808]                  blk.2.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  40/ 808]                  blk.2.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.2.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  41/ 808]               blk.3.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  42/ 808]                blk.3.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  43/ 808]                blk.3.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  44/ 808]                  blk.3.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  45/ 808]     blk.3.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  46/ 808]           blk.3.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  47/ 808]                blk.3.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  48/ 808]             blk.3.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  49/ 808]                  blk.3.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  50/ 808]             blk.3.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  51/ 808]             blk.3.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  52/ 808]                  blk.3.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  53/ 808]                  blk.3.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.3.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  54/ 808]               blk.4.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  55/ 808]                blk.4.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  56/ 808]                blk.4.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  57/ 808]                  blk.4.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  58/ 808]     blk.4.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  59/ 808]           blk.4.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  60/ 808]                blk.4.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  61/ 808]             blk.4.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  62/ 808]                  blk.4.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  63/ 808]             blk.4.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  64/ 808]             blk.4.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  65/ 808]                  blk.4.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  66/ 808]                  blk.4.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.4.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  67/ 808]               blk.5.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  68/ 808]                blk.5.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  69/ 808]                blk.5.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  70/ 808]                  blk.5.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  71/ 808]     blk.5.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  72/ 808]           blk.5.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  73/ 808]                blk.5.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  74/ 808]             blk.5.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  75/ 808]                  blk.5.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  76/ 808]             blk.5.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  77/ 808]             blk.5.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  78/ 808]                  blk.5.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  79/ 808]                  blk.5.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.5.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  80/ 808]               blk.6.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  81/ 808]                blk.6.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  82/ 808]                blk.6.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  83/ 808]                  blk.6.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  84/ 808]     blk.6.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  85/ 808]           blk.6.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  86/ 808]                blk.6.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  87/ 808]             blk.6.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  88/ 808]                  blk.6.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  89/ 808]             blk.6.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  90/ 808]             blk.6.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  91/ 808]                  blk.6.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  92/ 808]                  blk.6.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.6.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  93/ 808]                blk.7.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  94/ 808]             blk.7.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  95/ 808]                  blk.7.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  96/ 808]             blk.7.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  97/ 808]             blk.7.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  98/ 808]                  blk.7.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  99/ 808]                  blk.7.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.7.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 100/ 808]              blk.10.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 101/ 808]               blk.10.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 102/ 808]               blk.10.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 103/ 808]                 blk.10.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 104/ 808]    blk.10.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 105/ 808]          blk.10.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 106/ 808]               blk.10.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 107/ 808]            blk.10.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 108/ 808]                 blk.10.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 109/ 808]            blk.10.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 110/ 808]            blk.10.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 111/ 808]                 blk.10.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 112/ 808]                 blk.10.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.10.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 113/ 808]              blk.11.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 114/ 808]               blk.11.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 115/ 808]               blk.11.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 116/ 808]                 blk.11.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 117/ 808]    blk.11.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 118/ 808]          blk.11.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 119/ 808]               blk.11.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 120/ 808]            blk.11.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 121/ 808]                 blk.11.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 122/ 808]            blk.11.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 123/ 808]            blk.11.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 124/ 808]                 blk.11.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 125/ 808]                 blk.11.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.11.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 126/ 808]              blk.12.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 127/ 808]               blk.12.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 128/ 808]               blk.12.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 129/ 808]                 blk.12.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 130/ 808]    blk.12.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 131/ 808]          blk.12.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 132/ 808]               blk.12.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 133/ 808]            blk.12.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 134/ 808]                 blk.12.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 135/ 808]            blk.12.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 136/ 808]            blk.12.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 137/ 808]                 blk.12.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 138/ 808]                 blk.12.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.12.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 139/ 808]               blk.13.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 140/ 808]            blk.13.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 141/ 808]                 blk.13.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 142/ 808]            blk.13.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 143/ 808]            blk.13.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 144/ 808]                 blk.13.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 145/ 808]                 blk.13.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.13.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 146/ 808]               blk.7.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 147/ 808]                blk.7.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 148/ 808]                  blk.7.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 149/ 808]     blk.7.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 150/ 808]           blk.7.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 151/ 808]                blk.7.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 152/ 808]               blk.8.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 153/ 808]                blk.8.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 154/ 808]                blk.8.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 155/ 808]                  blk.8.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 156/ 808]     blk.8.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 157/ 808]           blk.8.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 158/ 808]                blk.8.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 159/ 808]             blk.8.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 160/ 808]                  blk.8.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 161/ 808]             blk.8.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 162/ 808]             blk.8.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 163/ 808]                  blk.8.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 164/ 808]                  blk.8.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.8.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 165/ 808]               blk.9.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 166/ 808]                blk.9.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 167/ 808]                blk.9.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 168/ 808]                  blk.9.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 169/ 808]     blk.9.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 170/ 808]           blk.9.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 171/ 808]                blk.9.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 172/ 808]             blk.9.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 173/ 808]                  blk.9.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 174/ 808]             blk.9.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 175/ 808]             blk.9.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 176/ 808]                  blk.9.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 177/ 808]                  blk.9.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.9.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 178/ 808]              blk.13.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 179/ 808]               blk.13.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 180/ 808]                 blk.13.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 181/ 808]    blk.13.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 182/ 808]          blk.13.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 183/ 808]               blk.13.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 184/ 808]              blk.14.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 185/ 808]               blk.14.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 186/ 808]               blk.14.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 187/ 808]                 blk.14.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 188/ 808]    blk.14.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 189/ 808]          blk.14.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 190/ 808]               blk.14.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 191/ 808]            blk.14.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 192/ 808]                 blk.14.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 193/ 808]            blk.14.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 194/ 808]            blk.14.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 195/ 808]                 blk.14.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 196/ 808]                 blk.14.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.14.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 197/ 808]              blk.15.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 198/ 808]               blk.15.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 199/ 808]               blk.15.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 200/ 808]                 blk.15.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 201/ 808]    blk.15.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 202/ 808]          blk.15.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 203/ 808]               blk.15.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 204/ 808]            blk.15.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 205/ 808]                 blk.15.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 206/ 808]            blk.15.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 207/ 808]            blk.15.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 208/ 808]                 blk.15.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 209/ 808]                 blk.15.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.15.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 210/ 808]              blk.16.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 211/ 808]               blk.16.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 212/ 808]               blk.16.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 213/ 808]                 blk.16.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 214/ 808]    blk.16.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 215/ 808]          blk.16.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 216/ 808]               blk.16.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 217/ 808]            blk.16.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 218/ 808]                 blk.16.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 219/ 808]            blk.16.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 220/ 808]            blk.16.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 221/ 808]                 blk.16.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 222/ 808]                 blk.16.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.16.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 223/ 808]              blk.17.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 224/ 808]               blk.17.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 225/ 808]               blk.17.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 226/ 808]                 blk.17.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 227/ 808]    blk.17.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 228/ 808]          blk.17.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 229/ 808]               blk.17.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 230/ 808]            blk.17.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 231/ 808]                 blk.17.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 232/ 808]            blk.17.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 233/ 808]            blk.17.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 234/ 808]                 blk.17.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 235/ 808]                 blk.17.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.17.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 236/ 808]              blk.18.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 237/ 808]               blk.18.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 238/ 808]               blk.18.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 239/ 808]                 blk.18.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 240/ 808]    blk.18.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 241/ 808]          blk.18.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 242/ 808]               blk.18.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 243/ 808]            blk.18.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 244/ 808]                 blk.18.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 245/ 808]            blk.18.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 246/ 808]            blk.18.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 247/ 808]                 blk.18.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 248/ 808]                 blk.18.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.18.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 249/ 808]               blk.19.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 250/ 808]            blk.19.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 251/ 808]                 blk.19.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 252/ 808]            blk.19.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 253/ 808]            blk.19.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 254/ 808]                 blk.19.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 255/ 808]                 blk.19.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.19.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 256/ 808]              blk.19.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 257/ 808]               blk.19.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 258/ 808]                 blk.19.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 259/ 808]    blk.19.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 260/ 808]          blk.19.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 261/ 808]               blk.19.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 262/ 808]              blk.20.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 263/ 808]               blk.20.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 264/ 808]               blk.20.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 265/ 808]                 blk.20.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 266/ 808]    blk.20.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 267/ 808]          blk.20.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 268/ 808]               blk.20.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 269/ 808]            blk.20.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 270/ 808]                 blk.20.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 271/ 808]            blk.20.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 272/ 808]            blk.20.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 273/ 808]                 blk.20.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 274/ 808]                 blk.20.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.20.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 275/ 808]              blk.21.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 276/ 808]               blk.21.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 277/ 808]               blk.21.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 278/ 808]                 blk.21.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 279/ 808]    blk.21.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 280/ 808]          blk.21.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 281/ 808]               blk.21.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 282/ 808]            blk.21.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 283/ 808]                 blk.21.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 284/ 808]            blk.21.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 285/ 808]            blk.21.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 286/ 808]                 blk.21.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 287/ 808]                 blk.21.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.21.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 288/ 808]              blk.22.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 289/ 808]               blk.22.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 290/ 808]               blk.22.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 291/ 808]                 blk.22.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 292/ 808]    blk.22.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 293/ 808]          blk.22.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 294/ 808]               blk.22.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 295/ 808]            blk.22.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 296/ 808]                 blk.22.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 297/ 808]            blk.22.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 298/ 808]            blk.22.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 299/ 808]                 blk.22.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 300/ 808]                 blk.22.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.22.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 301/ 808]              blk.23.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 302/ 808]               blk.23.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 303/ 808]               blk.23.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 304/ 808]                 blk.23.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 305/ 808]    blk.23.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 306/ 808]          blk.23.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 307/ 808]               blk.23.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 308/ 808]            blk.23.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 309/ 808]                 blk.23.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 310/ 808]            blk.23.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 311/ 808]            blk.23.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 312/ 808]                 blk.23.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 313/ 808]                 blk.23.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.23.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 314/ 808]              blk.24.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 315/ 808]               blk.24.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 316/ 808]               blk.24.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 317/ 808]                 blk.24.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 318/ 808]    blk.24.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 319/ 808]          blk.24.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 320/ 808]               blk.24.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 321/ 808]            blk.24.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 322/ 808]                 blk.24.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 323/ 808]            blk.24.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 324/ 808]            blk.24.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 325/ 808]                 blk.24.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 326/ 808]                 blk.24.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.24.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 327/ 808]               blk.25.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 328/ 808]            blk.25.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 329/ 808]                 blk.25.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 330/ 808]            blk.25.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 331/ 808]            blk.25.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 332/ 808]                 blk.25.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 333/ 808]                 blk.25.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.25.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 334/ 808]              blk.25.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 335/ 808]               blk.25.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 336/ 808]                 blk.25.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 337/ 808]    blk.25.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 338/ 808]          blk.25.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 339/ 808]               blk.25.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 340/ 808]              blk.26.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 341/ 808]               blk.26.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 342/ 808]               blk.26.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 343/ 808]                 blk.26.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 344/ 808]    blk.26.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 345/ 808]          blk.26.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 346/ 808]               blk.26.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 347/ 808]            blk.26.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 348/ 808]                 blk.26.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 349/ 808]            blk.26.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 350/ 808]            blk.26.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 351/ 808]                 blk.26.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 352/ 808]                 blk.26.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.26.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 353/ 808]              blk.27.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 354/ 808]               blk.27.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 355/ 808]               blk.27.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 356/ 808]                 blk.27.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 357/ 808]    blk.27.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 358/ 808]          blk.27.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 359/ 808]               blk.27.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 360/ 808]            blk.27.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 361/ 808]                 blk.27.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 362/ 808]            blk.27.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 363/ 808]            blk.27.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 364/ 808]                 blk.27.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 365/ 808]                 blk.27.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.27.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 366/ 808]              blk.28.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 367/ 808]               blk.28.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 368/ 808]               blk.28.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 369/ 808]                 blk.28.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 370/ 808]    blk.28.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 371/ 808]          blk.28.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 372/ 808]               blk.28.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 373/ 808]            blk.28.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 374/ 808]                 blk.28.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 375/ 808]            blk.28.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 376/ 808]            blk.28.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 377/ 808]                 blk.28.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 378/ 808]                 blk.28.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.28.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 379/ 808]              blk.29.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 380/ 808]               blk.29.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 381/ 808]               blk.29.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 382/ 808]                 blk.29.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 383/ 808]    blk.29.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 384/ 808]          blk.29.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 385/ 808]               blk.29.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 386/ 808]            blk.29.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 387/ 808]                 blk.29.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 388/ 808]            blk.29.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 389/ 808]            blk.29.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 390/ 808]                 blk.29.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 391/ 808]                 blk.29.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.29.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 392/ 808]              blk.30.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 393/ 808]               blk.30.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 394/ 808]               blk.30.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 395/ 808]                 blk.30.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 396/ 808]    blk.30.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 397/ 808]          blk.30.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 398/ 808]               blk.30.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 399/ 808]            blk.30.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 400/ 808]                 blk.30.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 401/ 808]            blk.30.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 402/ 808]            blk.30.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 403/ 808]                 blk.30.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 404/ 808]                 blk.30.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.30.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 405/ 808]               blk.31.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 406/ 808]            blk.31.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 407/ 808]                 blk.31.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 408/ 808]            blk.31.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 409/ 808]            blk.31.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 410/ 808]                 blk.31.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 411/ 808]                 blk.31.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.31.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 412/ 808]              blk.31.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 413/ 808]               blk.31.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 414/ 808]                 blk.31.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 415/ 808]    blk.31.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 416/ 808]          blk.31.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 417/ 808]               blk.31.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 418/ 808]              blk.32.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 419/ 808]               blk.32.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 420/ 808]               blk.32.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 421/ 808]                 blk.32.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 422/ 808]    blk.32.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 423/ 808]          blk.32.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 424/ 808]               blk.32.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 425/ 808]            blk.32.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 426/ 808]                 blk.32.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 427/ 808]            blk.32.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 428/ 808]            blk.32.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 429/ 808]                 blk.32.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 430/ 808]                 blk.32.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.32.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 431/ 808]              blk.33.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 432/ 808]               blk.33.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 433/ 808]               blk.33.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 434/ 808]                 blk.33.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 435/ 808]    blk.33.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 436/ 808]          blk.33.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 437/ 808]               blk.33.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 438/ 808]            blk.33.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 439/ 808]                 blk.33.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 440/ 808]            blk.33.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 441/ 808]            blk.33.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 442/ 808]                 blk.33.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 443/ 808]                 blk.33.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.33.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 444/ 808]              blk.34.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 445/ 808]               blk.34.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 446/ 808]               blk.34.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 447/ 808]                 blk.34.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 448/ 808]    blk.34.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 449/ 808]          blk.34.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 450/ 808]               blk.34.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 451/ 808]            blk.34.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 452/ 808]                 blk.34.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 453/ 808]            blk.34.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 454/ 808]            blk.34.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 455/ 808]                 blk.34.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 456/ 808]                 blk.34.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.34.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 457/ 808]              blk.35.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 458/ 808]               blk.35.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 459/ 808]               blk.35.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 460/ 808]                 blk.35.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 461/ 808]    blk.35.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 462/ 808]          blk.35.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 463/ 808]               blk.35.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 464/ 808]            blk.35.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 465/ 808]                 blk.35.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 466/ 808]            blk.35.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 467/ 808]            blk.35.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 468/ 808]                 blk.35.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 469/ 808]                 blk.35.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.35.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 470/ 808]              blk.36.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 471/ 808]               blk.36.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 472/ 808]               blk.36.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 473/ 808]                 blk.36.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 474/ 808]    blk.36.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 475/ 808]          blk.36.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 476/ 808]               blk.36.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 477/ 808]            blk.36.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 478/ 808]                 blk.36.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 479/ 808]            blk.36.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 480/ 808]            blk.36.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 481/ 808]                 blk.36.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 482/ 808]                 blk.36.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.36.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 483/ 808]               blk.37.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 484/ 808]            blk.37.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 485/ 808]                 blk.37.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 486/ 808]            blk.37.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 487/ 808]            blk.37.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 488/ 808]                 blk.37.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 489/ 808]                 blk.37.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.37.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 490/ 808]              blk.37.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 491/ 808]               blk.37.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 492/ 808]                 blk.37.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 493/ 808]    blk.37.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 494/ 808]          blk.37.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 495/ 808]               blk.37.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 496/ 808]              blk.38.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 497/ 808]               blk.38.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 498/ 808]               blk.38.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 499/ 808]                 blk.38.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 500/ 808]    blk.38.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 501/ 808]          blk.38.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 502/ 808]               blk.38.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 503/ 808]            blk.38.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 504/ 808]                 blk.38.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 505/ 808]            blk.38.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 506/ 808]            blk.38.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 507/ 808]                 blk.38.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 508/ 808]                 blk.38.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.38.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 509/ 808]              blk.39.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 510/ 808]               blk.39.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 511/ 808]               blk.39.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 512/ 808]                 blk.39.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 513/ 808]    blk.39.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 514/ 808]          blk.39.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 515/ 808]               blk.39.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 516/ 808]            blk.39.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 517/ 808]                 blk.39.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 518/ 808]            blk.39.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 519/ 808]            blk.39.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 520/ 808]                 blk.39.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 521/ 808]                 blk.39.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.39.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 522/ 808]              blk.40.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 523/ 808]               blk.40.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 524/ 808]               blk.40.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 525/ 808]                 blk.40.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 526/ 808]    blk.40.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 527/ 808]          blk.40.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 528/ 808]               blk.40.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 529/ 808]            blk.40.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 530/ 808]                 blk.40.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 531/ 808]            blk.40.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 532/ 808]            blk.40.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 533/ 808]                 blk.40.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 534/ 808]                 blk.40.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.40.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 535/ 808]              blk.41.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 536/ 808]               blk.41.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 537/ 808]               blk.41.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 538/ 808]                 blk.41.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 539/ 808]    blk.41.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 540/ 808]          blk.41.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 541/ 808]               blk.41.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 542/ 808]            blk.41.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 543/ 808]                 blk.41.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 544/ 808]            blk.41.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 545/ 808]            blk.41.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 546/ 808]                 blk.41.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 547/ 808]                 blk.41.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.41.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 548/ 808]              blk.42.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 549/ 808]               blk.42.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 550/ 808]               blk.42.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 551/ 808]                 blk.42.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 552/ 808]    blk.42.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 553/ 808]          blk.42.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 554/ 808]               blk.42.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 555/ 808]            blk.42.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 556/ 808]                 blk.42.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 557/ 808]            blk.42.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 558/ 808]            blk.42.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 559/ 808]                 blk.42.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 560/ 808]                 blk.42.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.42.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 561/ 808]               blk.43.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 562/ 808]            blk.43.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 563/ 808]                 blk.43.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 564/ 808]            blk.43.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 565/ 808]            blk.43.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 566/ 808]                 blk.43.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 567/ 808]                 blk.43.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.43.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 568/ 808]              blk.43.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 569/ 808]               blk.43.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 570/ 808]                 blk.43.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 571/ 808]    blk.43.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 572/ 808]          blk.43.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 573/ 808]               blk.43.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 574/ 808]              blk.44.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 575/ 808]               blk.44.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 576/ 808]               blk.44.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 577/ 808]                 blk.44.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 578/ 808]    blk.44.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 579/ 808]          blk.44.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 580/ 808]               blk.44.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 581/ 808]            blk.44.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 582/ 808]                 blk.44.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 583/ 808]            blk.44.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 584/ 808]            blk.44.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 585/ 808]                 blk.44.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 586/ 808]                 blk.44.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.44.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 587/ 808]              blk.45.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 588/ 808]               blk.45.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 589/ 808]               blk.45.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 590/ 808]                 blk.45.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 591/ 808]    blk.45.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 592/ 808]          blk.45.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 593/ 808]               blk.45.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 594/ 808]            blk.45.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 595/ 808]                 blk.45.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 596/ 808]            blk.45.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 597/ 808]            blk.45.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 598/ 808]                 blk.45.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 599/ 808]                 blk.45.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.45.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 600/ 808]              blk.46.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 601/ 808]               blk.46.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 602/ 808]               blk.46.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 603/ 808]                 blk.46.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 604/ 808]    blk.46.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 605/ 808]          blk.46.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 606/ 808]               blk.46.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 607/ 808]            blk.46.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 608/ 808]                 blk.46.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 609/ 808]            blk.46.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 610/ 808]            blk.46.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 611/ 808]                 blk.46.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 612/ 808]                 blk.46.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.46.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 613/ 808]              blk.47.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 614/ 808]               blk.47.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 615/ 808]               blk.47.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 616/ 808]                 blk.47.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 617/ 808]    blk.47.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 618/ 808]          blk.47.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 619/ 808]               blk.47.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 620/ 808]            blk.47.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 621/ 808]                 blk.47.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 622/ 808]            blk.47.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 623/ 808]            blk.47.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 624/ 808]                 blk.47.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 625/ 808]                 blk.47.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.47.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 626/ 808]              blk.48.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 627/ 808]               blk.48.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 628/ 808]               blk.48.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 629/ 808]                 blk.48.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 630/ 808]    blk.48.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 631/ 808]          blk.48.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 632/ 808]               blk.48.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 633/ 808]            blk.48.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 634/ 808]                 blk.48.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 635/ 808]            blk.48.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 636/ 808]            blk.48.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 637/ 808]                 blk.48.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 638/ 808]                 blk.48.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.48.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 639/ 808]               blk.49.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 640/ 808]            blk.49.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 641/ 808]                 blk.49.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 642/ 808]            blk.49.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 643/ 808]            blk.49.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 644/ 808]                 blk.49.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 645/ 808]                 blk.49.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.49.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 646/ 808]              blk.49.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 647/ 808]               blk.49.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 648/ 808]                 blk.49.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 649/ 808]    blk.49.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 650/ 808]          blk.49.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 651/ 808]               blk.49.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 652/ 808]              blk.50.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 653/ 808]               blk.50.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 654/ 808]               blk.50.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 655/ 808]                 blk.50.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 656/ 808]    blk.50.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 657/ 808]          blk.50.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 658/ 808]               blk.50.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 659/ 808]            blk.50.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 660/ 808]                 blk.50.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 661/ 808]            blk.50.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 662/ 808]            blk.50.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 663/ 808]                 blk.50.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 664/ 808]                 blk.50.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.50.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 665/ 808]              blk.51.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 666/ 808]               blk.51.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 667/ 808]               blk.51.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 668/ 808]                 blk.51.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 669/ 808]    blk.51.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 670/ 808]          blk.51.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 671/ 808]               blk.51.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 672/ 808]            blk.51.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 673/ 808]                 blk.51.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 674/ 808]            blk.51.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 675/ 808]            blk.51.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 676/ 808]                 blk.51.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 677/ 808]                 blk.51.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.51.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 678/ 808]              blk.52.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 679/ 808]               blk.52.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 680/ 808]               blk.52.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 681/ 808]                 blk.52.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 682/ 808]    blk.52.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 683/ 808]          blk.52.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 684/ 808]               blk.52.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 685/ 808]            blk.52.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 686/ 808]                 blk.52.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 687/ 808]            blk.52.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 688/ 808]            blk.52.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 689/ 808]                 blk.52.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 690/ 808]                 blk.52.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.52.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 691/ 808]              blk.53.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 692/ 808]               blk.53.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 693/ 808]               blk.53.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 694/ 808]                 blk.53.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 695/ 808]    blk.53.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 696/ 808]          blk.53.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 697/ 808]               blk.53.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 698/ 808]            blk.53.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 699/ 808]                 blk.53.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 700/ 808]            blk.53.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 701/ 808]            blk.53.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 702/ 808]                 blk.53.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 703/ 808]                 blk.53.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.53.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 704/ 808]              blk.54.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 705/ 808]               blk.54.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 706/ 808]               blk.54.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 707/ 808]                 blk.54.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 708/ 808]    blk.54.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 709/ 808]          blk.54.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 710/ 808]               blk.54.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 711/ 808]            blk.54.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 712/ 808]                 blk.54.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 713/ 808]            blk.54.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 714/ 808]            blk.54.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 715/ 808]                 blk.54.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 716/ 808]                 blk.54.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.54.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 717/ 808]               blk.55.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 718/ 808]            blk.55.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 719/ 808]                 blk.55.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 720/ 808]            blk.55.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 721/ 808]            blk.55.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 722/ 808]                 blk.55.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 723/ 808]                 blk.55.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.55.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 724/ 808]              blk.55.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 725/ 808]               blk.55.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 726/ 808]                 blk.55.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 727/ 808]    blk.55.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 728/ 808]          blk.55.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 729/ 808]               blk.55.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 730/ 808]              blk.56.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 731/ 808]               blk.56.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 732/ 808]               blk.56.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 733/ 808]                 blk.56.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 734/ 808]    blk.56.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 735/ 808]          blk.56.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 736/ 808]               blk.56.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 737/ 808]            blk.56.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 738/ 808]                 blk.56.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 739/ 808]            blk.56.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 740/ 808]            blk.56.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 741/ 808]                 blk.56.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 742/ 808]                 blk.56.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.56.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 743/ 808]              blk.57.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 744/ 808]               blk.57.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 745/ 808]               blk.57.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 746/ 808]                 blk.57.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 747/ 808]    blk.57.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 748/ 808]          blk.57.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 749/ 808]               blk.57.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 750/ 808]            blk.57.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 751/ 808]                 blk.57.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 752/ 808]            blk.57.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 753/ 808]            blk.57.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 754/ 808]                 blk.57.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 755/ 808]                 blk.57.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.57.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 756/ 808]              blk.58.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 757/ 808]               blk.58.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 758/ 808]               blk.58.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 759/ 808]                 blk.58.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 760/ 808]    blk.58.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 761/ 808]          blk.58.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 762/ 808]               blk.58.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 763/ 808]            blk.58.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 764/ 808]                 blk.58.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 765/ 808]            blk.58.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 766/ 808]            blk.58.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 767/ 808]                 blk.58.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 768/ 808]                 blk.58.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.58.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 769/ 808]              blk.59.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 770/ 808]               blk.59.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 771/ 808]               blk.59.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 772/ 808]                 blk.59.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 773/ 808]    blk.59.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 774/ 808]          blk.59.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 775/ 808]               blk.59.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 776/ 808]            blk.59.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 777/ 808]                 blk.59.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 778/ 808]            blk.59.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 779/ 808]            blk.59.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 780/ 808]                 blk.59.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 781/ 808]                 blk.59.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.59.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 782/ 808]              blk.60.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 783/ 808]               blk.60.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 784/ 808]               blk.60.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 785/ 808]                 blk.60.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 786/ 808]    blk.60.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 787/ 808]          blk.60.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 788/ 808]               blk.60.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 789/ 808]            blk.60.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 790/ 808]                 blk.60.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 791/ 808]            blk.60.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 792/ 808]            blk.60.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 793/ 808]                 blk.60.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 794/ 808]                 blk.60.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.60.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 795/ 808]               blk.61.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 796/ 808]            blk.61.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 797/ 808]                 blk.61.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 798/ 808]            blk.61.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 799/ 808]            blk.61.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 800/ 808]                 blk.61.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 801/ 808]                 blk.61.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.61.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 802/ 808]              blk.61.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 803/ 808]               blk.61.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 804/ 808]                 blk.61.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 805/ 808]    blk.61.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 806/ 808]          blk.61.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 807/ 808]               blk.61.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 808/ 808]                   output_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
llama_model_quantize_internal: model size  = 51518.82 MB
llama_model_quantize_internal: quant size  = 13654.42 MB

main: quantize time = 1143720.04 ms
main:    total time = 1143720.04 ms
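
Quick sanity check on those sizes (my arithmetic, not part of the log): the bf16 source is 16 bpw, so the overall compression works out to $16 \times 13654.42 / 51518.82 \approx 4.24$ bpw, matching the 4.241 BPW reported below. Per tensor it is e.g. $16 \times 5.26 / 21.00 \approx 4.01$ bpw, i.e. the nominal 4.0 bpw of iq4_kt plus a little scale/metadata overhead; the rest of the gap up to 4.241 comes from the tensors kept at higher precision.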
👈 Perplexity Command

Perplexity

./build/bin/llama-perplexity \
    --model /mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf \
    --ctx-size 512 \
    --ubatch-size 512 \
    -f wiki.test.raw \
    --seed 1337 \
    --n-gpu-layers 99 \
    --threads 1

I ended up using the same imatrix.dat file for both.

  • gemma-3-27B-it-qat-iq4_kt
    • 13.334 GiB (4.241 BPW)
    • Final estimate: PPL = 8.3431 +/- 0.06508
  • gemma-3-27B-it-qat-iq4_ks
    • 14.099 GiB (4.484 BPW), attn_k_b at q4_0 (I forget why)
    • Final estimate: PPL = 8.1750 +/- 0.06294

This probably isn't the best comparison, given that gemma-3-27B-it-qat behaves unlike most "normal" non-QAT models. But llama-perplexity runs clean with no NaNs, and that is the real test for me right now.

Lastly I'll do some quick sweep benches.

👈 sweep-bench results
#model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_ks.gguf
model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf
#model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-q4_0.gguf
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-sweep-bench \
  --model "$model" \
  -c 32768 \
  -fa \
  -ngl 99 \
  --warmup-batch \
  --threads 1
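
For reading the tables below (as I understand the columns): each row prompt-processes PP tokens at a KV cache depth of N_KV and then generates TG tokens, and the speed columns are just token counts divided by measured times, e.g. for the first iq4_ks row $S_{PP} = 512 / 0.342 \approx 1497$ t/s and $S_{TG} = 128 / 3.743 \approx 34.2$ t/s.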

PR505 iq4_ks

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 512 | 128 | 0 | 0.342 | 1497.20 | 3.743 | 34.19 |
| 512 | 128 | 512 | 0.348 | 1472.43 | 3.794 | 33.74 |
| 512 | 128 | 1024 | 0.353 | 1449.53 | 3.830 | 33.42 |
| 512 | 128 | 1536 | 0.358 | 1429.54 | 3.888 | 32.92 |
| 512 | 128 | 2048 | 0.364 | 1407.80 | 3.950 | 32.40 |
| 512 | 128 | 2560 | 0.368 | 1390.97 | 4.070 | 31.45 |
| 512 | 128 | 3072 | 0.374 | 1367.73 | 4.088 | 31.31 |
| 512 | 128 | 3584 | 0.379 | 1351.24 | 4.128 | 31.01 |
| 512 | 128 | 4096 | 0.385 | 1328.47 | 4.179 | 30.63 |
| 512 | 128 | 4608 | 0.389 | 1314.88 | 4.228 | 30.27 |
| 512 | 128 | 5120 | 0.394 | 1299.02 | 4.280 | 29.90 |
| 512 | 128 | 5632 | 0.399 | 1282.70 | 4.372 | 29.28 |
| 512 | 128 | 6144 | 0.406 | 1262.58 | 4.395 | 29.13 |
| 512 | 128 | 6656 | 0.410 | 1249.42 | 4.445 | 28.80 |
| 512 | 128 | 7168 | 0.416 | 1230.78 | 4.493 | 28.49 |
| 512 | 128 | 7680 | 0.421 | 1217.42 | 4.536 | 28.22 |
| 512 | 128 | 8192 | 0.426 | 1202.74 | 4.650 | 27.53 |

PR505 iq4_kt

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 512 | 128 | 0 | 0.329 | 1554.50 | 3.594 | 35.61 |
| 512 | 128 | 512 | 0.336 | 1524.23 | 3.651 | 35.06 |
| 512 | 128 | 1024 | 0.341 | 1499.47 | 3.679 | 34.79 |
| 512 | 128 | 1536 | 0.345 | 1482.10 | 3.732 | 34.30 |
| 512 | 128 | 2048 | 0.352 | 1453.17 | 3.784 | 33.83 |
| 512 | 128 | 2560 | 0.355 | 1442.46 | 3.889 | 32.91 |
| 512 | 128 | 3072 | 0.360 | 1424.20 | 3.918 | 32.67 |
| 512 | 128 | 3584 | 0.366 | 1399.86 | 3.963 | 32.30 |
| 512 | 128 | 4096 | 0.371 | 1380.12 | 4.007 | 31.95 |
| 512 | 128 | 4608 | 0.377 | 1359.59 | 4.066 | 31.48 |
| 512 | 128 | 5120 | 0.381 | 1343.44 | 4.115 | 31.10 |
| 512 | 128 | 5632 | 0.386 | 1327.56 | 4.205 | 30.44 |
| 512 | 128 | 6144 | 0.392 | 1304.90 | 4.224 | 30.30 |
| 512 | 128 | 6656 | 0.396 | 1291.41 | 4.267 | 30.00 |
| 512 | 128 | 7168 | 0.402 | 1273.86 | 4.319 | 29.64 |
| 512 | 128 | 7680 | 0.406 | 1260.46 | 4.347 | 29.44 |
| 512 | 128 | 8192 | 0.411 | 1244.46 | 4.459 | 28.71 |

PR505 q4_0

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 512 | 128 | 0 | 0.306 | 1673.44 | 3.499 | 36.58 |
| 512 | 128 | 512 | 0.311 | 1646.26 | 3.542 | 36.14 |
| 512 | 128 | 1024 | 0.318 | 1611.98 | 3.579 | 35.76 |
| 512 | 128 | 1536 | 0.322 | 1592.37 | 3.639 | 35.18 |
| 512 | 128 | 2048 | 0.328 | 1561.52 | 3.698 | 34.61 |
| 512 | 128 | 2560 | 0.334 | 1531.43 | 3.817 | 33.53 |
| 512 | 128 | 3072 | 0.339 | 1509.07 | 3.827 | 33.45 |
| 512 | 128 | 3584 | 0.346 | 1480.93 | 3.870 | 33.07 |
| 512 | 128 | 4096 | 0.351 | 1456.85 | 3.921 | 32.64 |
| 512 | 128 | 4608 | 0.355 | 1440.80 | 3.972 | 32.22 |
| 512 | 128 | 5120 | 0.360 | 1420.48 | 4.024 | 31.81 |
| 512 | 128 | 5632 | 0.366 | 1399.51 | 4.101 | 31.21 |
| 512 | 128 | 6144 | 0.370 | 1382.54 | 4.134 | 30.96 |
| 512 | 128 | 6656 | 0.378 | 1356.18 | 4.180 | 30.63 |
| 512 | 128 | 7168 | 0.382 | 1341.12 | 4.239 | 30.20 |
| 512 | 128 | 7680 | 0.386 | 1324.94 | 4.277 | 29.93 |
| 512 | 128 | 8192 | 0.392 | 1307.55 | 4.387 | 29.18 |

(plot: sweep-bench-pr505-gemma-3-27b-it-qat-iq4_kt)

Very nice, this new iq4_kt is faster than iq4_ks and very close to q4_0! I confirmed it does get a speed benefit from compiling with -DGGML_CUDA_F16=ON.

I'm holding off on releasing any of my experimental iqN_kt quants until you're happy with everything. So feel free to make breaking changes with this stuff as far as I'm concerned.

If you get the iq3_kt going as well, it might help me target a ~3.3 BPW (~256GB) R1-0528. No pressure, I'm just daydreaming hah... Cheers and thanks!

@ikawrakow
Owner Author

ikawrakow commented Jun 8, 2025 via email

@ikawrakow
Owner Author

Yes, sorry, I forgot to add the iq4_kt MMQ template instance (it is done now). I add files there manually instead of using the Python script, because the script is much more complicated in ik_llama.cpp than in llama.cpp, so I figured it would take longer to update the script than to just add a file manually from time to time.
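
(For reference: these template instance files are tiny. In mainline llama.cpp the autogenerated ones are one-liners like the sketch below, and the iq4_kt instance presumably follows the same pattern; the file path and macro/enum names here are illustrative, not copied from the repo.)

```cpp
// ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_kt.cu (illustrative path)
// One translation unit per quant type keeps CUDA build times manageable:
// DECL_MMQ_CASE instantiates the MMQ kernel template for a single ggml type.

#include "../mmq.cuh"

DECL_MMQ_CASE(GGML_TYPE_IQ4_KT); // enum name assumed for ik_llama.cpp's iq4_kt
```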

@ikawrakow ikawrakow mentioned this pull request Jun 9, 2025
@ikawrakow
Owner Author

Concerning PPL: yes, IQ4_KT is not quite on par with IQ4_KS. It is 4.0 bpw versus 4.25 bpw for IQ4_KS, so PPL is somewhat higher. But it is better than IQ4_KSS, which is also exactly 4.0 bpw. As you get to 4 bpw, the benefit of using a trellis becomes very small.
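
(Worth noting: the two model sizes reported above are consistent with this: $14.099 / 13.334 \approx 1.057$, which is exactly the ratio $4.484 / 4.241$ of the bits per weight.)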

@ikawrakow
Owner Author

Closing in favor of #529

@ikawrakow ikawrakow closed this Jun 18, 2025