Conversation

@ikawrakow
Owner

This PR adds a new version of IQ4_KT based on a new trellis.

The new trellis generates int8_t values in [-126...126] instead of the fp16 values produced by the original "3INST" trellis taken from QTIP. The Gaussian distribution generated by the new trellis is much better than that of the original QTIP trellis. Sadly, this does not result in a lower quantization error. For IQ4_KT, the quantization error as measured by PPL is on par with, or perhaps slightly lower than, the existing implementation on the main branch. But for IQ2_KT I consistently get a higher PPL, so for now this PR only changes the implementation to the new trellis for IQ4_KT.

The main advantage of the new trellis is not a lower quantization error but massively better performance, especially on the CPU. In addition, it allows for quantized GEMM and GEMV implementations on the GPU, which avoids the numerical issues observed with DeepSeek models when dequantizing to fp16, along with significantly better GEMM performance.

Here are some performance examples for LLaMA-3.1-8B:

  • Ryzen-7950X CPU: PP-512 = 273 t/s vs 133 t/s on main. TG-128 = 13.6 t/s vs 8.4 t/s on main
  • M2-Max CPU: PP-512 = 121 t/s vs 75 t/s on main. TG-128 = 9.4 t/s vs 6.6 t/s on main
  • RTX-4080 GPU: PP-512 = 8000 t/s vs 5800 t/s on main. TG-128 = 134 t/s vs 128 t/s on main.

What is the trick? If $v$ is an unsigned 32-bit integer and $A, B$ are unsigned 32-bit integer magic constants, in both cases we use $v \to A v + B$ to generate the next trellis value. The difference comes from the conversion of $v$ to an actual value to be used as a model weight (a sketch follows the list below):

  • In the original QTIP trellis we have s = (v & M_1) ^ M_2, where $M_1$ and $M_2$ are suitable masks, and $s$ is another 32-bit unsigned integer. The value to be used is generated by viewing $s$ as two fp16 values and taking their sum
  • In the new trellis we have s = v & M, $s$ is viewed as 4 int8_t values, and the result is their sum minus 126 for M = 0x3f3f3f3f, which can be computed very efficiently without requiring native fp16 arithmetic support:
    • On CUDA one can use __dp4a(s, 0x01010101, -126)
    • On Zen4 one can use _mm256_dpbusd_epi32 to compute 8 values with a single instruction
    • Same on NEON, where one gets 4 values in a single instruction via vdotq_s32
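
To make the arithmetic concrete, here is a minimal sketch of the new decoding step. This is not the PR's actual code: ka and kb stand in for the magic constants $A$ and $B$ (the values shown are the 3INST constants and are an assumption here), and the scalar path and the CUDA __dp4a path should produce identical results.

#include <cstdint>

constexpr uint32_t ka = 89226354u;   // assumed multiplier A (3INST value)
constexpr uint32_t kb = 64248484u;   // assumed increment B (3INST value)
constexpr uint32_t km = 0x3f3f3f3fu; // mask M: keeps the low 6 bits of each byte

// Scalar reference: step the LCG, mask each byte into [0, 63], then sum
// the four bytes and shift the result into [-126, 126].
inline int8_t next_value_scalar(uint32_t &v) {
    v = ka * v + kb;
    uint32_t s = v & km;
    int sum = (s & 0xff) + ((s >> 8) & 0xff) + ((s >> 16) & 0xff) + ((s >> 24) & 0xff);
    return (int8_t)(sum - 126);
}

#ifdef __CUDACC__
// On CUDA the byte sum collapses into a single dp4a against an all-ones
// vector, so no fp16 arithmetic is needed at all.
__device__ __forceinline__ int8_t next_value_dp4a(uint32_t &v) {
    v = ka * v + kb;
    return (int8_t)__dp4a((int)(v & km), 0x01010101, -126);
}
#endif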

Iwan Kawrakow added 16 commits June 7, 2025 12:30
The new trellis generates int8_t values via
sum_as_uint8_t[(ka * idx + kb) & 0x3f3f3f3f] - 126.
CUDA dequantize works.
AVX2 case Ny > 32 works, and we get 273 t/s for L3-8B.
PPL is on par or even slightly lower than original QTIP trellis.
We get 13.6 t/s vs 8.4 t/s with the f16 trellis and f32 arithmetic.
Still somewhat slower than other quants, but no longer pathetic.
We get very respectable PP-512 = 120 t/s.
TG-128 is pathetic at 5.3 t/s, so 20+% slower than the f16 variant.
We are now at 9.4 t/s, up from 6.6 t/s for the f16 trellis.
@ikawrakow
Owner Author

Here is a plot of the pdf generated by the new trellis (black dots) and a Gaussian fit (red line):

[plot: pdf of trellis values (black dots) with Gaussian fit (red line)]

One would get an even better Gaussian by summing the bytes of two trellis values (so, 8 int8_t values). But this only increases computation time without improving quantization quality.
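
A quick way to reproduce such a plot is to run the recurrence for many steps and histogram the outputs; a minimal sketch, reusing the assumed ka/kb constants from the sketch above:

#include <cstdint>
#include <cstdio>

int main() {
    constexpr uint32_t ka = 89226354u, kb = 64248484u, km = 0x3f3f3f3fu; // assumed
    uint32_t v = 12345u;        // arbitrary seed
    long long hist[253] = {0};  // outputs fall in [-126, 126]
    const long long n = 100000000;
    for (long long i = 0; i < n; ++i) {
        v = ka * v + kb;
        uint32_t s = v & km;
        int sum = (s & 0xff) + ((s >> 8) & 0xff) + ((s >> 16) & 0xff) + ((s >> 24) & 0xff);
        ++hist[sum];            // sum is the output value shifted by +126
    }
    for (int i = 0; i < 253; ++i)
        printf("%d %.6e\n", i - 126, (double)hist[i] / n);
    return 0;
}

The sum of four masked bytes is what makes the pdf close to Gaussian; summing eight bytes would get closer still, which is the trade-off noted above.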

@ubergarm
Contributor

ubergarm commented Jun 8, 2025

This looks interesting; I was thinking of testing this new iq4_kt implementation against my ubergarm/gemma-3-27B-it-qat-iq4_ks, which is supposedly pretty good according to the linked discussion comment.

I got it to compile CPU-only, e.g.

cmake -B build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build build --config Release -j $(nproc)

But I'm not having luck getting it to compile with CUDA, e.g. with variations of:

#cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_F16=ON
#cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1

rm -rf ./build/
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_CCACHE=OFF
cmake --build ./build --config Release -j $(nproc)

There is a warning about a switch/case fall-through in mmvq.cu and a linker error about mul_mat_q_case<(ggml_type)155> ...

👈 Logs
# the warning
[ 45%] Building CXX object ggml/src/CMakeFiles/ggml.dir/iqk/iqk_quantize.cpp.o
[ 45%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu: In function ‘void ggml_cuda_op_mul_mat_vec_q_impl(ggml_backend_cuda_context&, ggml_type, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, const char*, const char*, float*, const char*, int64_t, int64_t, int64_t, int64_t, cudaStream_t)’:
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu:528:30: warning: this statement may fall through [-Wimplicit-fallthrough=]
  528 |             mul_mat_vec_iq4_kss_q8_1_cuda(src0_dd_i, src1_ddq_i, dst_dd_i, ids_data, ne00, row_diff, src1_padded_row_size, src1_ncols, nrows_dst, ne2, nb02, nb12, nb2, ids_nb0, stream);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/w/projects/ik_llama.cpp/ggml/src/ggml-cuda/mmvq.cu:529:1: note: here
  529 |         case GGML_TYPE_IQ4_KT:
      | ^

# the error
[ 48%] Building CXX object src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 48%] Linking CXX executable ../../bin/llama-gguf
/usr/bin/ld: ../../ggml/src/libggml.so: undefined reference to `void mul_mat_q_case<(ggml_type)155>(ggml_backend_cuda_context&, mmq_args const&, CUstream_st*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [examples/gguf/CMakeFiles/llama-gguf.dir/build.make:98: bin/llama-gguf] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2643: examples/gguf/CMakeFiles/llama-gguf.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[ 48%] Linking CXX executable ../../bin/llama-gguf-hash
/usr/bin/ld: ../../ggml/src/libggml.so: undefined reference to `void mul_mat_q_case<(ggml_type)155>(ggml_backend_cuda_context&, mmq_args const&, CUstream_st*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/build.make:104: bin/llama-gguf-hash] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2510: examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/all] Error 2
[ 49%] Linking CXX shared library libllama.so
[ 49%] Built target llama
gmake: *** [Makefile:146: all] Error 2

For fun I tried compiling an earlier commit fb776ab, closer to the CUDA implementation, but got the same error. I tried moving the duplicated break; which didn't affect the error. I tried rebasing it on top of main, which has the IQ2_M_R4 functionality, but same error.

I see both IQ4_KT = 155 and GGML_TYPE_IQ4_KT 155 but don't know enough about C++ templates to figure out what I'm missing.

Same error on the remote 24-core Threadripper Pro and my local Arch Linux box:

  • gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
  • gcc version 14.2.1 20250128 (GCC)

Maybe a file is missing here?

$ find . -name mmq-instance-iq4_k*
./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_ks.cu
./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_k.cu
./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_ks_r4.cu
# no mmq-instance-iq4_kt.cu ?

Ahh yes.. Hrmm. I don't know how to run python ./ggml/src/ggml-cuda/template-instances/generate_cu_files.py so I just did the dirty thing and made the following file:

./ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_kt.cu

// This file has been autogenerated by generate_cu_files.py, do not edit manually.

#include "../mmq.cuh"

DECL_MMQ_CASE(GGML_TYPE_IQ4_KT);

@ubergarm
Contributor

ubergarm commented Jun 8, 2025

Now that it seems to compile okay, I'm giving it a try quantizing gemma-3-27B-it-qat-iq4_kt.

My first attempt threw an "Oops. Cluster N has no points" but seems to keep going okay:

[   4/ 808]                blk.0.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points:  0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points:  1 2 1 0
cluster_points: Oops. Cluster 124 has no points:  0 3 3 1
cluster_points: Oops. Cluster 624 has no points:  0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size =   220.50 MiB ->    55.21 MiB
[   5/ 808]                  blk.0.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB

Not sure what that means, so I'm making a new imatrix using some extra stuff from exllamav3 on top of my usual, to see if it still throws the Oops, knowing it might be completely unrelated.

Will update this with results...

EDIT

Okay, it finished. And as ik mentioned below, the Oops is harmless and was still there with my new imatrix in a quick test.

It finished cooking so gonna give it a test then update one more time.

👈 Secret Recipe and Logs
#!/usr/bin/env bash

# this script is a bit sloppy as left over from earlier experiments.
# it's mostly un-needed as one can just pass the quant level in a simple command.
# i don't recall why i was making attn_v.weight=q4_0 before?
# but it seems to quantize to iq4_kt without any complaints...

custom="                                                                                                                             17:58:22 [4/1961]
#####
# Token embedding
token_embd\.weight=q8_0

#####
# Prioritize attn Layers by Cosine Similarity Scores
#blk.0.attn_k.weight,               torch.bfloat16 --> BF16, shape = {5376, 2048}
#blk.0.attn_output.weight,          torch.bfloat16 --> BF16, shape = {4096, 5376}
#blk.0.attn_q.weight,               torch.bfloat16 --> BF16, shape = {5376, 4096}
#blk.0.attn_v.weight,               torch.bfloat16 --> BF16, shape = {5376, 2048}

#blk.[0-9].attn_v.weight=q4_0
#blk.[1-6][0-9].attn_v.weight=q4_0

blk.[0-9].attn_v.weight=iq4_kt
blk.[1-6][0-9].attn_v.weight=iq4_kt

#####
# Prioritize ffn Layers by Cosine Similarity Scores
#blk.0.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {21504, 5376}
#blk.0.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {5376, 21504}
#blk.0.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {5376, 21504}
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

    #--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat \
    #--imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-ubergarm-calibration-corpus-v02.dat \
./build/bin/llama-quantize \
    --token-embedding-type q8_0 \
    --custom-q "$custom" \
    --imatrix /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat \
    /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf \
    /mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf \
    IQ4_KT \
    24


main: build = 3748 (846c7b89)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: quantizing '/mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf' to '/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf' as IQ4_KT using 24 threads
llama_model_loader: additional 1 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 43 key-value pairs and 808 tensors from /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/gemma-3-27B-it-qat-unquantized-BF16-00001-of-00002.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gemma 3 27b It Qat Q4_0 Unquantized
llama_model_loader: - kv   3:                           general.finetune str              = it-qat-unquantized
llama_model_loader: - kv   4:                           general.basename str              = gemma-3
llama_model_loader: - kv   5:                         general.size_label str              = 27B
llama_model_loader: - kv   6:                            general.license str              = gemma
llama_model_loader: - kv   7:                   general.base_model.count u32              = 1
llama_model_loader: - kv   8:                  general.base_model.0.name str              = Gemma 3 27b It
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Google
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/google/gemma-3...
llama_model_loader: - kv  11:                               general.tags arr[str,4]       = ["gemma3", "gemma", "google", "image-...
llama_model_loader: - kv  12:                      gemma3.context_length u32              = 131072
llama_model_loader: - kv  13:                    gemma3.embedding_length u32              = 5376
llama_model_loader: - kv  14:                         gemma3.block_count u32              = 62
llama_model_loader: - kv  15:                 gemma3.feed_forward_length u32              = 21504
llama_model_loader: - kv  16:                gemma3.attention.head_count u32              = 32
llama_model_loader: - kv  17:    gemma3.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  18:                gemma3.attention.key_length u32              = 128
llama_model_loader: - kv  19:              gemma3.attention.value_length u32              = 128
llama_model_loader: - kv  20:                          general.file_type u32              = 32
llama_model_loader: - kv  21:                      gemma3.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  22:            gemma3.attention.sliding_window u32              = 1024
llama_model_loader: - kv  23:             gemma3.attention.head_count_kv u32              = 16
llama_model_loader: - kv  24:                   gemma3.rope.scaling.type str              = linear
llama_model_loader: - kv  25:                 gemma3.rope.scaling.factor f32              = 8.000000
llama_model_loader: - kv  26:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  27:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  28:                      tokenizer.ggml.tokens arr[str,262208]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  29:                      tokenizer.ggml.scores arr[f32,262208]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,262208]  = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  32:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  33:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  34:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  35:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  36:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv  38:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  39:               general.quantization_version u32              = 2
llama_model_loader: - kv  40:                                   split.no u16              = 0
llama_model_loader: - kv  41:                                split.count u16              = 2
llama_model_loader: - kv  42:                        split.tensors.count i32              = 808
llama_model_loader: - type  f32:  373 tensors
llama_model_loader: - type bf16:  435 tensors
================================ Have weights data with 434 entries
[   1/ 808]                    token_embd.weight - [ 5376, 262208,     1,     1], type =   bf16, Using custom type q8_0 for tensor token_embd.weight

====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to q8_0 .. Adding custom rule token_embd\.weight -> q8_0
Adding custom rule blk.[0-9].attn_v.weight -> iq4_kt
Adding custom rule blk.[1-6][0-9].attn_v.weight -> iq4_kt
load_imatrix: imatrix dataset='calibration_data_v5_rc.txt'
load_imatrix: loaded 434 importance matrix entries from /mnt/raid/models/google/gemma-3-27b-it-qat-q4_0-unquantized/imatrix-gemma-3-27B-it-qat-unquantized-BF16-calibration_data_v5_rc.dat computed on 221 chunks
prepare_imatrix: have 434 importance matrix entries
size =  2688.66 MiB ->  1428.35 MiB
[   2/ 808]               blk.0.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   3/ 808]                blk.0.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[   4/ 808]                blk.0.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. cluster_points: Oops. Cluster 620 has no points:  0 3 2 1
cluster_points: 1 out of 625 clusters dir not have any points
cluster_points: Oops. Cluster 25 has no points:  1 2 1 0
cluster_points: Oops. Cluster 124 has no points:  0 3 3 1
cluster_points: Oops. Cluster 624 has no points:  0 0 3 1
cluster_points: 3 out of 625 clusters dir not have any points
size =   220.50 MiB ->    55.21 MiB
[   5/ 808]                  blk.0.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[   6/ 808]     blk.0.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   7/ 808]           blk.0.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   8/ 808]                blk.0.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[   9/ 808]             blk.0.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  10/ 808]                  blk.0.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  11/ 808]             blk.0.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  12/ 808]             blk.0.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  13/ 808]                  blk.0.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  14/ 808]                  blk.0.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.0.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  15/ 808]                blk.1.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  16/ 808]             blk.1.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  17/ 808]                  blk.1.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  18/ 808]             blk.1.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  19/ 808]             blk.1.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  20/ 808]                  blk.1.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  21/ 808]                  blk.1.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.1.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  22/ 808]               blk.1.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  23/ 808]                blk.1.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  24/ 808]                  blk.1.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  25/ 808]     blk.1.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  26/ 808]           blk.1.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  27/ 808]                blk.1.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  28/ 808]               blk.2.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  29/ 808]                blk.2.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  30/ 808]                blk.2.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  31/ 808]                  blk.2.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  32/ 808]     blk.2.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  33/ 808]           blk.2.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  34/ 808]                blk.2.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  35/ 808]             blk.2.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  36/ 808]                  blk.2.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  37/ 808]             blk.2.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  38/ 808]             blk.2.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  39/ 808]                  blk.2.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  40/ 808]                  blk.2.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.2.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  41/ 808]               blk.3.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  42/ 808]                blk.3.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  43/ 808]                blk.3.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  44/ 808]                  blk.3.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  45/ 808]     blk.3.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  46/ 808]           blk.3.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  47/ 808]                blk.3.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  48/ 808]             blk.3.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  49/ 808]                  blk.3.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  50/ 808]             blk.3.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  51/ 808]             blk.3.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  52/ 808]                  blk.3.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  53/ 808]                  blk.3.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.3.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  54/ 808]               blk.4.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  55/ 808]                blk.4.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  56/ 808]                blk.4.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  57/ 808]                  blk.4.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  58/ 808]     blk.4.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  59/ 808]           blk.4.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  60/ 808]                blk.4.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  61/ 808]             blk.4.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  62/ 808]                  blk.4.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  63/ 808]             blk.4.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  64/ 808]             blk.4.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  65/ 808]                  blk.4.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  66/ 808]                  blk.4.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.4.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  67/ 808]               blk.5.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  68/ 808]                blk.5.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  69/ 808]                blk.5.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  70/ 808]                  blk.5.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  71/ 808]     blk.5.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  72/ 808]           blk.5.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  73/ 808]                blk.5.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  74/ 808]             blk.5.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  75/ 808]                  blk.5.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  76/ 808]             blk.5.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  77/ 808]             blk.5.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  78/ 808]                  blk.5.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  79/ 808]                  blk.5.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.5.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  80/ 808]               blk.6.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  81/ 808]                blk.6.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[  82/ 808]                blk.6.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  83/ 808]                  blk.6.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  84/ 808]     blk.6.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  85/ 808]           blk.6.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  86/ 808]                blk.6.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[  87/ 808]             blk.6.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  88/ 808]                  blk.6.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  89/ 808]             blk.6.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  90/ 808]             blk.6.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  91/ 808]                  blk.6.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  92/ 808]                  blk.6.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.6.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  93/ 808]                blk.7.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[  94/ 808]             blk.7.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  95/ 808]                  blk.7.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[  96/ 808]             blk.7.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  97/ 808]             blk.7.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  98/ 808]                  blk.7.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[  99/ 808]                  blk.7.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.7.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 100/ 808]              blk.10.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 101/ 808]               blk.10.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 102/ 808]               blk.10.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 103/ 808]                 blk.10.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 104/ 808]    blk.10.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 105/ 808]          blk.10.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 106/ 808]               blk.10.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 107/ 808]            blk.10.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 108/ 808]                 blk.10.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 109/ 808]            blk.10.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 110/ 808]            blk.10.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 111/ 808]                 blk.10.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 112/ 808]                 blk.10.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.10.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 113/ 808]              blk.11.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 114/ 808]               blk.11.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 115/ 808]               blk.11.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 116/ 808]                 blk.11.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 117/ 808]    blk.11.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 118/ 808]          blk.11.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 119/ 808]               blk.11.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 120/ 808]            blk.11.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 121/ 808]                 blk.11.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 122/ 808]            blk.11.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 123/ 808]            blk.11.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 124/ 808]                 blk.11.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 125/ 808]                 blk.11.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.11.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 126/ 808]              blk.12.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 127/ 808]               blk.12.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 128/ 808]               blk.12.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 129/ 808]                 blk.12.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 130/ 808]    blk.12.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 131/ 808]          blk.12.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 132/ 808]               blk.12.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 133/ 808]            blk.12.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 134/ 808]                 blk.12.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 135/ 808]            blk.12.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 136/ 808]            blk.12.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 137/ 808]                 blk.12.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 138/ 808]                 blk.12.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.12.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 139/ 808]               blk.13.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 140/ 808]            blk.13.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 141/ 808]                 blk.13.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 142/ 808]            blk.13.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 143/ 808]            blk.13.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 144/ 808]                 blk.13.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 145/ 808]                 blk.13.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.13.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 146/ 808]               blk.7.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 147/ 808]                blk.7.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 148/ 808]                  blk.7.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 149/ 808]     blk.7.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 150/ 808]           blk.7.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 151/ 808]                blk.7.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 152/ 808]               blk.8.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 153/ 808]                blk.8.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 154/ 808]                blk.8.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 155/ 808]                  blk.8.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 156/ 808]     blk.8.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 157/ 808]           blk.8.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 158/ 808]                blk.8.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 159/ 808]             blk.8.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 160/ 808]                  blk.8.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 161/ 808]             blk.8.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 162/ 808]             blk.8.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 163/ 808]                  blk.8.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 164/ 808]                  blk.8.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.8.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 165/ 808]               blk.9.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 166/ 808]                blk.9.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 167/ 808]                blk.9.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 168/ 808]                  blk.9.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 169/ 808]     blk.9.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 170/ 808]           blk.9.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 171/ 808]                blk.9.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 172/ 808]             blk.9.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 173/ 808]                  blk.9.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 174/ 808]             blk.9.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 175/ 808]             blk.9.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 176/ 808]                  blk.9.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 177/ 808]                  blk.9.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.9.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 178/ 808]              blk.13.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 179/ 808]               blk.13.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 180/ 808]                 blk.13.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 181/ 808]    blk.13.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 182/ 808]          blk.13.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 183/ 808]               blk.13.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 184/ 808]              blk.14.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 185/ 808]               blk.14.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 186/ 808]               blk.14.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 187/ 808]                 blk.14.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 188/ 808]    blk.14.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 189/ 808]          blk.14.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 190/ 808]               blk.14.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 191/ 808]            blk.14.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 192/ 808]                 blk.14.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 193/ 808]            blk.14.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 194/ 808]            blk.14.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 195/ 808]                 blk.14.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 196/ 808]                 blk.14.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.14.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 197/ 808]              blk.15.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 198/ 808]               blk.15.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 199/ 808]               blk.15.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 200/ 808]                 blk.15.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 201/ 808]    blk.15.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 202/ 808]          blk.15.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 203/ 808]               blk.15.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 204/ 808]            blk.15.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 205/ 808]                 blk.15.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 206/ 808]            blk.15.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 207/ 808]            blk.15.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 208/ 808]                 blk.15.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 209/ 808]                 blk.15.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.15.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 210/ 808]              blk.16.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 211/ 808]               blk.16.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 212/ 808]               blk.16.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 213/ 808]                 blk.16.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 214/ 808]    blk.16.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 215/ 808]          blk.16.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 216/ 808]               blk.16.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 217/ 808]            blk.16.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 218/ 808]                 blk.16.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 219/ 808]            blk.16.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 220/ 808]            blk.16.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 221/ 808]                 blk.16.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 222/ 808]                 blk.16.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.16.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 223/ 808]              blk.17.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 224/ 808]               blk.17.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 225/ 808]               blk.17.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 226/ 808]                 blk.17.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 227/ 808]    blk.17.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 228/ 808]          blk.17.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 229/ 808]               blk.17.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 230/ 808]            blk.17.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 231/ 808]                 blk.17.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 232/ 808]            blk.17.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 233/ 808]            blk.17.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 234/ 808]                 blk.17.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 235/ 808]                 blk.17.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.17.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 236/ 808]              blk.18.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 237/ 808]               blk.18.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 238/ 808]               blk.18.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 239/ 808]                 blk.18.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 240/ 808]    blk.18.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 241/ 808]          blk.18.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 242/ 808]               blk.18.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 243/ 808]            blk.18.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 244/ 808]                 blk.18.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 245/ 808]            blk.18.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 246/ 808]            blk.18.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 247/ 808]                 blk.18.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 248/ 808]                 blk.18.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.18.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 249/ 808]               blk.19.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 250/ 808]            blk.19.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 251/ 808]                 blk.19.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 252/ 808]            blk.19.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 253/ 808]            blk.19.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 254/ 808]                 blk.19.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 255/ 808]                 blk.19.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.19.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 256/ 808]              blk.19.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 257/ 808]               blk.19.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 258/ 808]                 blk.19.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 259/ 808]    blk.19.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 260/ 808]          blk.19.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 261/ 808]               blk.19.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 262/ 808]              blk.20.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 263/ 808]               blk.20.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 264/ 808]               blk.20.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 265/ 808]                 blk.20.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 266/ 808]    blk.20.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 267/ 808]          blk.20.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 268/ 808]               blk.20.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 269/ 808]            blk.20.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 270/ 808]                 blk.20.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 271/ 808]            blk.20.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 272/ 808]            blk.20.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 273/ 808]                 blk.20.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 274/ 808]                 blk.20.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.20.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 275/ 808]              blk.21.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 276/ 808]               blk.21.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 277/ 808]               blk.21.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 278/ 808]                 blk.21.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 279/ 808]    blk.21.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 280/ 808]          blk.21.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 281/ 808]               blk.21.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 282/ 808]            blk.21.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 283/ 808]                 blk.21.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 284/ 808]            blk.21.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 285/ 808]            blk.21.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 286/ 808]                 blk.21.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 287/ 808]                 blk.21.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.21.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 288/ 808]              blk.22.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 289/ 808]               blk.22.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 290/ 808]               blk.22.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 291/ 808]                 blk.22.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 292/ 808]    blk.22.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 293/ 808]          blk.22.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 294/ 808]               blk.22.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 295/ 808]            blk.22.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 296/ 808]                 blk.22.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 297/ 808]            blk.22.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 298/ 808]            blk.22.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 299/ 808]                 blk.22.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 300/ 808]                 blk.22.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.22.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 301/ 808]              blk.23.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 302/ 808]               blk.23.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 303/ 808]               blk.23.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 304/ 808]                 blk.23.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 305/ 808]    blk.23.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 306/ 808]          blk.23.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 307/ 808]               blk.23.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 308/ 808]            blk.23.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 309/ 808]                 blk.23.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 310/ 808]            blk.23.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 311/ 808]            blk.23.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 312/ 808]                 blk.23.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 313/ 808]                 blk.23.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.23.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 314/ 808]              blk.24.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 315/ 808]               blk.24.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 316/ 808]               blk.24.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 317/ 808]                 blk.24.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 318/ 808]    blk.24.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 319/ 808]          blk.24.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 320/ 808]               blk.24.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 321/ 808]            blk.24.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 322/ 808]                 blk.24.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 323/ 808]            blk.24.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 324/ 808]            blk.24.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 325/ 808]                 blk.24.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 326/ 808]                 blk.24.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.24.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 327/ 808]               blk.25.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 328/ 808]            blk.25.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 329/ 808]                 blk.25.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 330/ 808]            blk.25.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 331/ 808]            blk.25.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 332/ 808]                 blk.25.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 333/ 808]                 blk.25.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.25.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 334/ 808]              blk.25.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 335/ 808]               blk.25.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 336/ 808]                 blk.25.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 337/ 808]    blk.25.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 338/ 808]          blk.25.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 339/ 808]               blk.25.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 340/ 808]              blk.26.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 341/ 808]               blk.26.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 342/ 808]               blk.26.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 343/ 808]                 blk.26.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 344/ 808]    blk.26.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 345/ 808]          blk.26.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 346/ 808]               blk.26.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 347/ 808]            blk.26.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 348/ 808]                 blk.26.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 349/ 808]            blk.26.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 350/ 808]            blk.26.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 351/ 808]                 blk.26.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 352/ 808]                 blk.26.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.26.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 353/ 808]              blk.27.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 354/ 808]               blk.27.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 355/ 808]               blk.27.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 356/ 808]                 blk.27.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 357/ 808]    blk.27.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 358/ 808]          blk.27.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 359/ 808]               blk.27.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 360/ 808]            blk.27.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 361/ 808]                 blk.27.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 362/ 808]            blk.27.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 363/ 808]            blk.27.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 364/ 808]                 blk.27.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 365/ 808]                 blk.27.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.27.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 366/ 808]              blk.28.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 367/ 808]               blk.28.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 368/ 808]               blk.28.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 369/ 808]                 blk.28.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 370/ 808]    blk.28.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 371/ 808]          blk.28.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 372/ 808]               blk.28.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 373/ 808]            blk.28.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 374/ 808]                 blk.28.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 375/ 808]            blk.28.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 376/ 808]            blk.28.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 377/ 808]                 blk.28.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 378/ 808]                 blk.28.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.28.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 379/ 808]              blk.29.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 380/ 808]               blk.29.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 381/ 808]               blk.29.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 382/ 808]                 blk.29.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 383/ 808]    blk.29.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 384/ 808]          blk.29.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 385/ 808]               blk.29.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 386/ 808]            blk.29.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 387/ 808]                 blk.29.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 388/ 808]            blk.29.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 389/ 808]            blk.29.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 390/ 808]                 blk.29.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 391/ 808]                 blk.29.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.29.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 392/ 808]              blk.30.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 393/ 808]               blk.30.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 394/ 808]               blk.30.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 395/ 808]                 blk.30.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 396/ 808]    blk.30.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 397/ 808]          blk.30.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 398/ 808]               blk.30.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 399/ 808]            blk.30.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 400/ 808]                 blk.30.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 401/ 808]            blk.30.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 402/ 808]            blk.30.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 403/ 808]                 blk.30.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 404/ 808]                 blk.30.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.30.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 405/ 808]               blk.31.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 406/ 808]            blk.31.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 407/ 808]                 blk.31.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 408/ 808]            blk.31.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 409/ 808]            blk.31.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 410/ 808]                 blk.31.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 411/ 808]                 blk.31.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.31.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 412/ 808]              blk.31.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 413/ 808]               blk.31.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 414/ 808]                 blk.31.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 415/ 808]    blk.31.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 416/ 808]          blk.31.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 417/ 808]               blk.31.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 418/ 808]              blk.32.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 419/ 808]               blk.32.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 420/ 808]               blk.32.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 421/ 808]                 blk.32.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 422/ 808]    blk.32.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 423/ 808]          blk.32.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 424/ 808]               blk.32.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 425/ 808]            blk.32.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 426/ 808]                 blk.32.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 427/ 808]            blk.32.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 428/ 808]            blk.32.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 429/ 808]                 blk.32.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 430/ 808]                 blk.32.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.32.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 431/ 808]              blk.33.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 432/ 808]               blk.33.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 433/ 808]               blk.33.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 434/ 808]                 blk.33.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 435/ 808]    blk.33.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 436/ 808]          blk.33.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 437/ 808]               blk.33.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 438/ 808]            blk.33.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 439/ 808]                 blk.33.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 440/ 808]            blk.33.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 441/ 808]            blk.33.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 442/ 808]                 blk.33.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 443/ 808]                 blk.33.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.33.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 444/ 808]              blk.34.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 445/ 808]               blk.34.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 446/ 808]               blk.34.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 447/ 808]                 blk.34.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 448/ 808]    blk.34.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 449/ 808]          blk.34.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 450/ 808]               blk.34.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 451/ 808]            blk.34.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 452/ 808]                 blk.34.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 453/ 808]            blk.34.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 454/ 808]            blk.34.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 455/ 808]                 blk.34.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 456/ 808]                 blk.34.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.34.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 457/ 808]              blk.35.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 458/ 808]               blk.35.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 459/ 808]               blk.35.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 460/ 808]                 blk.35.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 461/ 808]    blk.35.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 462/ 808]          blk.35.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 463/ 808]               blk.35.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 464/ 808]            blk.35.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 465/ 808]                 blk.35.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 466/ 808]            blk.35.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 467/ 808]            blk.35.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 468/ 808]                 blk.35.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 469/ 808]                 blk.35.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.35.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 470/ 808]              blk.36.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 471/ 808]               blk.36.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 472/ 808]               blk.36.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 473/ 808]                 blk.36.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 474/ 808]    blk.36.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 475/ 808]          blk.36.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 476/ 808]               blk.36.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 477/ 808]            blk.36.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 478/ 808]                 blk.36.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 479/ 808]            blk.36.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 480/ 808]            blk.36.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 481/ 808]                 blk.36.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 482/ 808]                 blk.36.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.36.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 483/ 808]               blk.37.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 484/ 808]            blk.37.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 485/ 808]                 blk.37.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 486/ 808]            blk.37.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 487/ 808]            blk.37.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 488/ 808]                 blk.37.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 489/ 808]                 blk.37.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.37.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 490/ 808]              blk.37.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 491/ 808]               blk.37.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 492/ 808]                 blk.37.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 493/ 808]    blk.37.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 494/ 808]          blk.37.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 495/ 808]               blk.37.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 496/ 808]              blk.38.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 497/ 808]               blk.38.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 498/ 808]               blk.38.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 499/ 808]                 blk.38.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 500/ 808]    blk.38.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 501/ 808]          blk.38.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 502/ 808]               blk.38.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 503/ 808]            blk.38.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 504/ 808]                 blk.38.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 505/ 808]            blk.38.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 506/ 808]            blk.38.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 507/ 808]                 blk.38.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 508/ 808]                 blk.38.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.38.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 509/ 808]              blk.39.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 510/ 808]               blk.39.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 511/ 808]               blk.39.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 512/ 808]                 blk.39.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 513/ 808]    blk.39.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 514/ 808]          blk.39.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 515/ 808]               blk.39.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 516/ 808]            blk.39.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 517/ 808]                 blk.39.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 518/ 808]            blk.39.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 519/ 808]            blk.39.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 520/ 808]                 blk.39.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 521/ 808]                 blk.39.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.39.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 522/ 808]              blk.40.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 523/ 808]               blk.40.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 524/ 808]               blk.40.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 525/ 808]                 blk.40.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 526/ 808]    blk.40.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 527/ 808]          blk.40.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 528/ 808]               blk.40.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 529/ 808]            blk.40.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 530/ 808]                 blk.40.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 531/ 808]            blk.40.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 532/ 808]            blk.40.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 533/ 808]                 blk.40.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 534/ 808]                 blk.40.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.40.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 535/ 808]              blk.41.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 536/ 808]               blk.41.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 537/ 808]               blk.41.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 538/ 808]                 blk.41.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 539/ 808]    blk.41.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 540/ 808]          blk.41.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 541/ 808]               blk.41.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 542/ 808]            blk.41.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 543/ 808]                 blk.41.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 544/ 808]            blk.41.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 545/ 808]            blk.41.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 546/ 808]                 blk.41.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 547/ 808]                 blk.41.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.41.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 548/ 808]              blk.42.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 549/ 808]               blk.42.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 550/ 808]               blk.42.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 551/ 808]                 blk.42.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 552/ 808]    blk.42.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 553/ 808]          blk.42.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 554/ 808]               blk.42.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 555/ 808]            blk.42.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 556/ 808]                 blk.42.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 557/ 808]            blk.42.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 558/ 808]            blk.42.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 559/ 808]                 blk.42.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 560/ 808]                 blk.42.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.42.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 561/ 808]               blk.43.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 562/ 808]            blk.43.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 563/ 808]                 blk.43.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 564/ 808]            blk.43.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 565/ 808]            blk.43.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 566/ 808]                 blk.43.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 567/ 808]                 blk.43.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.43.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 568/ 808]              blk.43.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 569/ 808]               blk.43.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 570/ 808]                 blk.43.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 571/ 808]    blk.43.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 572/ 808]          blk.43.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 573/ 808]               blk.43.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 574/ 808]              blk.44.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 575/ 808]               blk.44.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 576/ 808]               blk.44.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 577/ 808]                 blk.44.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 578/ 808]    blk.44.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 579/ 808]          blk.44.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 580/ 808]               blk.44.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 581/ 808]            blk.44.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 582/ 808]                 blk.44.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 583/ 808]            blk.44.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 584/ 808]            blk.44.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 585/ 808]                 blk.44.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 586/ 808]                 blk.44.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.44.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 587/ 808]              blk.45.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 588/ 808]               blk.45.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 589/ 808]               blk.45.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 590/ 808]                 blk.45.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 591/ 808]    blk.45.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 592/ 808]          blk.45.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 593/ 808]               blk.45.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 594/ 808]            blk.45.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 595/ 808]                 blk.45.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 596/ 808]            blk.45.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 597/ 808]            blk.45.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 598/ 808]                 blk.45.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 599/ 808]                 blk.45.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.45.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 600/ 808]              blk.46.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 601/ 808]               blk.46.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 602/ 808]               blk.46.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 603/ 808]                 blk.46.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 604/ 808]    blk.46.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 605/ 808]          blk.46.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 606/ 808]               blk.46.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 607/ 808]            blk.46.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 608/ 808]                 blk.46.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 609/ 808]            blk.46.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 610/ 808]            blk.46.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 611/ 808]                 blk.46.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 612/ 808]                 blk.46.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.46.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 613/ 808]              blk.47.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 614/ 808]               blk.47.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 615/ 808]               blk.47.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 616/ 808]                 blk.47.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 617/ 808]    blk.47.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 618/ 808]          blk.47.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 619/ 808]               blk.47.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 620/ 808]            blk.47.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 621/ 808]                 blk.47.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 622/ 808]            blk.47.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 623/ 808]            blk.47.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 624/ 808]                 blk.47.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 625/ 808]                 blk.47.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.47.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 626/ 808]              blk.48.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 627/ 808]               blk.48.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 628/ 808]               blk.48.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 629/ 808]                 blk.48.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 630/ 808]    blk.48.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 631/ 808]          blk.48.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 632/ 808]               blk.48.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 633/ 808]            blk.48.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 634/ 808]                 blk.48.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 635/ 808]            blk.48.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 636/ 808]            blk.48.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 637/ 808]                 blk.48.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 638/ 808]                 blk.48.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.48.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 639/ 808]               blk.49.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 640/ 808]            blk.49.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 641/ 808]                 blk.49.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 642/ 808]            blk.49.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 643/ 808]            blk.49.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 644/ 808]                 blk.49.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 645/ 808]                 blk.49.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.49.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 646/ 808]              blk.49.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 647/ 808]               blk.49.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 648/ 808]                 blk.49.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 649/ 808]    blk.49.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 650/ 808]          blk.49.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 651/ 808]               blk.49.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 652/ 808]              blk.50.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 653/ 808]               blk.50.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 654/ 808]               blk.50.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 655/ 808]                 blk.50.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 656/ 808]    blk.50.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 657/ 808]          blk.50.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 658/ 808]               blk.50.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 659/ 808]            blk.50.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 660/ 808]                 blk.50.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 661/ 808]            blk.50.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 662/ 808]            blk.50.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 663/ 808]                 blk.50.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 664/ 808]                 blk.50.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.50.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 665/ 808]              blk.51.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 666/ 808]               blk.51.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 667/ 808]               blk.51.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 668/ 808]                 blk.51.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 669/ 808]    blk.51.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 670/ 808]          blk.51.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 671/ 808]               blk.51.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 672/ 808]            blk.51.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 673/ 808]                 blk.51.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 674/ 808]            blk.51.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 675/ 808]            blk.51.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 676/ 808]                 blk.51.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 677/ 808]                 blk.51.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.51.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 678/ 808]              blk.52.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 679/ 808]               blk.52.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 680/ 808]               blk.52.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 681/ 808]                 blk.52.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 682/ 808]    blk.52.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 683/ 808]          blk.52.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 684/ 808]               blk.52.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 685/ 808]            blk.52.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 686/ 808]                 blk.52.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 687/ 808]            blk.52.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 688/ 808]            blk.52.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 689/ 808]                 blk.52.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 690/ 808]                 blk.52.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.52.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 691/ 808]              blk.53.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 692/ 808]               blk.53.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 693/ 808]               blk.53.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 694/ 808]                 blk.53.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 695/ 808]    blk.53.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 696/ 808]          blk.53.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 697/ 808]               blk.53.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 698/ 808]            blk.53.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 699/ 808]                 blk.53.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 700/ 808]            blk.53.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 701/ 808]            blk.53.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 702/ 808]                 blk.53.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 703/ 808]                 blk.53.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.53.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 704/ 808]              blk.54.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 705/ 808]               blk.54.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 706/ 808]               blk.54.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 707/ 808]                 blk.54.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 708/ 808]    blk.54.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 709/ 808]          blk.54.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 710/ 808]               blk.54.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 711/ 808]            blk.54.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 712/ 808]                 blk.54.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 713/ 808]            blk.54.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 714/ 808]            blk.54.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 715/ 808]                 blk.54.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 716/ 808]                 blk.54.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.54.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 717/ 808]               blk.55.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 718/ 808]            blk.55.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 719/ 808]                 blk.55.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 720/ 808]            blk.55.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 721/ 808]            blk.55.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 722/ 808]                 blk.55.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 723/ 808]                 blk.55.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.55.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 724/ 808]              blk.55.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 725/ 808]               blk.55.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 726/ 808]                 blk.55.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 727/ 808]    blk.55.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 728/ 808]          blk.55.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 729/ 808]               blk.55.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 730/ 808]              blk.56.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 731/ 808]               blk.56.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 732/ 808]               blk.56.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 733/ 808]                 blk.56.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 734/ 808]    blk.56.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 735/ 808]          blk.56.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 736/ 808]               blk.56.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 737/ 808]            blk.56.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 738/ 808]                 blk.56.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 739/ 808]            blk.56.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 740/ 808]            blk.56.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 741/ 808]                 blk.56.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 742/ 808]                 blk.56.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.56.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 743/ 808]              blk.57.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 744/ 808]               blk.57.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 745/ 808]               blk.57.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 746/ 808]                 blk.57.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 747/ 808]    blk.57.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 748/ 808]          blk.57.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 749/ 808]               blk.57.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 750/ 808]            blk.57.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 751/ 808]                 blk.57.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 752/ 808]            blk.57.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 753/ 808]            blk.57.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 754/ 808]                 blk.57.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 755/ 808]                 blk.57.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.57.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 756/ 808]              blk.58.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 757/ 808]               blk.58.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 758/ 808]               blk.58.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 759/ 808]                 blk.58.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 760/ 808]    blk.58.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 761/ 808]          blk.58.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 762/ 808]               blk.58.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 763/ 808]            blk.58.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 764/ 808]                 blk.58.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 765/ 808]            blk.58.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 766/ 808]            blk.58.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 767/ 808]                 blk.58.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 768/ 808]                 blk.58.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.58.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 769/ 808]              blk.59.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 770/ 808]               blk.59.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 771/ 808]               blk.59.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 772/ 808]                 blk.59.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 773/ 808]    blk.59.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 774/ 808]          blk.59.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 775/ 808]               blk.59.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 776/ 808]            blk.59.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 777/ 808]                 blk.59.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 778/ 808]            blk.59.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 779/ 808]            blk.59.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 780/ 808]                 blk.59.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 781/ 808]                 blk.59.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.59.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 782/ 808]              blk.60.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 783/ 808]               blk.60.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 784/ 808]               blk.60.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 785/ 808]                 blk.60.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 786/ 808]    blk.60.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 787/ 808]          blk.60.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 788/ 808]               blk.60.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 789/ 808]            blk.60.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 790/ 808]                 blk.60.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 791/ 808]            blk.60.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 792/ 808]            blk.60.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 793/ 808]                 blk.60.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 794/ 808]                 blk.60.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.60.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 795/ 808]               blk.61.ffn_gate.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 796/ 808]            blk.61.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 797/ 808]                 blk.61.attn_k.weight - [ 5376,  2048,     1,     1], type =   bf16, converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 798/ 808]            blk.61.attn_output.weight - [ 4096,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 799/ 808]            blk.61.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[ 800/ 808]                 blk.61.attn_q.weight - [ 5376,  4096,     1,     1], type =   bf16, converting to iq4_kt .. size =    42.00 MiB ->    10.52 MiB
[ 801/ 808]                 blk.61.attn_v.weight - [ 5376,  2048,     1,     1], type =   bf16, Using custom type iq4_kt for tensor blk.61.attn_v.weight
converting to iq4_kt .. size =    21.00 MiB ->     5.26 MiB
[ 802/ 808]              blk.61.attn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 803/ 808]               blk.61.ffn_down.weight - [21504,  5376,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.15 MiB
[ 804/ 808]                 blk.61.ffn_up.weight - [ 5376, 21504,     1,     1], type =   bf16, converting to iq4_kt .. size =   220.50 MiB ->    55.21 MiB
[ 805/ 808]    blk.61.post_attention_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 806/ 808]          blk.61.post_ffw_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 807/ 808]               blk.61.ffn_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
[ 808/ 808]                   output_norm.weight - [ 5376,     1,     1,     1], type =    f32, size =    0.021 MB
llama_model_quantize_internal: model size  = 51518.82 MB
llama_model_quantize_internal: quant size  = 13654.42 MB

main: quantize time = 1143720.04 ms
main:    total time = 1143720.04 ms
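
Quick sanity check on those sizes (my arithmetic, not part of the log): the bf16 source is 16 bpw, so the overall compression works out to $16 \times 13654.42 / 51518.82 \approx 4.24$ bpw, matching the 4.241 BPW reported below. Per tensor it is e.g. $16 \times 5.26 / 21.00 \approx 4.01$ bpw, i.e. the nominal 4.0 bpw of iq4_kt plus a little scale/metadata overhead; the rest of the gap up to 4.241 comes from the tensors kept at higher precision.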
👈 Perplexity Command

Perplexity

./build/bin/llama-perplexity \
    --model /mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf \
    --ctx-size 512 \
    --ubatch-size 512 \
    -f wiki.test.raw \
    --seed 1337 \
    --n-gpu-layers 99 \
    --threads 1

I ended up using the same imatrix.dat file for both.

  • gemma-3-27B-it-qat-iq4_kt
    • 13.334 GiB (4.241 BPW)
    • Final estimate: PPL = 8.3431 +/- 0.06508
  • gemma-3-27B-it-qat-iq4_ks
    • 14.099 GiB (4.484 BPW), attn_k_b at q4_0 (I forget why)
    • Final estimate: PPL = 8.1750 +/- 0.06294

This probably isn't the best comparison, given that gemma-3-27B-it-qat behaves unlike most "normal" non-QAT models. But llama-perplexity runs clean with no NaNs, and that is the real test for me right now.

Lastly I'll do some quick sweep benches.

👈 sweep-bench results
#model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_ks.gguf
model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-iq4_kt.gguf
#model=/mnt/raid/models/ubergarm/gemma-3-27b-it-qat-GGUF/gemma-3-27B-it-qat-q4_0.gguf
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-sweep-bench \
  --model "$model" \
  -c 32768 \
  -fa \
  -ngl 99 \
  --warmup-batch \
  --threads 1
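
For reading the tables below (as I understand the columns): each row prompt-processes PP tokens at a KV cache depth of N_KV and then generates TG tokens, and the speed columns are just token counts divided by measured times, e.g. for the first iq4_ks row $S_{PP} = 512 / 0.342 \approx 1497$ t/s and $S_{TG} = 128 / 3.743 \approx 34.2$ t/s.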

PR505 iq4_ks

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 512 | 128 | 0 | 0.342 | 1497.20 | 3.743 | 34.19 |
| 512 | 128 | 512 | 0.348 | 1472.43 | 3.794 | 33.74 |
| 512 | 128 | 1024 | 0.353 | 1449.53 | 3.830 | 33.42 |
| 512 | 128 | 1536 | 0.358 | 1429.54 | 3.888 | 32.92 |
| 512 | 128 | 2048 | 0.364 | 1407.80 | 3.950 | 32.40 |
| 512 | 128 | 2560 | 0.368 | 1390.97 | 4.070 | 31.45 |
| 512 | 128 | 3072 | 0.374 | 1367.73 | 4.088 | 31.31 |
| 512 | 128 | 3584 | 0.379 | 1351.24 | 4.128 | 31.01 |
| 512 | 128 | 4096 | 0.385 | 1328.47 | 4.179 | 30.63 |
| 512 | 128 | 4608 | 0.389 | 1314.88 | 4.228 | 30.27 |
| 512 | 128 | 5120 | 0.394 | 1299.02 | 4.280 | 29.90 |
| 512 | 128 | 5632 | 0.399 | 1282.70 | 4.372 | 29.28 |
| 512 | 128 | 6144 | 0.406 | 1262.58 | 4.395 | 29.13 |
| 512 | 128 | 6656 | 0.410 | 1249.42 | 4.445 | 28.80 |
| 512 | 128 | 7168 | 0.416 | 1230.78 | 4.493 | 28.49 |
| 512 | 128 | 7680 | 0.421 | 1217.42 | 4.536 | 28.22 |
| 512 | 128 | 8192 | 0.426 | 1202.74 | 4.650 | 27.53 |

PR505 iq4_kt

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 512 | 128 | 0 | 0.329 | 1554.50 | 3.594 | 35.61 |
| 512 | 128 | 512 | 0.336 | 1524.23 | 3.651 | 35.06 |
| 512 | 128 | 1024 | 0.341 | 1499.47 | 3.679 | 34.79 |
| 512 | 128 | 1536 | 0.345 | 1482.10 | 3.732 | 34.30 |
| 512 | 128 | 2048 | 0.352 | 1453.17 | 3.784 | 33.83 |
| 512 | 128 | 2560 | 0.355 | 1442.46 | 3.889 | 32.91 |
| 512 | 128 | 3072 | 0.360 | 1424.20 | 3.918 | 32.67 |
| 512 | 128 | 3584 | 0.366 | 1399.86 | 3.963 | 32.30 |
| 512 | 128 | 4096 | 0.371 | 1380.12 | 4.007 | 31.95 |
| 512 | 128 | 4608 | 0.377 | 1359.59 | 4.066 | 31.48 |
| 512 | 128 | 5120 | 0.381 | 1343.44 | 4.115 | 31.10 |
| 512 | 128 | 5632 | 0.386 | 1327.56 | 4.205 | 30.44 |
| 512 | 128 | 6144 | 0.392 | 1304.90 | 4.224 | 30.30 |
| 512 | 128 | 6656 | 0.396 | 1291.41 | 4.267 | 30.00 |
| 512 | 128 | 7168 | 0.402 | 1273.86 | 4.319 | 29.64 |
| 512 | 128 | 7680 | 0.406 | 1260.46 | 4.347 | 29.44 |
| 512 | 128 | 8192 | 0.411 | 1244.46 | 4.459 | 28.71 |

PR505 q4_0

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 512 | 128 | 0 | 0.306 | 1673.44 | 3.499 | 36.58 |
| 512 | 128 | 512 | 0.311 | 1646.26 | 3.542 | 36.14 |
| 512 | 128 | 1024 | 0.318 | 1611.98 | 3.579 | 35.76 |
| 512 | 128 | 1536 | 0.322 | 1592.37 | 3.639 | 35.18 |
| 512 | 128 | 2048 | 0.328 | 1561.52 | 3.698 | 34.61 |
| 512 | 128 | 2560 | 0.334 | 1531.43 | 3.817 | 33.53 |
| 512 | 128 | 3072 | 0.339 | 1509.07 | 3.827 | 33.45 |
| 512 | 128 | 3584 | 0.346 | 1480.93 | 3.870 | 33.07 |
| 512 | 128 | 4096 | 0.351 | 1456.85 | 3.921 | 32.64 |
| 512 | 128 | 4608 | 0.355 | 1440.80 | 3.972 | 32.22 |
| 512 | 128 | 5120 | 0.360 | 1420.48 | 4.024 | 31.81 |
| 512 | 128 | 5632 | 0.366 | 1399.51 | 4.101 | 31.21 |
| 512 | 128 | 6144 | 0.370 | 1382.54 | 4.134 | 30.96 |
| 512 | 128 | 6656 | 0.378 | 1356.18 | 4.180 | 30.63 |
| 512 | 128 | 7168 | 0.382 | 1341.12 | 4.239 | 30.20 |
| 512 | 128 | 7680 | 0.386 | 1324.94 | 4.277 | 29.93 |
| 512 | 128 | 8192 | 0.392 | 1307.55 | 4.387 | 29.18 |

(plot: sweep-bench-pr505-gemma-3-27b-it-qat-iq4_kt)

Very nice, this new iq4_kt is faster than iq4_ks and very close to q4_0! I confirmed it does get a speed benefit from compiling with -DGGML_CUDA_F16=ON.

I'm holding off on releasing any of my experimental iqN_kt quants until you're happy with everything. So feel free to make breaking changes with this stuff as far as I'm concerned.

If you get the iq3_kt going as well, it might help me target a ~3.3 BPW (~256GB) R1-0528. No pressure, I'm just daydreaming hah... Cheers and thanks!

@ikawrakow
Owner Author

ikawrakow commented Jun 8, 2025 via email

@ikawrakow
Owner Author

Yes, sorry, I forgot to add the iq4_kt MMQ template instance (it is done now). I add files there manually instead of using the Python script, because the script is much more complicated in ik_llama.cpp than in llama.cpp, so I figured it would take longer to update the script than to just add a file manually from time to time.
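
(For reference: these template instance files are tiny. In mainline llama.cpp the autogenerated ones are one-liners like the sketch below, and the iq4_kt instance presumably follows the same pattern; the file path and macro/enum names here are illustrative, not copied from the repo.)

```cpp
// ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_kt.cu (illustrative path)
// One translation unit per quant type keeps CUDA build times manageable:
// DECL_MMQ_CASE instantiates the MMQ kernel template for a single ggml type.

#include "../mmq.cuh"

DECL_MMQ_CASE(GGML_TYPE_IQ4_KT); // enum name assumed for ik_llama.cpp's iq4_kt
```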

@ikawrakow ikawrakow mentioned this pull request Jun 9, 2025
@ikawrakow
Owner Author

Concerning PPL: yes, IQ4_KT is not quite on par with IQ4_KS. It is 4.0 bpw versus 4.25 bpw for IQ4_KS, so PPL is somewhat higher. But it is better than IQ4_KSS, which is also exactly 4.0 bpw. As you get to 4 bpw, the benefit of using a trellis becomes very small.
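
(Worth noting: the two model sizes reported above are consistent with this: $14.099 / 13.334 \approx 1.057$, which is exactly the ratio $4.484 / 4.241$ of the bits per weight.)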

@ikawrakow
Owner Author

Closing in favor of #529

@ikawrakow ikawrakow closed this Jun 18, 2025