model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) #19460
ngxson merged 10 commits into ggml-org:master
Conversation
Well, the upstream PR has: :D
How different is GLM’s DSA compared to DeepSeek V3.2’s DSA? Curious if it would be possible to backport, because DeepSeek V3.2 DSA never got llama.cpp support.
@DocShotgun From the mentioned PR, it seems to be exactly the same as DSv3.2. I assume they needed to split it into a new arch just because they have some specific config, like the activation func or the MTP layer. It seems like DSv3.2 is already supported by llama.cpp, for example: https://huggingface.co/unsloth/DeepSeek-V3.2-GGUF , though I'm not quite sure what the differences are between DSA and the MLA implemented in DSv3.
It's possible to hack DS V3.2 to work in llama.cpp by treating it as a regular DeepSeekV3 MLA model rather than a DSA model, which is likely what the Unsloth folks did here, but proper DSA support would still be interesting. This is where there's some discussion of how to hack non-DSA DS v3.2: |
Are we planning on implementing the sparse attention with the indexer? @ngxson
@pwilkin Not yet, but I will have a look. For now, I think one safe way could be to modify my PR to keep these indexer tensors, so if GLM-5 is released suddenly, we at least still have the GGUF.
@ngxson I was taking a look at the DSA, obviously it's black magic out there (it uses a dedicated optimized kernel for the newest CUDA devices to calculate the pre-attention logits in 8-bit), but I'll try to extract the naive logic somewhere so that we can think about how to implement it in llama.cpp.
@pwilkin Yeah, I'm doing it the same way: extracting the logic and then letting Gemini and Claude compete over who gets the first PyTorch-only version without all the optimization stuff. But so far it has taken me too much time with no success yet, so I would appreciate it if you could get a naive version of DSA.
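For reference, here is a rough PyTorch-only sketch of what the naive indexer logic could look like, going by the DeepSeek V3.2 description and the indexer tensor shapes in the conversion log further down (hidden size 6144, 32 indexer heads of dim 128, one 128-dim key per token). The norm type, the scaling, and the top-k edge-case handling here are assumptions, not the reference implementation:

```python
import torch
import torch.nn.functional as F

def dsa_indexer_topk(h, q_a, wk, wq_b, w_head, k_norm_w, k_norm_b, top_k=2048):
    # h:      (T, 6144)    hidden states
    # q_a:    (T, 2048)    low-rank query activations (assumed shared with the MLA q_a)
    # wk:     (6144, 128)  indexer.attn_k.weight
    # wq_b:   (2048, 4096) indexer.attn_q_b.weight (32 heads x 128)
    # w_head: (6144, 32)   indexer.proj.weight, per-head mixing weights
    # returns (T, top_k) indices of the tokens each query is allowed to attend to
    T = h.shape[0]
    k = F.layer_norm(h @ wk, (128,), weight=k_norm_w, bias=k_norm_b)  # one small key per token
    q = (q_a @ wq_b).view(T, 32, 128)                                 # small indexer queries
    w = h @ w_head                                                    # per-head weights

    # index score I[t, s] = sum_j w[t, j] * relu(q[t, j] . k[s])
    scores = torch.einsum('thd,sd->ths', q, k).relu()
    scores = torch.einsum('ths,th->ts', scores, w)

    # causal mask, then keep the top-k highest-scoring positions per query; rows
    # with fewer than top_k valid positions still need the causal mask re-applied
    # inside the sparse attention step itself
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool, device=h.device))
    scores = scores.masked_fill(~causal, float('-inf'))
    return scores.topk(min(top_k, T), dim=-1).indices
```

The selected indices would then restrict the regular MLA attention to those KV positions only, which is the part that would need a dedicated mask/kernel path in llama.cpp.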
GLM-5 is released: https://huggingface.co/zai-org/GLM-5
I guess that will be plan B then:
Testing Unsloth's UD-Q3_K_XL with this branch now; it runs fine but of course really slow (30-35 pp / 11-13 tg, within 10k context). Answer quality is OK in my very limited testing. The quants are now hidden on Hugging Face, not sure why.
@drrros they basically removed the DSA indexer tensors, as explained above. The quality will be sub-optimal and (for sure) the GGUFs will need to be re-converted at some point.
@drrros sorry to bother, but I wonder: what is the size of Q3_K_XL in GiB and GB?
Alright, so the latest version can already load a model with random weights generated from this transformers PR. I haven't tested it on the real weights though, so could you please give it a try @bartowski1182 @danielhanchen? An important note: the GGUF can be converted correctly, but the indexer tensors are left unused by the llama.cpp code. I think support for the indexer will need to be implemented in a dedicated PR.
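As a quick sanity check that the indexer tensors actually end up in a converted file (even though llama.cpp leaves them unused for now), something along these lines with the gguf-py package should work; the file name below is just a placeholder:

```python
# Sketch: list the indexer tensors inside a converted GGUF using gguf-py.
# "glm-5-bf16.gguf" is a placeholder path, not a published file.
from gguf import GGUFReader

reader = GGUFReader("glm-5-bf16.gguf")
indexer = [t for t in reader.tensors if ".indexer." in t.name]
print(f"found {len(indexer)} indexer tensors")
for t in indexer[:5]:
    print(t.name, list(t.shape), t.tensor_type.name)
```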
Oh ok, I'll reconvert with the indexer for now. @ngxson I did try the old PR without the indexer and it worked. Hopefully the indexing code won't change anything?
Also testing the convert out.
@ngxson the convert worked cleanly, and I also did a Q4-ish quantization and ran a PPL test on it and it looks sane. I did have to

Full convert_hf_to_gguf output:

Starting conversion for: /mnt/srv/snowdrift/ggml/GLM-5
INFO:hf-to-gguf:Loading model: GLM-5
WARNING:hf-to-gguf:Failed to load model config from /mnt/srv/snowdrift/fp16/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: GlmMoeDsaForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/srv/snowdrift/fp16/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00282.safetensors'
[... indexing model parts 00002 through 00281 ...]
INFO:hf-to-gguf:gguf: indexing model part 'model-00282-of-00282.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.0.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.0.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.0.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.0.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.0.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.0.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.0.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.0.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.1.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.1.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.1.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.1.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.1.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.1.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.1.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.1.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.1.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.1.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.1.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.10.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.10.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.10.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.10.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.10.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.10.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.10.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.10.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.10.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.10.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.10.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.11.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.11.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.11.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.11.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.11.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.11.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.11.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.11.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.11.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.11.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.12.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.12.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.12.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.12.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.12.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.12.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.12.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.12.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.12.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.12.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.12.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.13.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.13.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.13.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.13.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.13.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.13.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.13.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.13.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.13.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.14.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.14.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.14.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.14.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.14.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.14.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.14.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.14.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.14.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.14.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.14.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.15.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.15.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.15.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.15.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.15.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.15.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.15.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.15.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.15.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.15.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.15.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.16.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.16.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.16.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.16.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.16.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.16.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.16.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.16.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.16.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.16.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.16.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.17.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.17.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.17.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.17.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.17.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.17.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.17.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.17.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.17.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.17.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.18.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.18.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.18.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.18.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.18.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.18.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.18.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.18.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.18.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.18.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.18.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.19.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.19.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.19.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.19.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.19.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.19.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.19.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.19.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.19.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.19.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.19.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.2.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.2.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.2.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.2.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.2.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.2.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.2.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.2.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.2.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.2.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.2.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.20.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.20.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.20.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.20.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.20.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.20.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.20.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.20.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.20.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.20.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.20.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.21.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.21.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.21.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.21.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.21.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.21.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.21.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.21.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.21.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.21.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.21.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.22.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.22.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.22.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.22.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.22.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.22.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.22.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.22.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.22.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.22.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.22.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.23.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.23.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.23.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.23.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.23.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.23.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.23.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.23.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.23.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.23.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.24.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.24.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.24.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.24.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.24.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.24.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.24.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.24.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.24.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.24.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.24.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.24.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.25.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.25.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.25.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.25.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.25.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.25.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.25.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.25.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.25.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.25.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.25.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.25.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.26.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.26.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.26.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.26.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.26.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.26.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.26.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.26.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.26.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.26.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.26.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.26.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.27.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.27.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.27.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.27.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.27.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.27.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.27.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.27.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.27.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.27.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.27.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.27.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.28.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.28.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.28.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.28.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.28.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.28.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.28.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.28.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.28.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.28.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.28.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.28.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.28.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.28.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.28.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.29.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.29.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.29.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.29.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.29.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.29.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.29.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.29.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.29.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.29.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.29.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.29.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.29.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.29.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.3.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.3.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.3.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.3.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.3.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.3.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.3.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.3.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.3.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.3.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.3.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.30.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.30.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.30.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.30.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.30.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.30.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.30.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.30.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.30.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.30.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.30.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.30.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.30.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.30.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.30.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.30.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.31.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.31.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.31.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.31.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.31.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.31.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.31.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.31.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.31.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.31.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.31.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.31.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.31.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.31.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.31.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.31.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.32.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.32.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.32.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.32.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.32.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.32.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.32.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.32.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.32.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.32.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.32.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.32.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.32.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.32.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.32.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.32.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.32.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.32.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.33.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.33.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.33.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.33.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.33.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.33.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.33.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.33.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.33.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.33.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.33.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.33.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.33.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.33.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.33.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.33.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.33.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.33.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.34.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.34.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.34.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.34.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.34.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.34.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.34.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.34.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.34.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.34.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.34.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.34.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.34.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.34.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.34.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.34.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.34.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.34.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.35.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.35.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.35.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.35.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.35.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.35.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.35.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.35.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.35.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.35.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.35.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.35.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.35.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.35.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.35.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.35.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.35.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.36.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.36.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.36.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.36.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.36.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.36.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.36.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.36.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.36.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.36.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.36.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.36.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.36.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.36.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.36.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.36.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.36.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.36.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.37.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.37.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.37.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.37.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.37.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.37.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.37.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.37.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.37.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.37.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.37.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.37.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.37.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.37.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.37.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.37.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.37.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.37.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.38.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.38.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.38.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.38.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.38.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.38.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.38.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.38.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.38.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.38.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.38.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.38.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.38.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.38.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.38.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.38.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.38.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.38.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.39.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.39.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.39.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.39.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.39.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.39.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.39.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.39.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.39.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.39.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.39.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.39.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.39.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.39.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.39.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.39.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.39.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.39.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.4.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.4.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.4.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.4.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.4.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.4.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.4.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.4.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.4.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.4.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.4.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.40.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.40.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.40.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.40.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.40.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.40.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.40.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.40.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.40.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.40.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.40.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.40.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.40.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.40.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.40.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.40.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.40.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.40.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.41.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.41.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.41.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.41.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.41.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.41.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.41.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.41.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.41.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.41.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.41.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.41.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.41.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.41.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.41.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.41.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.41.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.42.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.42.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.42.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.42.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.42.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.42.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.42.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.42.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.42.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.42.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.42.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.42.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.42.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.42.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.42.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.42.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.42.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.42.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.43.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.43.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.43.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.43.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.43.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.43.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.43.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.43.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.43.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.43.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.43.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.43.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.43.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.43.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.43.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.43.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.43.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.43.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.44.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.44.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.44.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.44.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.44.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.44.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.44.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.44.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.44.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.44.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.44.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.44.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.44.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.44.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.44.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.44.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.44.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.44.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.45.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.45.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.45.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.45.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.45.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.45.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.45.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.45.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.45.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.45.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.45.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.45.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.45.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.45.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.45.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.45.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.45.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.45.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.46.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.46.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.46.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.46.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.46.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.46.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.46.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.46.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.46.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.46.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.46.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.46.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.46.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.46.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.46.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.46.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.46.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.46.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.47.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.47.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.47.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.47.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.47.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.47.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.47.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.47.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.47.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.47.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.47.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.47.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.47.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.47.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.47.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.47.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.47.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.47.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.48.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.48.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.48.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.48.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.48.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.48.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.48.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.48.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.48.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.48.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.48.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.48.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.48.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.48.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.48.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.48.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.48.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.48.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.49.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.49.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.49.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.49.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.49.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.49.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.49.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.49.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.49.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.49.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.49.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.49.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.49.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.49.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.49.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.49.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.49.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.49.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.5.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.5.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.5.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.5.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.5.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.5.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.5.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.5.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.5.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.5.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.50.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.50.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.50.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.50.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.50.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.50.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.50.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.50.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.50.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.50.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.50.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.50.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.50.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.50.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.50.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.50.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.50.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.50.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.51.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.51.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.51.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.51.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.51.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.51.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.51.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.51.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.51.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.51.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.51.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.51.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.51.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.51.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.51.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.51.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.51.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.51.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.52.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.52.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.52.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.52.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.52.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.52.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.52.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.52.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.52.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.52.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.52.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.52.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.52.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.52.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.52.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.52.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.52.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.52.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.53.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.53.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.53.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.53.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.53.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.53.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.53.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.53.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.53.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.53.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.53.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.53.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.53.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.53.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.53.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.53.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.53.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.53.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.54.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.54.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.54.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.54.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.54.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.54.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.54.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.54.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.54.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.54.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.54.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.54.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.54.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.54.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.54.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.54.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.54.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.54.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.54.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.54.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.54.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.54.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.54.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.55.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.55.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.55.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.55.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.55.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.55.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.55.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.55.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.55.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.55.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.55.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.55.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.55.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.55.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.55.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.55.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.55.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.55.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.55.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.55.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.55.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.55.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.55.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.56.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.56.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.56.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.56.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.56.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.56.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.56.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.56.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.56.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.56.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.56.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.56.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.56.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.56.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.56.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.56.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.56.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.56.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.56.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.56.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.56.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.56.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.56.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.57.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.57.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.57.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.57.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.57.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.57.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.57.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.57.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.57.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.57.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.57.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.57.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.57.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.57.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.57.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.57.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.57.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.57.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.57.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.57.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.57.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.57.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.57.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.58.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.58.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.58.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.58.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.58.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.58.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.58.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.58.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.58.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.58.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.58.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.58.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.58.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.58.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.58.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.58.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.58.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.58.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.58.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.58.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.58.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.58.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.58.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.59.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.59.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.59.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.59.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.59.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.59.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.59.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.59.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.59.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.59.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.59.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.59.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.59.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.59.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.59.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.59.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.59.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.59.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.59.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.59.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.59.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.59.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.59.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.6.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.6.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.6.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.6.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.6.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.6.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.6.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.6.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.6.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.6.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.6.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.60.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.60.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.60.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.60.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.60.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.60.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.60.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.60.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.60.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.60.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.60.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.60.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.60.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.60.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.60.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.60.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.60.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.60.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.61.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.61.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.61.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.61.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.61.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.61.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.61.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.61.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.61.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.61.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.61.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.61.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.61.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.61.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.61.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.61.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.61.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.61.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.61.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.61.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.61.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.61.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.61.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.62.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.62.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.62.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.62.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.62.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.62.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.62.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.62.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.62.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.62.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.62.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.62.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.62.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.62.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.62.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.62.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.62.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.62.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.62.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.62.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.62.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.62.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.62.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.63.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.63.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.63.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.63.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.63.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.63.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.63.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.63.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.63.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.63.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.63.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.63.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.63.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.63.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.63.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.63.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.63.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.63.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.63.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.63.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.63.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.63.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.63.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.64.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.64.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.64.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.64.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.64.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.64.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.64.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.64.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.64.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.64.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.64.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.64.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.64.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.64.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.64.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.64.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.64.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.64.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.64.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.64.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.64.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.64.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.64.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.65.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.65.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.65.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.65.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.65.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.65.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.65.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.65.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.65.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.65.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.65.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.65.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.65.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.65.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.65.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.65.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.65.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.65.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.65.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.65.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.65.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.65.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.65.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.66.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.66.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.66.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.66.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.66.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.66.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.66.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.66.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.66.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.66.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.66.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.66.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.66.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.66.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.66.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.66.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.66.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.66.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.66.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.66.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.66.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.66.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.66.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.67.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.67.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.67.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.67.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.67.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.67.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.67.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.67.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.67.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.67.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.67.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.67.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.67.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.67.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.67.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.67.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.67.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.67.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.67.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.67.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.67.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.67.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.67.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.68.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.68.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.68.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.68.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.68.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.68.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.68.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.68.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.68.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.68.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.68.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.68.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.68.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.68.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.68.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.68.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.68.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.68.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.68.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.68.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.68.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.68.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.68.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.69.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.69.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.69.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.69.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.69.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.69.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.69.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.69.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.69.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.69.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.69.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.69.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.69.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.69.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.69.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.69.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.69.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.69.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.69.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.69.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.69.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.69.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.69.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.7.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.7.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.7.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.7.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.7.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.7.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.7.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.7.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.7.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.7.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.7.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.70.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.70.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.70.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.70.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.70.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.70.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.70.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.70.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.70.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.70.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.70.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.70.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.70.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.70.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.70.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.70.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.70.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.70.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.70.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.70.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.70.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.70.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.70.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.71.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.71.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.71.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.71.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.71.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.71.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.71.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.71.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.71.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.71.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.71.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.71.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.71.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.71.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.71.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.71.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.71.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.71.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.71.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.71.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.71.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.71.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.71.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.72.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.72.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.72.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.72.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.72.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.72.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.72.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.72.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.72.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.72.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.72.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.72.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.72.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.72.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.72.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.72.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.72.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.72.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.72.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.72.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.72.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.72.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.72.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.73.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.73.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.73.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.73.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.73.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.73.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.73.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.73.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.73.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.73.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.73.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.73.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.73.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.73.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.73.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.73.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.73.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.73.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.73.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.73.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.73.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.73.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.73.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.74.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.74.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.74.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.74.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.74.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.74.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.74.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.74.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.74.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.74.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.74.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.74.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.74.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.74.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.74.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.74.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.74.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.74.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.74.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.74.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.74.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.74.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.74.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.75.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.75.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.75.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.75.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.75.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.75.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.75.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.75.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.75.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.75.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.75.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.75.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.75.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.75.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.75.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.75.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.75.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.75.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.75.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.75.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.75.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.75.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.75.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.76.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.76.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.76.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.76.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.76.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.76.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.76.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.76.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.76.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.76.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.76.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.76.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.76.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.76.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.76.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.76.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.76.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.76.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.76.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.76.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.76.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.76.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.76.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.77.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.77.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.77.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.77.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.77.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.77.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.77.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.77.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.77.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.77.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.77.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.77.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.77.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.77.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.77.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.77.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.77.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.77.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.77.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.77.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.77.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.77.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.77.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.78.nextn.eh_proj.weight, torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.78.nextn.enorm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.nextn.hnorm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.78.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.78.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.78.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.78.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.78.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.78.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.78.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.78.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.78.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.78.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.78.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.78.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.78.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.78.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.78.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.78.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.78.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.78.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.78.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.78.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.78.nextn.shared_head_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.8.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.8.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.8.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.8.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.8.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.8.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.8.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.8.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.8.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight, torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight, torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.9.indexer.k_norm.bias, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.9.indexer.k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.9.indexer.proj.weight, torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.9.indexer.attn_k.weight, torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.9.indexer.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.9.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.9.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.9.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.9.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 202752
INFO:hf-to-gguf:gguf: embedding length = 6144
INFO:hf-to-gguf:gguf: feed forward length = 12288
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
WARNING:hf-to-gguf:Unknown RoPE type: default
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: expert score gating function = sigmoid
INFO:hf-to-gguf:gguf: file type = 32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.key_length', overwriting it with new value 576 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.value_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.leading_dense_block_count', overwriting it with new value 3 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.rope.dimension_count', overwriting it with new value 64 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.expert_gating_func', overwriting it with new value 2 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
INFO:gguf.vocab:Adding 321649 merge(s).
INFO:gguf.vocab:Setting special token type eos to 154820
INFO:gguf.vocab:Setting special token type pad to 154820
INFO:gguf.vocab:Setting special token type bos to 154822
INFO:gguf.vocab:Setting special token type eot to 154827
INFO:gguf.vocab:Setting special token type unk to 154820
INFO:gguf.vocab:Setting special token type eom to 154829
INFO:gguf.vocab:Setting chat_template to [gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson(ensure_ascii=False) }}
{% endfor %}
</tools>
For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
{%- if m.role == 'user' %}
{% set ns.last_user_index = loop.index0 -%}
{%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' }}
{%- endif %}
{{- '<tool_response>' }}
{{- m.content }}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/srv/snowdrift/ggml/GLM-5/GLM-5-BF16.gguf: n_tensors = 1809, total_size = 1.5T - over 1TB, split recommended
Writing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.51T/1.51T [2:48:05<00:00, 150Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /mnt/srv/snowdrift/ggml/GLM-5/GLM-5-BF16.gguf |
|
Hit the character limit on the last one, attached is the llama-quantize output and here is the llama-perplexity output:
./build/bin/llama-perplexity \
--n-gpu-layers 999 --threads 52 \
--override-tensor "blk\..*_exps\.=CPU" \
--flash-attn on \
--file /mnt/srv/host/resources/KLD/ddh0_imat_calibration_data_v2.txt \
--model /mnt/srv/snowdrift/gguf/GLM-5-GGUF/aes_sedai/GLM-5-Q4_K_M.gguf
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
build: 8006 (4d3daf80f) with GNU 14.2.1 for Linux x86_64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 3090): 24135 total, 13338 used, 10532 free vs. target of 1024
llama_params_fit_impl: - CUDA1 (NVIDIA GeForce RTX 3090): 24135 total, 9786 used, 14085 free vs. target of 1024
llama_params_fit_impl: projected to use 23125 MiB of device memory vs. 47743 MiB of free device memory
llama_params_fit_impl: targets for free memory can be met on all devices, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 0.57 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) (0000:06:10.0) - 23871 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3090) (0000:06:11.0) - 23871 MiB free
llama_model_loader: loaded meta data with 55 key-value pairs and 1809 tensors from /mnt/srv/snowdrift/gguf/GLM-5-GGUF/aes_sedai/GLM-5-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm-dsa
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 3: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 4: general.name str = GLM 5
llama_model_loader: - kv 5: general.version str = 5
llama_model_loader: - kv 6: general.basename str = GLM
llama_model_loader: - kv 7: general.size_label str = 256x22B
llama_model_loader: - kv 8: general.license str = mit
llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 10: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 11: glm-dsa.block_count u32 = 79
llama_model_loader: - kv 12: glm-dsa.context_length u32 = 202752
llama_model_loader: - kv 13: glm-dsa.embedding_length u32 = 6144
llama_model_loader: - kv 14: glm-dsa.feed_forward_length u32 = 12288
llama_model_loader: - kv 15: glm-dsa.attention.head_count u32 = 64
llama_model_loader: - kv 16: glm-dsa.attention.head_count_kv u32 = 1
llama_model_loader: - kv 17: glm-dsa.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 18: glm-dsa.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 19: glm-dsa.expert_used_count u32 = 8
llama_model_loader: - kv 20: glm-dsa.expert_group_count u32 = 1
llama_model_loader: - kv 21: glm-dsa.expert_group_used_count u32 = 1
llama_model_loader: - kv 22: glm-dsa.expert_gating_func u32 = 2
llama_model_loader: - kv 23: glm-dsa.attention.key_length u32 = 576
llama_model_loader: - kv 24: glm-dsa.attention.value_length u32 = 512
llama_model_loader: - kv 25: glm-dsa.leading_dense_block_count u32 = 3
llama_model_loader: - kv 26: glm-dsa.vocab_size u32 = 154880
llama_model_loader: - kv 27: glm-dsa.attention.q_lora_rank u32 = 2048
llama_model_loader: - kv 28: glm-dsa.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 29: glm-dsa.attention.key_length_mla u32 = 256
llama_model_loader: - kv 30: glm-dsa.attention.value_length_mla u32 = 256
llama_model_loader: - kv 31: glm-dsa.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 32: glm-dsa.expert_count u32 = 256
llama_model_loader: - kv 33: glm-dsa.expert_shared_count u32 = 1
llama_model_loader: - kv 34: glm-dsa.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 35: glm-dsa.expert_weights_norm bool = true
llama_model_loader: - kv 36: glm-dsa.rope.dimension_count u32 = 64
llama_model_loader: - kv 37: glm-dsa.nextn_predict_layers u32 = 1
llama_model_loader: - kv 38: glm-dsa.attention.indexer.head_count u32 = 32
llama_model_loader: - kv 39: glm-dsa.attention.indexer.key_length u32 = 128
llama_model_loader: - kv 40: glm-dsa.attention.indexer.top_k u32 = 2048
llama_model_loader: - kv 41: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 42: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 43: tokenizer.ggml.tokens arr[str,154880] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 44: tokenizer.ggml.token_type arr[i32,154880] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 45: tokenizer.ggml.merges arr[str,321649] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 46: tokenizer.ggml.eos_token_id u32 = 154820
llama_model_loader: - kv 47: tokenizer.ggml.padding_token_id u32 = 154820
llama_model_loader: - kv 48: tokenizer.ggml.bos_token_id u32 = 154822
llama_model_loader: - kv 49: tokenizer.ggml.eot_token_id u32 = 154827
llama_model_loader: - kv 50: tokenizer.ggml.unknown_token_id u32 = 154820
llama_model_loader: - kv 51: tokenizer.ggml.eom_token_id u32 = 154829
llama_model_loader: - kv 52: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv 53: general.quantization_version u32 = 2
llama_model_loader: - kv 54: general.file_type u32 = 7
llama_model_loader: - type f32: 630 tensors
llama_model_loader: - type q8_0: 951 tensors
llama_model_loader: - type q4_K: 152 tensors
llama_model_loader: - type q5_K: 76 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 432.80 GiB (4.93 BPW)
load: 0 unused tokens
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 154820 ('<|endoftext|>')
load: - 154827 ('<|user|>')
load: - 154829 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9811 MB
print_info: arch = glm-dsa
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 202752
print_info: n_embd = 6144
print_info: n_embd_inp = 6144
print_info: n_layer = 79
print_info: n_head = 64
print_info: n_head_kv = 1
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 576
print_info: n_embd_head_v = 512
print_info: n_gqa = 64
print_info: n_embd_k_gqa = 576
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 12288
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: n_expert_groups = 1
print_info: n_group_used = 1
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 202752
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: model type = ?B
print_info: model params = 753.86 B
print_info: general.name = GLM 5
print_info: n_layer_dense_lead = 3
print_info: n_lora_q = 2048
print_info: n_lora_kv = 512
print_info: n_embd_head_k_mla = 256
print_info: n_embd_head_v_mla = 256
print_info: n_ff_exp = 2048
print_info: n_expert_shared = 1
print_info: expert_weights_scale = 2.5
print_info: expert_weights_norm = 1
print_info: expert_gating_func = sigmoid
print_info: vocab type = BPE
print_info: n_vocab = 154880
print_info: n_merges = 321649
print_info: BOS token = 154822 '[gMASK]'
print_info: EOS token = 154820 '<|endoftext|>'
print_info: EOT token = 154827 '<|user|>'
print_info: EOM token = 154829 '<|observation|>'
print_info: UNK token = 154820 '<|endoftext|>'
print_info: PAD token = 154820 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 154838 '<|code_prefix|>'
print_info: FIM SUF token = 154840 '<|code_suffix|>'
print_info: FIM MID token = 154839 '<|code_middle|>'
print_info: EOG token = 154820 '<|endoftext|>'
print_info: EOG token = 154827 '<|user|>'
print_info: EOG token = 154829 '<|observation|>'
print_info: max token length = 1024
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
model has unused tensor blk.78.attn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a_norm.weight (size = 8192 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_norm.weight (size = 2048 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.attn_q_b.weight (size = 35651584 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_mqa.weight (size = 3760128 bytes) -- ignoring
model has unused tensor blk.78.attn_k_b.weight (size = 6684672 bytes) -- ignoring
model has unused tensor blk.78.attn_v_b.weight (size = 8912896 bytes) -- ignoring
model has unused tensor blk.78.attn_output.weight (size = 106954752 bytes) -- ignoring
model has unused tensor blk.78.ffn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.bias (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.proj.weight (size = 208896 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_k.weight (size = 835584 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_q_b.weight (size = 8912896 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_inp.weight (size = 6291456 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_exps.weight (size = 1811939328 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_exps.weight (size = 2214592512 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_exps.weight (size = 1811939328 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_shexp.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_shexp.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_shexp.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.nextn.eh_proj.weight (size = 80216064 bytes) -- ignoring
model has unused tensor blk.78.nextn.enorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.hnorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.shared_head_norm.weight (size = 24576 bytes) -- ignoring
load_tensors: offloading output layer to GPU
load_tensors: offloading 78 repeating layers to GPU
load_tensors: offloaded 80/80 layers to GPU
load_tensors: CPU_Mapped model buffer size = 436336.94 MiB
load_tensors: CUDA0 model buffer size = 9396.39 MiB
load_tensors: CUDA1 model buffer size = 9362.85 MiB
....................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|user|> logit bias = -inf
common_init_result: added <|observation|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max = 4
llama_context: n_ctx = 2048
llama_context: n_ctx_seq = 512
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = enabled
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (512) < n_ctx_train (202752) -- the full capacity of the model will not be utilized
llama_context: CUDA_Host output buffer size = 2.36 MiB
llama_kv_cache: CUDA0 KV buffer size = 90.00 MiB
llama_kv_cache: CUDA1 KV buffer size = 85.50 MiB
llama_kv_cache: size = 175.50 MiB ( 512 cells, 78 layers, 4/4 seqs), K (f16): 175.50 MiB, V (f16): 0.00 MiB
sched_reserve: reserving ...
sched_reserve: CUDA0 compute buffer size = 3852.50 MiB
sched_reserve: CUDA1 compute buffer size = 338.50 MiB
sched_reserve: CUDA_Host compute buffer size = 25.01 MiB
sched_reserve: graph nodes = 6142
sched_reserve: graph splits = 266 (with bs=512), 153 (with bs=1)
sched_reserve: reserve took 11.90 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
system_info: n_threads = 52 (n_threads_batch = 52) / 56 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 76.986 ms
perplexity: calculating perplexity over 95 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 72.99 seconds per pass - ETA 28.88 minutes
[1]16.8614,[2]16.3951,[3]15.5025,[4]14.5560,[5]15.5232,[6]15.5516,[7]15.6259,[8]16.1710,[9]12.2898,[10]9.8678,[11]8.1579,[12]7.0037,[13]6.1849,[14]5.5276,[15]6.0347,[16]6.2716,[17]6.5604,[18]7.3360,[19]8.7123,[20]10.1797,[21]12.0817,[22]13.6140,[23]14.4465,[24]15.2823,[25]16.3053,[26]16.1397,[27]16.5744,[28]17.4151,[29]17.8676,[30]17.3381,[31]17.0221,[32]17.1958,[33]17.1416,[34]17.9895,[35]17.6223,[36]16.8053,[37]16.1290,[38]15.4024,[39]14.7367,[40]14.1682,[41]13.7139,[42]12.9933,[43]12.3054,[44]11.6620,[45]11.0681,[46]10.5427,[47]10.0451,[48]9.6240,[49]9.2423,[50]8.9006,[51]9.0106,[52]8.9349,[53]8.8982,[54]8.8900,[55]8.8600,[56]8.5223,[57]8.2086,[58]7.9179,[59]7.6820,[60]7.4261,[61]7.2661,[62]7.3937,[63]7.7534,[64]8.1809,[65]8.6196,[66]9.0449,[67]9.5237,[68]9.5204,[69]9.5468,[70]9.6281,[71]9.5872,[72]9.4034,[73]9.3273,[74]9.3742,[75]9.2922,[76]9.3381,[77]9.2312,[78]9.1765,[79]9.1866,[80]9.2248,[81]9.0802,[82]9.2538,[83]9.3001,[84]9.4838,[85]9.3700,[86]9.4216,[87]9.4827,[88]9.5441,[89]9.5825,[90]9.4355,[91]9.2334,[92]9.0611,[93]8.8964,[94]8.7225,[95]8.7486,
Final estimate: PPL = 8.7486 +/- 0.17123
llama_perf_context_print: load time = 158548.00 ms
llama_perf_context_print: prompt eval time = 1653944.47 ms / 48640 tokens ( 34.00 ms per token, 29.41 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 1728082.65 ms / 48641 tokens
llama_perf_context_print: graphs reused = 0
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - CUDA0 (RTX 3090) | 24135 = 10455 + ( 13338 = 9396 + 90 + 3852) + 340 |
llama_memory_breakdown_print: | - CUDA1 (RTX 3090) | 24135 = 14017 + ( 9786 = 9362 + 85 + 338) + 330 |
llama_memory_breakdown_print: | - Host | 436361 = 436336 + 0 + 25 | |
@Panchovix It's 336GB |
|
Yep the conversion works great! I made some at https://huggingface.co/unsloth/GLM-5-GGUF using this PR - nice work @ngxson! |
// TODO @ngxson : TENSOR_NOT_REQUIRED was a hack, need to remove it later
flags |= TENSOR_SKIP | TENSOR_NOT_REQUIRED;
The MTP layer doesn't have certain tensors compared to a normal layer. But I'm just quite lazy to fix it rn because they're unused anyway.
Nevertheless, the GGUF still has the MTP layer.
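As a side note, the nextn/MTP tensors that do land in the GGUF can be listed directly from Python. A minimal sketch (assuming only the gguf-py package; the file path is hypothetical) that prints the blk.78 tensors the loader log reports as "unused tensor ... -- ignoring":
# Sketch: list the MTP/nextn layer tensors present in a converted GGUF.
from gguf import GGUFReader

reader = GGUFReader("GLM-5-BF16.gguf")  # hypothetical local path

mtp_prefix = "blk.78."  # the single nextn/MTP layer in this 79-block model
for t in reader.tensors:
    if t.name.startswith(mtp_prefix):
        # expect the regular attn_*/ffn_* tensors plus nextn.eh_proj, nextn.enorm,
        # nextn.hnorm and nextn.shared_head_norm, all currently skipped by llama.cpp
        print(t.name)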
Was this using 512 token chunks? It looks to use the same index top-k as the config:
"index_head_dim": 128,
"index_n_heads": 32,
"index_topk": 2048,
(https://huggingface.co/zai-org/GLM-5/blob/main/config.json)
So this perplexity should be the same as with the DSA stuff working? It would be interesting if we could get the perplexity from the Transformers version for much larger chunk sizes and then comment out the DSA stuff (or just set top-k to a huge value?) to see if the "just pretend it's dense attention" is a valid thing to try. Alternatively we could try and use the callback mechanism in |
Yes I can try with the dummy weight. Btw, since the |
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Yeah, it might be the case that it makes more and more difference as the context increases too (both from the flatter distribution and the autoregressive feedback loop). |
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
@jukofyork yes it was using the default which I think is 512-sized ctx chunks: I can re-run it later today with a larger context size and something like bart's v5 calibration set (this one is madison's which has some randomized utf data in it for funsies, it's short so it was quick to eval) |
|
recompiled and testing
👈 Details
$ uv pip freeze | grep -i trans
transformers==5.1.0
$ export SOCKET=0
$ numactl -N ${SOCKET} -m ${SOCKET} \
python \
convert_hf_to_gguf.py \
--outtype bf16 \
--split-max-size 50G \
--outfile /mnt/data/models/ubergarm/GLM-5-GGUF/ \
/mnt/data/models/zai-org/GLM-5/
INFO:hf-to-gguf:Loading model: GLM-5
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/zai-org/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: GlmMoeDsaForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/zai-org/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-00282.safetensors'
.
.
.
INFO:hf-to-gguf:gguf: indexing model part 'model-00281-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00282-of-00282.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {6144, 12288}
.
.
.
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 202752
INFO:hf-to-gguf:gguf: embedding length = 6144
INFO:hf-to-gguf:gguf: feed forward length = 12288
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
WARNING:hf-to-gguf:Unknown RoPE type: default
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: expert score gating function = sigmoid
INFO:hf-to-gguf:gguf: file type = 32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.key_length', overwriting it with new value 576 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.value_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.rope.dimension_count', overwriting it with new value 64 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
INFO:gguf.vocab:Adding 321649 merge(s).
INFO:gguf.vocab:Setting special token type eos to 154820
INFO:gguf.vocab:Setting special token type pad to 154820
INFO:gguf.vocab:Setting special token type bos to 154822
INFO:gguf.vocab:Setting special token type eot to 154827
INFO:gguf.vocab:Setting special token type unk to 154820
INFO:gguf.vocab:Setting special token type eom to 154829
INFO:gguf.vocab:Setting chat_template to [gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools
.
.
.
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf: n_tensors = 85, total_size = 44.9G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00002-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00003-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00004-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00005-of-00033.gguf: n_tensors = 65, total_size = 46.8G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00006-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00007-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00008-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00009-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00010-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00011-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00012-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00013-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00014-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00015-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00016-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00017-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00018-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00019-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00020-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00021-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00022-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00023-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00024-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00025-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00026-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00027-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00028-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00029-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00030-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00031-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00032-of-00033.gguf: n_tensors = 51, total_size = 46.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00033-of-00033.gguf: n_tensors = 45, total_size = 33.1G
Shard (3/33): 14%|█▍ | 6.44G/46.0G [00:39<02:48, 235Mbyte/s]
Writing: 6%|▋ | 97.3G/1.51T [05:10<1:44:31, 225Mbyte/s]
Should have time late tonight or tomorrow to test it out and could try llama-perplexity with different context lengths etc.
UPDATE: Perplexity of GLM-5-BF16.gguf on wiki.test.raw
👈 Perplexity Logs
model=/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf
numactl --interleave=all \
./build/bin/llama-perplexity \
--model "$model"\
-f wiki.test.raw \
--seed 1337 \
--ctx-size 512 \
-ub 4096 -b 4096 \
--threads 96 \
--threads-batch 128 \
--no-mmap \
-fit off \
--numa distribute
build: 7987 (7b23cd920) with GNU 13.3.0 for Linux x86_64
llama_model_loader: additional 32 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 58 key-value pairs and 1809 tensors from /mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm-dsa
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 3: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 4: general.name str = GLM 5
llama_model_loader: - kv 5: general.version str = 5
llama_model_loader: - kv 6: general.basename str = GLM
llama_model_loader: - kv 7: general.size_label str = 256x22B
llama_model_loader: - kv 8: general.license str = mit
llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 10: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 11: glm-dsa.block_count u32 = 79
llama_model_loader: - kv 12: glm-dsa.context_length u32 = 202752
llama_model_loader: - kv 13: glm-dsa.embedding_length u32 = 6144
llama_model_loader: - kv 14: glm-dsa.feed_forward_length u32 = 12288
llama_model_loader: - kv 15: glm-dsa.attention.head_count u32 = 64
llama_model_loader: - kv 16: glm-dsa.attention.head_count_kv u32 = 1
llama_model_loader: - kv 17: glm-dsa.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 18: glm-dsa.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 19: glm-dsa.expert_used_count u32 = 8
llama_model_loader: - kv 20: glm-dsa.expert_group_count u32 = 1
llama_model_loader: - kv 21: glm-dsa.expert_group_used_count u32 = 1
llama_model_loader: - kv 22: glm-dsa.expert_gating_func u32 = 2
llama_model_loader: - kv 23: glm-dsa.attention.key_length u32 = 576
llama_model_loader: - kv 24: glm-dsa.attention.value_length u32 = 512
llama_model_loader: - kv 25: general.file_type u32 = 32
llama_model_loader: - kv 26: glm-dsa.leading_dense_block_count u32 = 3
llama_model_loader: - kv 27: glm-dsa.vocab_size u32 = 154880
llama_model_loader: - kv 28: glm-dsa.attention.q_lora_rank u32 = 2048
llama_model_loader: - kv 29: glm-dsa.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 30: glm-dsa.attention.key_length_mla u32 = 256
llama_model_loader: - kv 31: glm-dsa.attention.value_length_mla u32 = 256
llama_model_loader: - kv 32: glm-dsa.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 33: glm-dsa.expert_count u32 = 256
llama_model_loader: - kv 34: glm-dsa.expert_shared_count u32 = 1
llama_model_loader: - kv 35: glm-dsa.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 36: glm-dsa.expert_weights_norm bool = true
llama_model_loader: - kv 37: glm-dsa.rope.dimension_count u32 = 64
llama_model_loader: - kv 38: glm-dsa.nextn_predict_layers u32 = 1
llama_model_loader: - kv 39: glm-dsa.attention.indexer.head_count u32 = 32
llama_model_loader: - kv 40: glm-dsa.attention.indexer.key_length u32 = 128
llama_model_loader: - kv 41: glm-dsa.attention.indexer.top_k u32 = 2048
llama_model_loader: - kv 42: general.quantization_version u32 = 2
llama_model_loader: - kv 43: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 44: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 45: tokenizer.ggml.tokens arr[str,154880] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 46: tokenizer.ggml.token_type arr[i32,154880] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 47: tokenizer.ggml.merges arr[str,321649] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 48: tokenizer.ggml.eos_token_id u32 = 154820
llama_model_loader: - kv 49: tokenizer.ggml.padding_token_id u32 = 154820
llama_model_loader: - kv 50: tokenizer.ggml.bos_token_id u32 = 154822
llama_model_loader: - kv 51: tokenizer.ggml.eot_token_id u32 = 154827
llama_model_loader: - kv 52: tokenizer.ggml.unknown_token_id u32 = 154820
llama_model_loader: - kv 53: tokenizer.ggml.eom_token_id u32 = 154829
llama_model_loader: - kv 54: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv 55: split.no u16 = 0
llama_model_loader: - kv 56: split.count u16 = 33
llama_model_loader: - kv 57: split.tensors.count i32 = 1809
llama_model_loader: - type f32: 630 tensors
llama_model_loader: - type bf16: 1179 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = BF16
print_info: file size = 1404.41 GiB (16.00 BPW)
load: 0 unused tokens
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 154820 ('<|endoftext|>')
load: - 154827 ('<|user|>')
load: - 154829 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9811 MB
print_info: arch = glm-dsa
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 202752
print_info: n_embd = 6144
print_info: n_embd_inp = 6144
print_info: n_layer = 79
print_info: n_head = 64
print_info: n_head_kv = 1
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 576
print_info: n_embd_head_v = 512
print_info: n_gqa = 64
print_info: n_embd_k_gqa = 576
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 12288
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: n_expert_groups = 1
print_info: n_group_used = 1
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 202752
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: model type = 744B.A40B
print_info: model params = 753.86 B
print_info: general.name = GLM 5
print_info: n_layer_dense_lead = 3
print_info: n_lora_q = 2048
print_info: n_lora_kv = 512
print_info: n_embd_head_k_mla = 256
print_info: n_embd_head_v_mla = 256
print_info: n_ff_exp = 2048
print_info: n_expert_shared = 1
print_info: expert_weights_scale = 2.5
print_info: expert_weights_norm = 1
print_info: expert_gating_func = sigmoid
print_info: vocab type = BPE
print_info: n_vocab = 154880
print_info: n_merges = 321649
print_info: BOS token = 154822 '[gMASK]'
print_info: EOS token = 154820 '<|endoftext|>'
print_info: EOT token = 154827 '<|user|>'
print_info: EOM token = 154829 '<|observation|>'
print_info: UNK token = 154820 '<|endoftext|>'
print_info: PAD token = 154820 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 154838 '<|code_prefix|>'
print_info: FIM SUF token = 154840 '<|code_suffix|>'
print_info: FIM MID token = 154839 '<|code_middle|>'
print_info: EOG token = 154820 '<|endoftext|>'
print_info: EOG token = 154827 '<|user|>'
print_info: EOG token = 154829 '<|observation|>'
print_info: max token length = 1024
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
model has unused tensor blk.78.attn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a_norm.weight (size = 8192 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_norm.weight (size = 2048 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.attn_q_b.weight (size = 67108864 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_mqa.weight (size = 7077888 bytes) -- ignoring
model has unused tensor blk.78.attn_k_b.weight (size = 12582912 bytes) -- ignoring
model has unused tensor blk.78.attn_v_b.weight (size = 16777216 bytes) -- ignoring
model has unused tensor blk.78.attn_output.weight (size = 201326592 bytes) -- ignoring
model has unused tensor blk.78.ffn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.bias (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.proj.weight (size = 393216 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_k.weight (size = 1572864 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_q_b.weight (size = 16777216 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_inp.weight (size = 6291456 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_exps.weight (size = 6442450944 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_exps.weight (size = 6442450944 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_exps.weight (size = 6442450944 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_shexp.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_shexp.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_shexp.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.nextn.eh_proj.weight (size = 150994944 bytes) -- ignoring
model has unused tensor blk.78.nextn.enorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.hnorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.shared_head_norm.weight (size = 24576 bytes) -- ignoring
load_tensors: CPU model buffer size = 1419125.34 MiB
....................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|user|> logit bias = -inf
common_init_result: added <|observation|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max = 8
llama_context: n_ctx = 4096
llama_context: n_ctx_seq = 512
llama_context: n_batch = 4096
llama_context: n_ubatch = 4096
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (512) < n_ctx_train (202752) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 4.73 MiB
llama_kv_cache: CPU KV buffer size = 351.00 MiB
llama_kv_cache: size = 351.00 MiB ( 512 cells, 78 layers, 8/8 seqs), K (f16): 351.00 MiB, V (f16): 0.00 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: CPU compute buffer size = 2812.08 MiB
sched_reserve: graph nodes = 6142
sched_reserve: graph splits = 1
sched_reserve: reserve took 16.46 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
system_info: n_threads = 96 (n_threads_batch = 128) / 512 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 640.663 ms
perplexity: calculating perplexity over 565 chunks, n_ctx=512, batch_size=4096, n_seq=8
perplexity: 70.77 seconds per pass - ETA 1 hours 23.30 minutes
[1]1.2357,[2]1.9927,[3]1.7750,[4]1.5647,[5]1.4559,[6]1.3997,[7]1.3700,[8]1.3376,[9]1.3269,[10]1.3065,[11]1.2977,[12]1.3379,[13]1.3350,[14]1.3943,[15]1.4928,[16]1.5953,[17]1.7099,[18]1.8518,[19]1.8477,[20]1.8447,[21]1.9198,[22]1.9402,[23]1.9293,[24]1.9153,[25]1.9054,[26]1.9033,[27]1.9159,[28]1.9434,[29]1.9573,[30]2.0131,[31]2.0672,[32]2.1056,[33]2.1480,[34]2.1778,[35]2.2229,[36]2.2623,[37]2.2885,[38]2.3734,[39]2.4117,[40]2.4554,[41]2.5189,[42]2.5108,[43]2.5282,[44]2.5558,[45]2.6263,[46]2.6778,[47]2.6401,[48]2.6036,[49]2.5770,[50]2.5644,[51]2.5814,[52]2.6092,[53]2.6467,[54]2.6713,[55]2.6982,[56]2.7253,[57]2.7241,[58]2.7472,[59]2.7624,[60]2.7969,[61]2.8294,[62]2.8765,[63]2.9137,[64]2.9403,[65]2.9578,[66]2.9543,[67]2.9297,[68]2.9118,[69]2.9337,[70]2.9180,[71]2.9024,[72]2.9028,[73]2.9087,[74]2.9343,[75]2.9370,[76]2.9038,[77]2.8703,[78]2.8418,[79]2.8141,[80]2.7915,[81]2.7685,[82]2.7566,[83]2.7556,[84]2.7319,[85]2.7192,[86]2.7095,[87]2.7006,[88]2.6854,[89]2.6652,[90]2.6532,[91]2.6343,[92]2.6140,[93]2.6040,[94]2.5891,[95]2.5748,[96]2.5671,[97]2.5749,[98]2.5681,[99]2.5545,[100]2.5355,[101]2.5441,[102]2.5294,[103]2.5191,[104]2.5126,[105]2.5219,[106]2.5452,[107]2.5945,[108]2.6043,[109]2.6138,[110]2.6481,[111]2.6703,[112]2.6503,[113]2.6379,[114]2.6379,[115]2.6356,[116]2.6412,[117]2.6428,[118]2.6467,[119]2.6509,[120]2.6473,[121]2.6385,[122]2.6417,[123]2.6293,[124]2.6287,[125]2.6297,[126]2.6298,[127]2.6289,[128]2.6429,[129]2.6493,[130]2.6485,[131]2.6605,[132]2.6603,[133]2.6589,[134]2.6730,[135]2.6908,[136]2.6844,[137]2.6801,[138]2.6760,[139]2.6642,[140]2.6775,[141]2.6792,[142]2.6712,[143]2.6699,[144]2.6708,[145]2.6700,[146]2.6652,[147]2.6527,[148]2.6472,[149]2.6434,[150]2.6403,[151]2.6332,[152]2.6322,[153]2.6359,[154]2.6355,[155]2.6361,[156]2.6397,[157]2.6422,[158]2.6448,[159]2.6551,[160]2.6643,[161]2.6701,[162]2.6604,[163]2.6486,[164]2.6518,[165]2.6428,[166]2.6402,[167]2.6524,[168]2.6524,[169]2.6759,[170]2.6912,[171]2.7016,[172]2.7193,[173]2.7110,[174]2.6991,[175]2.6869,[176]2.6758,[177]2.6633,[178]2.6498,[179]2.6393,[180]2.6279,[181]2.6229,[182]2.6362,[183]2.6532,[184]2.6769,[185]2.6942,[186]2.7044,[187]2.7228,[188]2.7454,[189]2.7668,[190]2.7826,[191]2.7970,[192]2.8064,[193]2.8134,[194]2.8176,[195]2.8154,[196]2.8178,[197]2.8305,[198]2.8450,[199]2.8450,[200]2.8522,[201]2.8546,[202]2.8580,[203]2.8571,[204]2.8656,[205]2.8737,[206]2.8803,[207]2.8871,[208]2.8875,[209]2.8906,[210]2.8869,[211]2.8913,[212]2.8925,[213]2.8968,[214]2.9017,[215]2.9053,[216]2.9096,[217]2.9140,[218]2.9215,[219]2.9167,[220]2.9159,[221]2.9139,[222]2.9164,[223]2.9161,[224]2.9228,[225]2.9251,[226]2.9315,[227]2.9293,[228]2.9289,[229]2.9205,[230]2.9124,[231]2.9085,[232]2.9087,[233]2.9070,[234]2.9003,[235]2.8901,[236]2.8829,[237]2.8751,[238]2.8783,[239]2.8923,[240]2.9065,[241]2.9185,[242]2.9290,[243]2.9407,[244]2.9533,[245]2.9668,[246]2.9779,[247]2.9914,[248]3.0021,[249]3.0034,[250]3.0041,[251]2.9937,[252]2.9849,[253]2.9777,[254]2.9747,[255]2.9770,[256]2.9765,[257]2.9720,[258]2.9704,[259]2.9613,[260]2.9554,[261]2.9486,[262]2.9432,[263]2.9373,[264]2.9332,[265]2.9290,[266]2.9262,[267]2.9193,[268]2.9133,[269]2.9092,[270]2.9077,[271]2.9056,[272]2.9009,[273]2.8984,[274]2.8906,[275]2.8837,[276]2.8733,[277]2.8655,[278]2.8565,[279]2.8578,[280]2.8610,[281]2.8646,[282]2.8697,[283]2.8746,[284]2.8762,[285]2.8775,[286]2.8851,[287]2.8958,[288]2.8970,[289]2.8982,[290]2.9028,[291]2.9056,[292]2.9017,[293]2.8934,[294]2.8882,[295]2.8863,[296]2.8801,[297]2.8757,[298]2.8714,[299]2.8674,[300]2.8668,[301]2.8656,[302]2.8623,[303]2.8597,[304]2.8558,[305]2.8499,[30
6]2.8454,[307]2.8479,[308]2.8539,[309]2.8653,[310]2.8566,[311]2.8511,[312]2.8441,[313]2.8406,[314]2.8364,[315]2.8353,[316]2.8328,[317]2.8315,[318]2.8309,[319]2.8279,[320]2.8261,[321]2.8284,[322]2.8291,[323]2.8235,[324]2.8200,[325]2.8183,[326]2.8159,[327]2.8181,[328]2.8165,[329]2.8167,[330]2.8158,[331]2.8120,[332]2.8137,[333]2.8162,[334]2.8192,[335]2.8192,[336]2.8203,[337]2.8215,[338]2.8218,[339]2.8218,[340]2.8243,[341]2.8268,[342]2.8286,[343]2.8341,[344]2.8390,[345]2.8475,[346]2.8471,[347]2.8404,[348]2.8342,[349]2.8293,[350]2.8233,[351]2.8170,[352]2.8146,[353]2.8111,[354]2.8055,[355]2.8002,[356]2.7963,[357]2.7911,[358]2.7862,[359]2.7854,[360]2.7807,[361]2.7747,[362]2.7688,[363]2.7637,[364]2.7618,[365]2.7578,[366]2.7553,[367]2.7504,[368]2.7448,[369]2.7402,[370]2.7380,[371]2.7340,[372]2.7339,[373]2.7330,[374]2.7348,[375]2.7318,[376]2.7281,[377]2.7245,[378]2.7227,[379]2.7238,[380]2.7190,[381]2.7163,[382]2.7132,[383]2.7158,[384]2.7220,[385]2.7269,[386]2.7345,[387]2.7390,[388]2.7449,[389]2.7526,[390]2.7551,[391]2.7484,[392]2.7429,[393]2.7368,[394]2.7362,[395]2.7309,[396]2.7264,[397]2.7204,[398]2.7139,[399]2.7089,[400]2.7033,[401]2.6971,[402]2.6916,[403]2.6854,[404]2.6789,[405]2.6736,[406]2.6673,[407]2.6615,[408]2.6554,[409]2.6505,[410]2.6449,[411]2.6397,[412]2.6359,[413]2.6325,[414]2.6310,[415]2.6284,[416]2.6260,[417]2.6210,[418]2.6156,[419]2.6211,[420]2.6169,[421]2.6149,[422]2.6169,[423]2.6143,[424]2.6101,[425]2.6067,[426]2.6043,[427]2.6026,[428]2.5998,[429]2.5955,[430]2.5922,[431]2.5934,[432]2.5897,[433]2.5860,[434]2.5829,[435]2.5796,[436]2.5748,[437]2.5696,[438]2.5656,[439]2.5648,[440]2.5616,[441]2.5598,[442]2.5558,[443]2.5612,[444]2.5687,[445]2.5667,[446]2.5657,[447]2.5680,[448]2.5698,[449]2.5759,[450]2.5774,[451]2.5795,[452]2.5834,[453]2.5908,[454]2.5962,[455]2.5992,[456]2.6046,[457]2.6036,[458]2.6076,[459]2.6101,[460]2.6165,[461]2.6226,[462]2.6257,[463]2.6258,[464]2.6247,[465]2.6243,[466]2.6291,[467]2.6286,[468]2.6259,[469]2.6315,[470]2.6331,[471]2.6357,[472]2.6391,[473]2.6410,[474]2.6426,[475]2.6447,[476]2.6475,[477]2.6508,[478]2.6533,[479]2.6558,[480]2.6579,[481]2.6614,[482]2.6636,[483]2.6665,[484]2.6639,[485]2.6683,[486]2.6705,[487]2.6766,[488]2.6818,[489]2.6875,[490]2.6870,[491]2.6927,[492]2.6974,[493]2.7014,[494]2.7060,[495]2.7115,[496]2.7117,[497]2.7130,[498]2.7150,[499]2.7175,[500]2.7209,[501]2.7221,[502]2.7237,[503]2.7287,[504]2.7342,[505]2.7351,[506]2.7355,[507]2.7372,[508]2.7409,[509]2.7468,[510]2.7498,[511]2.7544,[512]2.7492,[513]2.7446,[514]2.7399,[515]2.7367,[516]2.7339,[517]2.7313,[518]2.7278,[519]2.7236,[520]2.7218,[521]2.7183,[522]2.7145,[523]2.7114,[524]2.7140,[525]2.7114,[526]2.7080,[527]2.7080,[528]2.7060,[529]2.7024,[530]2.6994,[531]2.6969,[532]2.6956,[533]2.6932,[534]2.6924,[535]2.6903,[536]2.6886,[537]2.6839,[538]2.6803,[539]2.6764,[540]2.6760,[541]2.6758,[542]2.6737,[543]2.6720,[544]2.6717,[545]2.6698,[546]2.6696,[547]2.6666,[548]2.6644,[549]2.6617,[550]2.6575,[551]2.6529,[552]2.6493,[553]2.6454,[554]2.6417,[555]2.6375,[556]2.6340,[557]2.6299,[558]2.6296,[559]2.6264,[560]2.6256,[561]2.6263,[562]2.6267,[563]2.6296,[564]2.6316,[565]2.6301,
Final estimate: PPL = 2.6301 +/- 0.01396
llama_perf_context_print: load time = 578111.23 ms
llama_perf_context_print: prompt eval time = 3290589.59 ms / 289280 tokens ( 11.38 ms per token, 87.91 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 3372645.88 ms / 289281 tokens
llama_perf_context_print: graphs reused = 0
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - Host | 1422288 = 1419125 + 351 + 2812 | |
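For reference, the bracketed numbers above are the running perplexity after each 512-token chunk, and the final line is the estimate over all 565 chunks. Below is a minimal sketch of how such an estimate and its ± term are conventionally derived from per-token negative log-likelihoods — an illustration of the standard exp-of-mean-NLL definition, not the exact code of the perplexity tool:

```python
import math

def ppl_with_error(token_nll: list[float]) -> tuple[float, float]:
    """Perplexity and its uncertainty from per-token negative log-likelihoods.

    Assumes PPL = exp(mean NLL); the +/- term is the standard error of the
    mean NLL propagated through exp() (first-order delta method).
    """
    n = len(token_nll)
    mean_nll = sum(token_nll) / n
    var = sum((x - mean_nll) ** 2 for x in token_nll) / (n - 1)
    se_mean = math.sqrt(var / n)
    ppl = math.exp(mean_nll)
    return ppl, ppl * se_mean

# toy usage with made-up NLL values
print(ppl_with_error([0.9, 1.1, 0.95, 1.02, 0.98]))
```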
…gml-org#19460)

* model: support GLM MoE DSA arch
* working version
* pyright
* keep indexer tensors
* add indexer gguf params
* loaded now
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* update
* Update src/llama-model.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* minor fix and cleanup

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Ref upstream vllm PR: vllm-project/vllm#34124
Important
This PR allows converting safetensors to GGUF while keeping the indexer tensors (for DeepSeek sparse attention), but they are left unused by the C++ code, so quality will be suboptimal for now.
Support for the indexer tensors will be added in a follow-up PR. The GGUF will NOT need to be regenerated.
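To illustrate the "keep but don't use" approach: during conversion the indexer weights are simply assigned GGUF names alongside the attention tensors so they survive in the file. A rough sketch of such a mapping follows — the HF-side suffixes and the helper function are hypothetical, only the GGUF-side names match the blk.N.indexer.* entries visible in the load log above:

```python
# Hypothetical illustration, not the actual convert_hf_to_gguf.py code:
# the HF-side suffixes below are assumptions; the GGUF-side names are taken
# from the load log above (blk.N.indexer.*).
INDEXER_SUFFIX_MAP = {
    "self_attn.indexer.k_norm.weight":       "indexer.k_norm.weight",
    "self_attn.indexer.k_norm.bias":         "indexer.k_norm.bias",
    "self_attn.indexer.weights_proj.weight": "indexer.proj.weight",
    "self_attn.indexer.wk.weight":           "indexer.attn_k.weight",
    "self_attn.indexer.wq_b.weight":         "indexer.attn_q_b.weight",
}

def map_indexer_tensor(hf_name: str, layer: int) -> str | None:
    """Return the GGUF name for an indexer tensor instead of dropping it."""
    for hf_suffix, gguf_suffix in INDEXER_SUFFIX_MAP.items():
        if hf_name.endswith(hf_suffix):
            return f"blk.{layer}.{gguf_suffix}"
    return None  # not an indexer tensor, handled by the regular mapping
```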
The arch should be exactly the same as GlmMoeLite (aka GLM 4.7 Flash, PR: #18936), but I'm taking the time to properly move it to a new arch while preserving the MTP tensors.
Tested with the random-weight model: https://huggingface.co/ngxson/GLM-5-small-test
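Since the indexer tensors are stored but not yet consumed, a quick way to confirm a converted GGUF actually contains them (and so won't need re-converting once indexer support lands) is to list tensor names with the gguf Python package. A small sketch, assuming gguf-py is installed and with a placeholder file path:

```python
from gguf import GGUFReader

reader = GGUFReader("GLM-5-Q4_K_M.gguf")  # placeholder path
indexer = [t.name for t in reader.tensors if ".indexer." in t.name]
print(f"found {len(indexer)} indexer tensors")
print(indexer[:5])
```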