model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) #19460

Merged
ngxson merged 10 commits into ggml-org:master from ngxson:xsn/glm_dsa on Feb 13, 2026

Conversation

@ngxson
Collaborator

@ngxson ngxson commented Feb 9, 2026

Ref upstream vllm PR: vllm-project/vllm#34124

Important

This PR allows converting safetensors to GGUF while keeping the indexer tensors (for DeepSeek sparse attention), but they are left unused by the cpp code, so the quality will be suboptimal.
Support for the indexer tensors will come in a follow-up PR. The GGUF will NOT need to be generated again.

The arch should be exactly the same as GlmMoeLite (aka GLM 4.7 Flash, PR: #18936), but I'm taking the time to properly move it to a new arch while preserving the MTP tensors.

Tested via the random weight: https://huggingface.co/ngxson/GLM-5-small-test
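
For anyone who wants to verify that the indexer tensors actually survive conversion, here is a minimal sketch using gguf-py's GGUFReader. The "blk.{i}.indexer.*" names match the conversion log later in this thread; the reader usage and the example filename are assumptions, not part of this PR.

    # Minimal sketch (not part of this PR): list the indexer tensors kept in a
    # converted GGUF. Tensor names "blk.{i}.indexer.*" match the conversion log
    # later in this thread; the GGUFReader usage is based on gguf-py.
    from gguf import GGUFReader

    def list_indexer_tensors(gguf_path: str) -> None:
        reader = GGUFReader(gguf_path)
        for t in reader.tensors:
            if ".indexer." in t.name:
                print(t.name, list(t.shape))

    # list_indexer_tensors("glm-5-bf16.gguf")  # hypothetical filename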

@pwilkin
Collaborator

pwilkin commented Feb 9, 2026

Well, the upstream PR has:

    "GlmMoeDsaForCausalLM": _HfExamplesInfo(
        "zai-org/GLM-5", min_transformers_version="5.0.1", is_available_online=False
    ),

:D

@DocShotgun
Contributor

How different is GLM’s DSA compared to DeepSeek V3.2’s DSA? Curious if it would be possible to backport because DeepSeek V3.2 DSA never got llama.cpp support.

@ngxson
Collaborator Author

ngxson commented Feb 9, 2026

@DocShotgun From the mentioned PR, it seems to be exactly the same as DSv3.2. I assume they needed to split it into a new arch just because it has some specific config, like the activation func or the MTP layer.

It seems like DSv3.2 is already supported by llama.cpp, example: https://huggingface.co/unsloth/DeepSeek-V3.2-GGUF. I'm not quite sure what the differences are between DSA and the MLA implemented in DSv3, though.

@DocShotgun
Contributor

DocShotgun commented Feb 9, 2026

It seems like DSv3.2 is already supported by llama.cpp, example: https://huggingface.co/unsloth/DeepSeek-V3.2-GGUF. I'm not quite sure what the differences are between DSA and the MLA implemented in DSv3, though.

It's possible to hack DS V3.2 to work in llama.cpp by treating it as a regular DeepSeekV3 MLA model rather than a DSA model, which is likely what the Unsloth folks did here, but proper DSA support would still be interesting.

This is where there's some discussion of how to hack DS V3.2 to run without DSA:
https://huggingface.co/sszymczyk/DeepSeek-V3.2-nolight-GGUF/discussions/1#695d65bd101f9d06bfd2961b
#18849
Essentially it just skips the lightning indexer tensors. There's a minor patch to how the tokenizer handles things because they ditched HF jinja templates for their own python prompt processor, but this isn't related to DSA vs MLA.
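
A minimal sketch of that "skip the lightning indexer" approach, reduced to a plain tensor-name filter. The ".indexer." substring is an assumption about the HF checkpoint naming, not taken from the actual Unsloth or llama.cpp patches:

    # Hypothetical filter: drop lightning-indexer tensors during conversion so the
    # model is exported as a plain DeepSeekV3 MLA model. The ".indexer." naming is
    # an assumption about the HF checkpoint layout.
    def keep_tensor(name: str) -> bool:
        """Return False for lightning-indexer tensors so they are not exported."""
        return ".indexer." not in name

    assert keep_tensor("model.layers.0.self_attn.kv_a_proj_with_mqa.weight")
    assert not keep_tensor("model.layers.0.self_attn.indexer.wk.weight")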

@github-actions github-actions bot added the model (Model specific) and python (python script changes) labels on Feb 9, 2026
@pwilkin
Collaborator

pwilkin commented Feb 10, 2026

Are we planning on implementing the sparse attention with the indexer? @ngxson

@ngxson
Collaborator Author

ngxson commented Feb 10, 2026

@pwilkin not yet, but I will have a look. It sounds like something gemma3n already had, though gemma3n actually uses activation sparsity, not sparse attention.

for now, I think one safe way could be to modify my PR to keep these indexer tensors, so if GLM-5 is released suddenly, we will at least have the GGUF

@pwilkin
Collaborator

pwilkin commented Feb 10, 2026

@ngxson I was taking a look at the DSA, obviously it's black magic out there (it uses a dedicated optimized kernel for the newest CUDA devices to calculate the pre-attention logits in 8-bit), but I'll try to extract the naive logic somewhere so that we can think about how to implement it in llama.cpp.

@ngxson
Collaborator Author

ngxson commented Feb 10, 2026

@pwilkin Yeah, I'm doing it the same way: extracting the logic and then letting gemini + claude compete to see who gets the first pytorch-only version without all the optimization stuff.

But so far it has taken too much time without success, so I would appreciate it if you could get a naive version of DSA.
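
For reference, a rough pytorch-only sketch of what such a naive version could look like: a lightning indexer that scores past tokens with per-head ReLU dot products and learned head weights, followed by attention restricted to the top-k selected positions. Shapes, function names, and the exact scoring formula are assumptions based on the public DeepSeek V3.2 description, not the GLM-5 or llama.cpp implementation.

    # Naive DSA sketch (assumptions throughout, no optimized kernels):
    # 1) the lightning indexer scores every visible token per query token,
    # 2) only the top_k highest-scoring positions are kept for the main attention.
    import torch
    import torch.nn.functional as F

    def lightning_indexer_scores(q_idx, k_idx, head_w):
        """q_idx: [T, H, D] indexer queries, k_idx: [T, D] one indexer key per token,
        head_w: [T, H] per-head weights. Returns [T, T] index scores."""
        dots = torch.einsum("thd,sd->ths", q_idx, k_idx)             # q . k per head
        return (head_w.unsqueeze(-1) * F.relu(dots)).sum(dim=1)      # weighted ReLU sum over heads

    def sparse_attention_naive(q, k, v, q_idx, k_idx, head_w, top_k):
        """q, k, v: [T, H, D] main-attention tensors (causal). Attends only to the
        top_k past positions selected by the indexer for each query token."""
        T, scale = q.shape[0], q.shape[-1] ** 0.5
        causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
        scores = lightning_indexer_scores(q_idx, k_idx, head_w)
        scores = scores.masked_fill(~causal, float("-inf"))
        topk_idx = scores.topk(min(top_k, T), dim=-1).indices        # [T, k]
        keep = (F.one_hot(topk_idx, num_classes=T).sum(dim=1) > 0) & causal
        attn = torch.einsum("thd,shd->hts", q, k) / scale            # [H, T, T]
        attn = attn.masked_fill(~keep.unsqueeze(0), float("-inf")).softmax(dim=-1)
        return torch.einsum("hts,shd->thd", attn, v)                 # [T, H, D]

    # Tiny smoke test with made-up sizes:
    T, H, D, HI, DI = 16, 4, 32, 8, 16
    out = sparse_attention_naive(
        torch.randn(T, H, D), torch.randn(T, H, D), torch.randn(T, H, D),
        torch.randn(T, HI, DI), torch.randn(T, DI), torch.rand(T, HI), top_k=8)
    assert out.shape == (T, H, D)

The 8-bit fused-kernel part mentioned above would sit entirely inside lightning_indexer_scores, so it can probably be ignored for a first naive port.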

@ddh0
Contributor

ddh0 commented Feb 11, 2026

GLM-5 is released: https://huggingface.co/zai-org/GLM-5

@ngxson
Collaborator Author

ngxson commented Feb 11, 2026

I guess that will be plan B then:

for now, I think one safe way could be to modify my PR to keep these indexer tensors, so if GLM-5 is released suddenly, we will at least have the GGUF

@drrros

drrros commented Feb 11, 2026

Testing Unsloth's UD-Q3_K_XL with this branch now; it runs fine but is of course really slow (30-35 t/s prompt processing / 11-13 t/s generation, within 10k context). Answer quality is OK in my very limited testing. The quants are now hidden on Hugging Face, not sure why.

@ngxson
Collaborator Author

ngxson commented Feb 11, 2026

@drrros they basically removed the DSA indexer tensors, as explained above. The quality will be sub-optimal and (for sure) the GGUF will need to be re-converted at some point.

@Panchovix

@drrros sorry to bother, but I wonder, what is the size of Q3_K_XL in GiB and GB?

@ngxson
Collaborator Author

ngxson commented Feb 12, 2026

Alright, so the latest version can already load the random weight generated from this transformers PR.

I haven't tested it on the real weights though; could you please give it a try, @bartowski1182 @danielhanchen?

An important note: the GGUF can be converted correctly, but the indexer tensors are left unused by the llama.cpp code. I think support for the indexer will need to be implemented in a dedicated PR.

@ngxson ngxson marked this pull request as ready for review February 12, 2026 00:15
@ngxson ngxson changed the title model: support GLM MoE DSA arch model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) Feb 12, 2026
@danielhanchen
Contributor

danielhanchen commented Feb 12, 2026

Oh ok, I'll reconvert with the indexer for now. @ngxson I did try the old PR without the indexer and it worked. Hopefully the indexing code won't change anything?

@AesSedai
Contributor

Also testing the convert out.

@AesSedai
Contributor

@ngxson the convert worked cleanly, and I also did a Q4-ish quantization and ran a PPL test on it; it looks sane:

Final estimate: PPL = 8.7486 +/- 0.17123

I did have to pip install --upgrade transformers to get convert_hf_to_gguf to work.

Full convert_hf_to_gguf output
Starting conversion for: /mnt/srv/snowdrift/ggml/GLM-5
INFO:hf-to-gguf:Loading model: GLM-5
WARNING:hf-to-gguf:Failed to load model config from /mnt/srv/snowdrift/fp16/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: GlmMoeDsaForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/srv/snowdrift/fp16/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00003-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00004-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00005-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00006-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00007-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00008-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00009-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00010-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00011-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00012-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00013-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00014-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00015-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00016-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00017-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00018-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00019-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00020-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00021-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00022-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00023-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00024-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00025-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00026-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00027-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00028-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00029-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00030-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00031-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00032-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00033-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00034-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00035-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00036-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00037-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00038-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00039-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00040-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00041-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00042-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00043-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00044-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00045-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00046-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00047-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00048-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00049-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00050-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00051-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00052-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00053-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00054-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00055-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00056-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00057-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00058-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00059-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00060-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00061-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00062-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00063-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00064-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00065-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00066-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00067-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00068-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00069-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00070-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00071-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00072-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00073-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00074-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00075-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00076-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00077-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00078-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00079-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00080-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00081-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00082-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00083-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00084-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00085-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00086-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00087-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00088-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00089-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00090-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00091-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00092-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00093-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00094-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00095-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00096-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00097-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00098-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00099-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00100-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00101-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00102-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00103-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00104-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00105-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00106-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00107-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00108-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00109-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00110-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00111-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00112-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00113-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00114-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00115-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00116-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00117-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00118-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00119-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00120-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00121-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00122-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00123-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00124-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00125-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00126-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00127-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00128-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00129-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00130-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00131-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00132-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00133-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00134-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00135-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00136-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00137-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00138-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00139-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00140-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00141-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00142-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00143-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00144-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00145-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00146-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00147-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00148-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00149-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00150-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00151-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00152-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00153-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00154-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00155-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00156-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00157-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00158-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00159-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00160-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00161-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00162-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00163-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00164-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00165-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00166-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00167-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00168-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00169-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00170-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00171-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00172-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00173-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00174-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00175-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00176-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00177-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00178-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00179-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00180-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00181-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00182-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00183-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00184-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00185-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00186-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00187-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00188-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00189-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00190-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00191-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00192-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00193-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00194-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00195-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00196-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00197-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00198-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00199-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00200-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00201-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00202-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00203-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00204-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00205-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00206-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00207-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00208-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00209-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00210-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00211-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00212-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00213-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00214-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00215-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00216-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00217-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00218-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00219-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00220-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00221-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00222-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00223-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00224-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00225-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00226-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00227-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00228-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00229-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00230-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00231-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00232-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00233-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00234-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00235-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00236-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00237-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00238-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00239-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00240-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00241-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00242-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00243-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00244-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00245-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00246-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00247-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00248-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00249-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00250-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00251-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00252-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00253-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00254-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00255-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00256-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00257-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00258-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00259-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00260-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00261-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00262-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00263-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00264-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00265-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00266-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00267-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00268-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00269-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00270-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00271-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00272-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00273-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00274-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00275-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00276-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00277-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00278-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00279-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00280-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00281-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00282-of-00282.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:output.weight,                        torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:token_embd.weight,                    torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:blk.0.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.0.ffn_down.weight,                torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,                torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.0.ffn_up.weight,                  torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.0.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.0.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.0.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.0.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.0.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.0.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.0.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.0.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.1.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.1.ffn_down.weight,                torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,                torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.1.ffn_up.weight,                  torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.1.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.1.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.1.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.1.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.1.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.1.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.1.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.1.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.1.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.1.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.1.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.10.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.10.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.10.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.10.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.10.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.10.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.10.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.10.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.10.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.10.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.10.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.10.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.10.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.11.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.11.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.11.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.11.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.11.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.11.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.11.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.11.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.11.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.11.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.11.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.11.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.12.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.12.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.12.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.12.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.12.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.12.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.12.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.12.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.12.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.12.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.12.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.12.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.12.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.13.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.13.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.13.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.13.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.13.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.13.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.13.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.13.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.13.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.13.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.13.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.14.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.14.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.14.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.14.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.14.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.14.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.14.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.14.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.14.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.14.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.14.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.14.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.14.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.15.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.15.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.15.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.15.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.15.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.15.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.15.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.15.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.15.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.15.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.15.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.15.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.15.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.16.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.16.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.16.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.16.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.16.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.16.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.16.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.16.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.16.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.16.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.16.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.16.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.16.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.17.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.17.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.17.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.17.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.17.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.17.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.17.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.17.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.17.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.17.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.17.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.17.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.18.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.18.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.18.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.18.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.18.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.18.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.18.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.18.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.18.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.18.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.18.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.18.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.18.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.19.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.19.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.19.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.19.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.19.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.19.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.19.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.19.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.19.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.19.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.19.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.19.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.19.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.2.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.2.ffn_down.weight,                torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,                torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.2.ffn_up.weight,                  torch.bfloat16 --> BF16, shape = {6144, 12288}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.2.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.2.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.2.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.2.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.2.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.2.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.2.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.2.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.2.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.2.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.2.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.20.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.20.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.20.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.20.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.20.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.20.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.20.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.20.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.20.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.20.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.20.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.20.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.20.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.21.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.21.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.21.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.21.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.21.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.21.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.21.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.21.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.21.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.21.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.21.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.21.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.21.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.22.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.22.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.22.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.22.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.22.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.22.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.22.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.22.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.22.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.22.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.22.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.22.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.22.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.23.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.23.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.23.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.23.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.23.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.23.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.23.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.23.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.23.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.23.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.23.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.23.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.24.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.24.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.24.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.24.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.24.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.24.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.24.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.24.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.24.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.24.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.24.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.24.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.24.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.24.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.25.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.25.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.25.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.25.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.25.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.25.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.25.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.25.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.25.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.25.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.25.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.25.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.25.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.25.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.26.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.26.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.26.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.26.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.26.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.26.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.26.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.26.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.26.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.26.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.26.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.26.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.26.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.26.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.27.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.27.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.27.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.27.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.27.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.27.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.27.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.27.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.27.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.27.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.27.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.27.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.27.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.27.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.28.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.28.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.28.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.28.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.28.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.28.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.28.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.28.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.28.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.28.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.28.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.28.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.28.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.28.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.28.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.28.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.28.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.29.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.29.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.29.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.29.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.29.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.29.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.29.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.29.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.29.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.29.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.29.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.29.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.29.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.29.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.29.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.29.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.3.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.3.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.3.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.3.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.3.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.3.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.3.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.3.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.3.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.3.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.3.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.3.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.30.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.30.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.30.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.30.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.30.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.30.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.30.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.30.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.30.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.30.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.30.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.30.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.30.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.30.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.30.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.30.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.30.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.31.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.31.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.31.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.31.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.31.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.31.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.31.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.31.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.31.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.31.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.31.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.31.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.31.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.31.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.31.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.31.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.31.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.32.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.32.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.32.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.32.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.32.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.32.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.32.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.32.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.32.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.32.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.32.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.32.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.32.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.32.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.32.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.32.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.32.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.33.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.33.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.33.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.33.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.33.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.33.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.33.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.33.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.33.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.33.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.33.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.33.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.33.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.33.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.33.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.33.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.33.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.34.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.34.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.34.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.34.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.34.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.34.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.34.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.34.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.34.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.34.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.34.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.34.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.34.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.34.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.34.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.34.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.34.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.35.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.35.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.35.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.35.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.35.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.35.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.35.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.35.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.35.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.35.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.35.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.35.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.35.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.35.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.35.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.35.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.36.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.36.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.36.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.36.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.36.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.36.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.36.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.36.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.36.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.36.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.36.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.36.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.36.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.36.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.36.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.36.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.36.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.37.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.37.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.37.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.37.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.37.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.37.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.37.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.37.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.37.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.37.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.37.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.37.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.37.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.37.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.37.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.37.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.37.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.38.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.38.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.38.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.38.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.38.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.38.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.38.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.38.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.38.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.38.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.38.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.38.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.38.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.38.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.38.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.38.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.38.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.39.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.39.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.39.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.39.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.39.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.39.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.39.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.39.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.39.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.39.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.39.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.39.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.39.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.39.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.39.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.39.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.39.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.4.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.4.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.4.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.4.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.4.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.4.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.4.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.4.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.4.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.4.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.4.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.4.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.40.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.40.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.40.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.40.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.40.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.40.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.40.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.40.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.40.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.40.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.40.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.40.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.40.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.40.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.40.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.40.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.40.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.41.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.41.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.41.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.41.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.41.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.41.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.41.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.41.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.41.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.41.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.41.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.41.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.41.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.41.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.41.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.41.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.42.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.42.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.42.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.42.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.42.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.42.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.42.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.42.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.42.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.42.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.42.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.42.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.42.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.42.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.42.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.42.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.42.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.42.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.43.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.43.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.43.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.43.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.43.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.43.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.43.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.43.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.43.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.43.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.43.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.43.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.43.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.43.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.43.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.43.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.43.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.43.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.44.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.44.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.44.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.44.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.44.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.44.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.44.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.44.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.44.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.44.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.44.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.44.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.44.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.44.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.44.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.44.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.44.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.44.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.45.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.45.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.45.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.45.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.45.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.45.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.45.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.45.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.45.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.45.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.45.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.45.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.45.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.45.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.45.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.45.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.45.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.45.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.46.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.46.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.46.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.46.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.46.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.46.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.46.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.46.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.46.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.46.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.46.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.46.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.46.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.46.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.46.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.46.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.46.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.46.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.47.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.47.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.47.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.47.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.47.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.47.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.47.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.47.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.47.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.47.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.47.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.47.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.47.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.47.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.47.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.47.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.47.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.47.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.48.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.48.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.48.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.48.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.48.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.48.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.48.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.48.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.48.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.48.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.48.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.48.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.48.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.48.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.48.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.48.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.48.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.48.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.49.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.49.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.49.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.49.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.49.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.49.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.49.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.49.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.49.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.49.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.49.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.49.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.49.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.49.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.49.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.49.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.49.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.49.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.5.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.5.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.5.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.5.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.5.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.5.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.5.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.5.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.5.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.5.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.5.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.50.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.50.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.50.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.50.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.50.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.50.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.50.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.50.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.50.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.50.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.50.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.50.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.50.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.50.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.50.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.50.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.50.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.50.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.51.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.51.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.51.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.51.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.51.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.51.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.51.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.51.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.51.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.51.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.51.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.51.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.51.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.51.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.51.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.51.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.51.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.51.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.52.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.52.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.52.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.52.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.52.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.52.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.52.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.52.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.52.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.52.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.52.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.52.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.52.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.52.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.52.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.52.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.52.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.52.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.53.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.53.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.53.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.53.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.53.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.53.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.53.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.53.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.53.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.53.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.53.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.53.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.53.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.53.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.53.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.53.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.53.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.53.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.54.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.54.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.54.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.54.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.54.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.54.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.54.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.54.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.54.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.54.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.54.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.54.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.54.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.54.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.54.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.54.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.54.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.54.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.54.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.54.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.54.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.54.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.54.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.55.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.55.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.55.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.55.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.55.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.55.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.55.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.55.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.55.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.55.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.55.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.55.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.55.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.55.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.55.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.55.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.55.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.55.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.55.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.55.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.55.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.55.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.55.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.56.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.56.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.56.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.56.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.56.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.56.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.56.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.56.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.56.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.56.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.56.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.56.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.56.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.56.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.56.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.56.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.56.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.56.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.56.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.56.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.56.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.56.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.56.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.57.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.57.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.57.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.57.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.57.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.57.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.57.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.57.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.57.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.57.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.57.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.57.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.57.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.57.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.57.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.57.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.57.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.57.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.57.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.57.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.57.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.57.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.57.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.58.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.58.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.58.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.58.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.58.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.58.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.58.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.58.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.58.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.58.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.58.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.58.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.58.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.58.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.58.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.58.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.58.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.58.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.58.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.58.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.58.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.58.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.58.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.59.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.59.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.59.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.59.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.59.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.59.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.59.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.59.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.59.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.59.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.59.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.59.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.59.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.59.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.59.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.59.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.59.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.59.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.59.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.59.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.59.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.59.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.59.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.6.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.6.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.6.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.6.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.6.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.6.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.6.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.6.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.6.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.6.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.6.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.6.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.60.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.60.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.60.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.60.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.60.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.60.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.60.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.60.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.60.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.60.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.60.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.60.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.60.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.60.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.60.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.60.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.60.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.60.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.61.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.61.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.61.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.61.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.61.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.61.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.61.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.61.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.61.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.61.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.61.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.61.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.61.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.61.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.61.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.61.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.61.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.61.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.61.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.61.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.61.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.61.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.61.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.62.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.62.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.62.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.62.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.62.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.62.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.62.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.62.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.62.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.62.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.62.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.62.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.62.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.62.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.62.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.62.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.62.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.62.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.62.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.62.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.62.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.62.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.62.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.63.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.63.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.63.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.63.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.63.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.63.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.63.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.63.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.63.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.63.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.63.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.63.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.63.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.63.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.63.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.63.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.63.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.63.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.63.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.63.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.63.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.63.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.63.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.64.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.64.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.64.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.64.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.64.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.64.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.64.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.64.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.64.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.64.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.64.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.64.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.64.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.64.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.64.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.64.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.64.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.64.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.64.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.64.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.64.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.64.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.64.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.65.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.65.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.65.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.65.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.65.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.65.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.65.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.65.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.65.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.65.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.65.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.65.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.65.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.65.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.65.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.65.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.65.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.65.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.65.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.65.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.65.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.65.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.65.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.66.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.66.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.66.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.66.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.66.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.66.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.66.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.66.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.66.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.66.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.66.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.66.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.66.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.66.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.66.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.66.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.66.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.66.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.66.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.66.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.66.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.66.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.66.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.67.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.67.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.67.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.67.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.67.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.67.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.67.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.67.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.67.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.67.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.67.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.67.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.67.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.67.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.67.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.67.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.67.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.67.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.67.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.67.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.67.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.67.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.67.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.68.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.68.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.68.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.68.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.68.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.68.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.68.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.68.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.68.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.68.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.68.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.68.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.68.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.68.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.68.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.68.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.68.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.68.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.68.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.68.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.68.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.68.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.68.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.69.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.69.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.69.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.69.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.69.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.69.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.69.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.69.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.69.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.69.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.69.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.69.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.69.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.69.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.69.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.69.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.69.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.69.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.69.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.69.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.69.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.69.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.69.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.7.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.7.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.7.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.7.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.7.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.7.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.7.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.7.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.7.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.7.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.7.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.7.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.70.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.70.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.70.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.70.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.70.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.70.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.70.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.70.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.70.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.70.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.70.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.70.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.70.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.70.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.70.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.70.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.70.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.70.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.70.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.70.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.70.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.70.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.70.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.71.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.71.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.71.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.71.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.71.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.71.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.71.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.71.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.71.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.71.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.71.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.71.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.71.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.71.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.71.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.71.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.71.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.71.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.71.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.71.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.71.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.71.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.71.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.72.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.72.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.72.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.72.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.72.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.72.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.72.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.72.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.72.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.72.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.72.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.72.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.72.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.72.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.72.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.72.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.72.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.72.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.72.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.72.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.72.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.72.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.72.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.73.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.73.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.73.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.73.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.73.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.73.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.73.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.73.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.73.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.73.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.73.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.73.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.73.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.73.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.73.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.73.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.73.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.73.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.73.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.73.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.73.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.73.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.73.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.74.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.74.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.74.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.74.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.74.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.74.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.74.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.74.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.74.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.74.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.74.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.74.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.74.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.74.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.74.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.74.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.74.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.74.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.74.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.74.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.74.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.74.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.74.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.75.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.75.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.75.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.75.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.75.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.75.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.75.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.75.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.75.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.75.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.75.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.75.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.75.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.75.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.75.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.75.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.75.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.75.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.75.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.75.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.75.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.75.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.75.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.76.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.76.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.76.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.76.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.76.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.76.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.76.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.76.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.76.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.76.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.76.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.76.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.76.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.76.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.76.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.76.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.76.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.76.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.76.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.76.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.76.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.76.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.76.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.77.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.77.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.77.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.77.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.77.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.77.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.77.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.77.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.77.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.77.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.77.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.77.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.77.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.77.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.77.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.77.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.77.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.77.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.77.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.77.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.77.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.77.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.77.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.78.nextn.eh_proj.weight,          torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.78.nextn.enorm.weight,            torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.nextn.hnorm.weight,            torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.attn_norm.weight,              torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.ffn_down_exps.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.78.ffn_gate_exps.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.78.ffn_up_exps.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.78.exp_probs_b.bias,              torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.78.ffn_gate_inp.weight,           torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.78.ffn_down_shexp.weight,         torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.78.ffn_gate_shexp.weight,         torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.78.ffn_up_shexp.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.78.ffn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.78.indexer.k_norm.bias,           torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.78.indexer.k_norm.weight,         torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.78.indexer.proj.weight,           torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.78.indexer.attn_k.weight,         torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.78.indexer.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.78.attn_kv_a_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.78.attn_kv_a_mqa.weight,          torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.78.attn_k_b.weight,               torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.78.attn_v_b.weight,               torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.78.attn_output.weight,            torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.78.attn_q_a_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.78.attn_q_a.weight,               torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.78.attn_q_b.weight,               torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.78.nextn.shared_head_norm.weight, torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.8.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.8.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.8.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.8.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.8.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.8.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.8.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.8.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.8.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.8.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.8.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:blk.9.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,           torch.bfloat16 --> BF16, shape = {2048, 6144, 256}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,           torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,             torch.bfloat16 --> BF16, shape = {6144, 2048, 256}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias,               torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,            torch.bfloat16 --> F32, shape = {6144, 256}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight,          torch.bfloat16 --> BF16, shape = {2048, 6144}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight,          torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight,            torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,                torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.9.indexer.k_norm.bias,            torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.9.indexer.k_norm.weight,          torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.9.indexer.proj.weight,            torch.bfloat16 --> BF16, shape = {6144, 32}
INFO:hf-to-gguf:blk.9.indexer.attn_k.weight,          torch.bfloat16 --> BF16, shape = {6144, 128}
INFO:hf-to-gguf:blk.9.indexer.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {2048, 4096}
INFO:hf-to-gguf:blk.9.attn_kv_a_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.9.attn_kv_a_mqa.weight,           torch.bfloat16 --> BF16, shape = {6144, 576}
INFO:hf-to-gguf:blk.9.attn_k_b.weight,                torch.bfloat16 --> BF16, shape = {192, 512, 64}
INFO:hf-to-gguf:blk.9.attn_v_b.weight,                torch.bfloat16 --> BF16, shape = {512, 256, 64}
INFO:hf-to-gguf:blk.9.attn_output.weight,             torch.bfloat16 --> BF16, shape = {16384, 6144}
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:output_norm.weight,                   torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 202752
INFO:hf-to-gguf:gguf: embedding length = 6144
INFO:hf-to-gguf:gguf: feed forward length = 12288
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
WARNING:hf-to-gguf:Unknown RoPE type: default
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: expert score gating function = sigmoid
INFO:hf-to-gguf:gguf: file type = 32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.key_length', overwriting it with new value 576 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.value_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.leading_dense_block_count', overwriting it with new value 3 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.rope.dimension_count', overwriting it with new value 64 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.expert_gating_func', overwriting it with new value 2 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
INFO:gguf.vocab:Adding 321649 merge(s).
INFO:gguf.vocab:Setting special token type eos to 154820
INFO:gguf.vocab:Setting special token type pad to 154820
INFO:gguf.vocab:Setting special token type bos to 154822
INFO:gguf.vocab:Setting special token type eot to 154827
INFO:gguf.vocab:Setting special token type unk to 154820
INFO:gguf.vocab:Setting special token type eom to 154829
INFO:gguf.vocab:Setting chat_template to [gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson(ensure_ascii=False) }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {% set ns.last_user_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
        {%- set content = content.split('</think>')[-1].lstrip('\n') %}
    {%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
    {%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '<tool_response>' }}
{{- m.content }}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}

INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/srv/snowdrift/ggml/GLM-5/GLM-5-BF16.gguf: n_tensors = 1809, total_size = 1.5T - over 1TB, split recommended
Writing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.51T/1.51T [2:48:05<00:00, 150Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /mnt/srv/snowdrift/ggml/GLM-5/GLM-5-BF16.gguf

@AesSedai
Copy link
Contributor

AesSedai commented Feb 12, 2026

Hit the character limit on the last one; the llama-quantize output is attached, and the llama-perplexity output is below.

llama-quantize.txt
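
As a quick sanity check on the quant stats below: the 4.93 BPW that llama.cpp reports is just the file size divided by the parameter count. A minimal sketch, using only the numbers copied from the print_info lines of this run (nothing model-specific, just arithmetic):

# Sanity check: BPW = total file bits / total params.
# Numbers copied from the print_info lines of this run (Q4_K_M GGUF of GLM-5).
file_size_gib = 432.80      # print_info: file size    = 432.80 GiB
n_params      = 753.86e9    # print_info: model params = 753.86 B

bpw = file_size_gib * 1024**3 * 8 / n_params   # GiB -> bytes -> bits, per weight
print(f"{bpw:.2f} BPW")                        # prints 4.93, matching the log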

llama-perplexity output
./build/bin/llama-perplexity \
    --n-gpu-layers 999 --threads 52 \
    --override-tensor "blk\..*_exps\.=CPU" \
    --flash-attn on \
    --file /mnt/srv/host/resources/KLD/ddh0_imat_calibration_data_v2.txt \
    --model /mnt/srv/snowdrift/gguf/GLM-5-GGUF/aes_sedai/GLM-5-Q4_K_M.gguf
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
build: 8006 (4d3daf80f) with GNU 14.2.1 for Linux x86_64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 3090):  24135 total,  13338 used,  10532 free vs. target of   1024
llama_params_fit_impl:   - CUDA1 (NVIDIA GeForce RTX 3090):  24135 total,   9786 used,  14085 free vs. target of   1024
llama_params_fit_impl: projected to use 23125 MiB of device memory vs. 47743 MiB of free device memory
llama_params_fit_impl: targets for free memory can be met on all devices, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 0.57 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) (0000:06:10.0) - 23871 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3090) (0000:06:11.0) - 23871 MiB free
llama_model_loader: loaded meta data with 55 key-value pairs and 1809 tensors from /mnt/srv/snowdrift/gguf/GLM-5-GGUF/aes_sedai/GLM-5-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = glm-dsa
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   3:                      general.sampling.temp f32              = 1.000000
llama_model_loader: - kv   4:                               general.name str              = GLM 5
llama_model_loader: - kv   5:                            general.version str              = 5
llama_model_loader: - kv   6:                           general.basename str              = GLM
llama_model_loader: - kv   7:                         general.size_label str              = 256x22B
llama_model_loader: - kv   8:                            general.license str              = mit
llama_model_loader: - kv   9:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv  10:                          general.languages arr[str,2]       = ["en", "zh"]
llama_model_loader: - kv  11:                        glm-dsa.block_count u32              = 79
llama_model_loader: - kv  12:                     glm-dsa.context_length u32              = 202752
llama_model_loader: - kv  13:                   glm-dsa.embedding_length u32              = 6144
llama_model_loader: - kv  14:                glm-dsa.feed_forward_length u32              = 12288
llama_model_loader: - kv  15:               glm-dsa.attention.head_count u32              = 64
llama_model_loader: - kv  16:            glm-dsa.attention.head_count_kv u32              = 1
llama_model_loader: - kv  17:                     glm-dsa.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  18:   glm-dsa.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  19:                  glm-dsa.expert_used_count u32              = 8
llama_model_loader: - kv  20:                 glm-dsa.expert_group_count u32              = 1
llama_model_loader: - kv  21:            glm-dsa.expert_group_used_count u32              = 1
llama_model_loader: - kv  22:                 glm-dsa.expert_gating_func u32              = 2
llama_model_loader: - kv  23:               glm-dsa.attention.key_length u32              = 576
llama_model_loader: - kv  24:             glm-dsa.attention.value_length u32              = 512
llama_model_loader: - kv  25:          glm-dsa.leading_dense_block_count u32              = 3
llama_model_loader: - kv  26:                         glm-dsa.vocab_size u32              = 154880
llama_model_loader: - kv  27:              glm-dsa.attention.q_lora_rank u32              = 2048
llama_model_loader: - kv  28:             glm-dsa.attention.kv_lora_rank u32              = 512
llama_model_loader: - kv  29:           glm-dsa.attention.key_length_mla u32              = 256
llama_model_loader: - kv  30:         glm-dsa.attention.value_length_mla u32              = 256
llama_model_loader: - kv  31:         glm-dsa.expert_feed_forward_length u32              = 2048
llama_model_loader: - kv  32:                       glm-dsa.expert_count u32              = 256
llama_model_loader: - kv  33:                glm-dsa.expert_shared_count u32              = 1
llama_model_loader: - kv  34:               glm-dsa.expert_weights_scale f32              = 2.500000
llama_model_loader: - kv  35:                glm-dsa.expert_weights_norm bool             = true
llama_model_loader: - kv  36:               glm-dsa.rope.dimension_count u32              = 64
llama_model_loader: - kv  37:               glm-dsa.nextn_predict_layers u32              = 1
llama_model_loader: - kv  38:       glm-dsa.attention.indexer.head_count u32              = 32
llama_model_loader: - kv  39:       glm-dsa.attention.indexer.key_length u32              = 128
llama_model_loader: - kv  40:            glm-dsa.attention.indexer.top_k u32              = 2048
llama_model_loader: - kv  41:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  42:                         tokenizer.ggml.pre str              = glm4
llama_model_loader: - kv  43:                      tokenizer.ggml.tokens arr[str,154880]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  44:                  tokenizer.ggml.token_type arr[i32,154880]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  45:                      tokenizer.ggml.merges arr[str,321649]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  46:                tokenizer.ggml.eos_token_id u32              = 154820
llama_model_loader: - kv  47:            tokenizer.ggml.padding_token_id u32              = 154820
llama_model_loader: - kv  48:                tokenizer.ggml.bos_token_id u32              = 154822
llama_model_loader: - kv  49:                tokenizer.ggml.eot_token_id u32              = 154827
llama_model_loader: - kv  50:            tokenizer.ggml.unknown_token_id u32              = 154820
llama_model_loader: - kv  51:                tokenizer.ggml.eom_token_id u32              = 154829
llama_model_loader: - kv  52:                    tokenizer.chat_template str              = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv  53:               general.quantization_version u32              = 2
llama_model_loader: - kv  54:                          general.file_type u32              = 7
llama_model_loader: - type  f32:  630 tensors
llama_model_loader: - type q8_0:  951 tensors
llama_model_loader: - type q4_K:  152 tensors
llama_model_loader: - type q5_K:   76 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 432.80 GiB (4.93 BPW) 
load: 0 unused tokens
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 154820 ('<|endoftext|>')
load:   - 154827 ('<|user|>')
load:   - 154829 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9811 MB
print_info: arch                  = glm-dsa
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 202752
print_info: n_embd                = 6144
print_info: n_embd_inp            = 6144
print_info: n_layer               = 79
print_info: n_head                = 64
print_info: n_head_kv             = 1
print_info: n_rot                 = 64
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 576
print_info: n_embd_head_v         = 512
print_info: n_gqa                 = 64
print_info: n_embd_k_gqa          = 576
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-05
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: n_ff                  = 12288
print_info: n_expert              = 256
print_info: n_expert_used         = 8
print_info: n_expert_groups       = 1
print_info: n_group_used          = 1
print_info: causal attn           = 1
print_info: pooling type          = 0
print_info: rope type             = 0
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 202752
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = ?B
print_info: model params          = 753.86 B
print_info: general.name          = GLM 5
print_info: n_layer_dense_lead    = 3
print_info: n_lora_q              = 2048
print_info: n_lora_kv             = 512
print_info: n_embd_head_k_mla     = 256
print_info: n_embd_head_v_mla     = 256
print_info: n_ff_exp              = 2048
print_info: n_expert_shared       = 1
print_info: expert_weights_scale  = 2.5
print_info: expert_weights_norm   = 1
print_info: expert_gating_func    = sigmoid
print_info: vocab type            = BPE
print_info: n_vocab               = 154880
print_info: n_merges              = 321649
print_info: BOS token             = 154822 '[gMASK]'
print_info: EOS token             = 154820 '<|endoftext|>'
print_info: EOT token             = 154827 '<|user|>'
print_info: EOM token             = 154829 '<|observation|>'
print_info: UNK token             = 154820 '<|endoftext|>'
print_info: PAD token             = 154820 '<|endoftext|>'
print_info: LF token              = 198 'Ċ'
print_info: FIM PRE token         = 154838 '<|code_prefix|>'
print_info: FIM SUF token         = 154840 '<|code_suffix|>'
print_info: FIM MID token         = 154839 '<|code_middle|>'
print_info: EOG token             = 154820 '<|endoftext|>'
print_info: EOG token             = 154827 '<|user|>'
print_info: EOG token             = 154829 '<|observation|>'
print_info: max token length      = 1024
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
model has unused tensor blk.78.attn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a_norm.weight (size = 8192 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_norm.weight (size = 2048 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.attn_q_b.weight (size = 35651584 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_mqa.weight (size = 3760128 bytes) -- ignoring
model has unused tensor blk.78.attn_k_b.weight (size = 6684672 bytes) -- ignoring
model has unused tensor blk.78.attn_v_b.weight (size = 8912896 bytes) -- ignoring
model has unused tensor blk.78.attn_output.weight (size = 106954752 bytes) -- ignoring
model has unused tensor blk.78.ffn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.bias (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.proj.weight (size = 208896 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_k.weight (size = 835584 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_q_b.weight (size = 8912896 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_inp.weight (size = 6291456 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_exps.weight (size = 1811939328 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_exps.weight (size = 2214592512 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_exps.weight (size = 1811939328 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_shexp.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_shexp.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_shexp.weight (size = 13369344 bytes) -- ignoring
model has unused tensor blk.78.nextn.eh_proj.weight (size = 80216064 bytes) -- ignoring
model has unused tensor blk.78.nextn.enorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.hnorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.shared_head_norm.weight (size = 24576 bytes) -- ignoring
load_tensors: offloading output layer to GPU
load_tensors: offloading 78 repeating layers to GPU
load_tensors: offloaded 80/80 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 436336.94 MiB
load_tensors:        CUDA0 model buffer size =  9396.39 MiB
load_tensors:        CUDA1 model buffer size =  9362.85 MiB
....................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|user|> logit bias = -inf
common_init_result: added <|observation|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 4
llama_context: n_ctx         = 2048
llama_context: n_ctx_seq     = 512
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = enabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (512) < n_ctx_train (202752) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     2.36 MiB
llama_kv_cache:      CUDA0 KV buffer size =    90.00 MiB
llama_kv_cache:      CUDA1 KV buffer size =    85.50 MiB
llama_kv_cache: size =  175.50 MiB (   512 cells,  78 layers,  4/4 seqs), K (f16):  175.50 MiB, V (f16):    0.00 MiB
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =  3852.50 MiB
sched_reserve:      CUDA1 compute buffer size =   338.50 MiB
sched_reserve:  CUDA_Host compute buffer size =    25.01 MiB
sched_reserve: graph nodes  = 6142
sched_reserve: graph splits = 266 (with bs=512), 153 (with bs=1)
sched_reserve: reserve took 11.90 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 52 (n_threads_batch = 52) / 56 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
perplexity: tokenizing the input ..
perplexity: tokenization took 76.986 ms
perplexity: calculating perplexity over 95 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 72.99 seconds per pass - ETA 28.88 minutes
[1]16.8614,[2]16.3951,[3]15.5025,[4]14.5560,[5]15.5232,[6]15.5516,[7]15.6259,[8]16.1710,[9]12.2898,[10]9.8678,[11]8.1579,[12]7.0037,[13]6.1849,[14]5.5276,[15]6.0347,[16]6.2716,[17]6.5604,[18]7.3360,[19]8.7123,[20]10.1797,[21]12.0817,[22]13.6140,[23]14.4465,[24]15.2823,[25]16.3053,[26]16.1397,[27]16.5744,[28]17.4151,[29]17.8676,[30]17.3381,[31]17.0221,[32]17.1958,[33]17.1416,[34]17.9895,[35]17.6223,[36]16.8053,[37]16.1290,[38]15.4024,[39]14.7367,[40]14.1682,[41]13.7139,[42]12.9933,[43]12.3054,[44]11.6620,[45]11.0681,[46]10.5427,[47]10.0451,[48]9.6240,[49]9.2423,[50]8.9006,[51]9.0106,[52]8.9349,[53]8.8982,[54]8.8900,[55]8.8600,[56]8.5223,[57]8.2086,[58]7.9179,[59]7.6820,[60]7.4261,[61]7.2661,[62]7.3937,[63]7.7534,[64]8.1809,[65]8.6196,[66]9.0449,[67]9.5237,[68]9.5204,[69]9.5468,[70]9.6281,[71]9.5872,[72]9.4034,[73]9.3273,[74]9.3742,[75]9.2922,[76]9.3381,[77]9.2312,[78]9.1765,[79]9.1866,[80]9.2248,[81]9.0802,[82]9.2538,[83]9.3001,[84]9.4838,[85]9.3700,[86]9.4216,[87]9.4827,[88]9.5441,[89]9.5825,[90]9.4355,[91]9.2334,[92]9.0611,[93]8.8964,[94]8.7225,[95]8.7486,
Final estimate: PPL = 8.7486 +/- 0.17123

llama_perf_context_print:        load time =  158548.00 ms
llama_perf_context_print: prompt eval time = 1653944.47 ms / 48640 tokens (   34.00 ms per token,    29.41 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time = 1728082.65 ms / 48641 tokens
llama_perf_context_print:    graphs reused =          0
llama_memory_breakdown_print: | memory breakdown [MiB] | total    free      self    model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (RTX 3090)   | 24135 = 10455 + ( 13338 =   9396 +      90 +    3852) +         340 |
llama_memory_breakdown_print: |   - CUDA1 (RTX 3090)   | 24135 = 14017 + (  9786 =   9362 +      85 +     338) +         330 |
llama_memory_breakdown_print: |   - Host               |                  436361 = 436336 +       0 +      25                |

@drrros

drrros commented Feb 12, 2026

@drrros sorry to bother, but I wonder, what is the size of Q3_K_XL in GiB and GB?

@Panchovix It's 336 GB
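(For reference, assuming that figure is decimal gigabytes: 336 × 10⁹ bytes ÷ 2³⁰ ≈ 313 GiB.)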

@danielhanchen
Contributor

Yep, the conversion works great! I made some GGUFs at https://huggingface.co/unsloth/GLM-5-GGUF using this PR - nice work @ngxson!

Comment on lines +5511 to +5512
// TODO @ngxson : TENSOR_NOT_REQUIRED was a hack, need to remove it later
flags |= TENSOR_SKIP | TENSOR_NOT_REQUIRED;
Collaborator

fix?

Collaborator Author

The MTP layer doesn't have certain tensors compared to a normal layer. But I'm just a bit lazy to fix it right now because they're unused anyway.

Nevertheless, the GGUF still has the MTP layer.

@jukofyork
Collaborator

@ngxson the convert worked cleanly, and I also did a Q4-ish quantization and ran a PPL test on it; the result looks sane:

Final estimate: PPL = 8.7486 +/- 0.17123

I did have to pip install --upgrade transformers to get convert_hf_to_gguf.py to work.
Full convert_hf_to_gguf output

Was this using 512-token chunks? It looks like it uses the same index top-k as DeepSeek V3.2:

  "index_head_dim": 128,
  "index_n_heads": 32,
  "index_topk": 2048,

https://huggingface.co/zai-org/GLM-5/blob/main/config.json

So this perplexity should be the same as with the DSA stuff working?

It would be interesting if we could get the perplexity from the Transformers version for much larger chunk sizes and then comment out the DSA stuff (or just set top-k to a huge value?) to see if the "just pretend it's dense attention" is a valid thing to try.

Alternatively we could try and use the callback mechanism in llama.cpp to dump the categorical distribution from the Softmax output to file and then analyse it?
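If we go the dump-and-analyse route, the offline part could be as simple as this - a minimal sketch, assuming the pre-softmax attention scores for each query position get dumped as rows of a .npy file (the file name, layout, and layer choice are assumptions on my side, not anything this PR produces):

import numpy as np

TOP_K = 2048  # index_topk from config.json

# assumed layout: one row of pre-softmax attention scores per query position,
# restricted to the causal prefix (positions after the query padded with -inf)
scores = np.load("kq_scores_layer40.npy")  # hypothetical dump file

# softmax over the key dimension
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs = probs / probs.sum(axis=-1, keepdims=True)

# fraction of attention mass that the top-2048 keys per query would capture
top_mass = np.sort(probs, axis=-1)[:, -TOP_K:].sum(axis=-1)
print(f"mean top-{TOP_K} mass: {top_mass.mean():.4f}, min: {top_mass.min():.4f}")

If the top-2048 mass stays very close to 1.0 even at long contexts, the "just pretend it's dense attention" approximation should hold up reasonably well.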

@ngxson
Collaborator Author

ngxson commented Feb 12, 2026

Alternatively we could try and use the callback mechanism in llama.cpp to dump the categorical distribution from the Softmax output to file and then analyse it?

Yes, I can try with the dummy weights. Btw, since the top_k is 2048, I assume there are always at most 2048 tokens selected for attention (please correct me if I'm wrong here). If that's the case, that means we need to calculate PPL with a context length of at least 4096 tokens to see if it makes any difference.
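For intuition, here is a tiny conceptual sketch of that reasoning (pure NumPy, with random scores standing in for the learned indexer logits - not the real DSA indexer or kernel): with top_k = 2048, the selection mask only starts dropping keys once the causal prefix is longer than 2048 tokens, so any difference versus dense attention can only appear past that point.

import numpy as np

def dsa_style_mask(index_scores: np.ndarray, top_k: int) -> np.ndarray:
    """Keep, for each query, only the top_k highest-scoring causal keys."""
    n = index_scores.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(causal, index_scores, -np.inf)
    keep = np.zeros_like(causal)
    for q in range(n):
        k = min(top_k, q + 1)                     # at most top_k keys survive
        idx = np.argpartition(scores[q], -k)[-k:]
        keep[q, idx] = True
    return keep

rng = np.random.default_rng(0)
for n_ctx in (512, 2048, 4096):
    scores = rng.standard_normal((n_ctx, n_ctx))  # stand-in for indexer logits
    mask = dsa_style_mask(scores, top_k=2048)
    dense = np.tril(np.ones((n_ctx, n_ctx), dtype=bool))
    print(n_ctx, "identical to dense:", bool((mask == dense).all()))
# -> True, True, False: sparsity only kicks in past 2048 tokens

Under that assumption, PPL at n_ctx = 512 or 2048 shouldn't be affected by the missing indexer at all; 4096 and above is where differences could start to show.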

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@jukofyork
Collaborator

Alternatively we could try and use the callback mechanism in llama.cpp to dump the categorical distribution from the Softmax output to file and then analyse it?

Yes, I can try with the dummy weights. Btw, since the top_k is 2048, I assume there are always at most 2048 tokens selected for attention (please correct me if I'm wrong here). If that's the case, that means we need to calculate PPL with a context length of at least 4096 tokens to see if it makes any difference.

Yeah, it might be the case that it makes more and more difference as the context increases too (both from the flatter distribution and the autoregressive feedback loop).

@ngxson ngxson requested a review from CISC February 12, 2026 15:25
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@AesSedai
Contributor

@jukofyork yes, it was using the default, which I think is 512-token ctx chunks:

./build/bin/llama-perplexity \
    --n-gpu-layers 999 --threads 52 \
    --override-tensor "blk\..*_exps\.=CPU" \
    --flash-attn on \
    --file /mnt/srv/host/resources/KLD/ddh0_imat_calibration_data_v2.txt \
    --model /mnt/srv/snowdrift/gguf/GLM-5-GGUF/aes_sedai/GLM-5-Q4_K_M.gguf

I can re-run it later today with a larger context size and something like bart's v5 calibration set (this one is madison's, which has some randomized UTF data in it for funsies; it's short, so it was quick to eval).

@ubergarm
Contributor

ubergarm commented Feb 12, 2026

Recompiled and currently testing convert_hf_to_gguf.py from this PR (ngxson:xsn/glm_dsa@7b23cd920); it seems to be running. Similar output log messages as Aes.

👈 Details
$ uv pip freeze | grep -i trans
transformers==5.1.0

$ export SOCKET=0
$ numactl -N ${SOCKET} -m ${SOCKET} \
python \
    convert_hf_to_gguf.py \
    --outtype bf16 \
    --split-max-size 50G \
    --outfile /mnt/data/models/ubergarm/GLM-5-GGUF/ \
    /mnt/data/models/zai-org/GLM-5/

INFO:hf-to-gguf:Loading model: GLM-5
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/zai-org/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: GlmMoeDsaForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/zai-org/GLM-5: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-00282.safetensors'

.
.
.

INFO:hf-to-gguf:gguf: indexing model part 'model-00281-of-00282.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00282-of-00282.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:output.weight,                        torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:token_embd.weight,                    torch.bfloat16 --> BF16, shape = {6144, 154880}
INFO:hf-to-gguf:blk.0.attn_norm.weight,               torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:blk.0.ffn_down.weight,                torch.bfloat16 --> BF16, shape = {12288, 6144}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,                torch.bfloat16 --> BF16, shape = {6144, 12288}

.
.
.

INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_q_a.weight,                torch.bfloat16 --> BF16, shape = {6144, 2048}
INFO:hf-to-gguf:blk.9.attn_q_b.weight,                torch.bfloat16 --> BF16, shape = {2048, 16384}
INFO:hf-to-gguf:output_norm.weight,                   torch.bfloat16 --> F32, shape = {6144}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 202752
INFO:hf-to-gguf:gguf: embedding length = 6144
INFO:hf-to-gguf:gguf: feed forward length = 12288
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
WARNING:hf-to-gguf:Unknown RoPE type: default
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: expert score gating function = sigmoid
INFO:hf-to-gguf:gguf: file type = 32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.key_length', overwriting it with new value 576 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.attention.value_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'glm-dsa.rope.dimension_count', overwriting it with new value 64 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type glm_moe_dsa to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
INFO:gguf.vocab:Adding 321649 merge(s).
INFO:gguf.vocab:Setting special token type eos to 154820
INFO:gguf.vocab:Setting special token type pad to 154820
INFO:gguf.vocab:Setting special token type bos to 154822
INFO:gguf.vocab:Setting special token type eot to 154827
INFO:gguf.vocab:Setting special token type unk to 154820
INFO:gguf.vocab:Setting special token type eom to 154829
INFO:gguf.vocab:Setting chat_template to [gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

.
.
.

{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf: n_tensors = 85, total_size = 44.9G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00002-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00003-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00004-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00005-of-00033.gguf: n_tensors = 65, total_size = 46.8G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00006-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00007-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00008-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00009-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00010-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00011-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00012-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00013-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00014-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00015-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00016-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00017-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00018-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00019-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00020-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00021-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00022-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00023-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00024-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00025-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00026-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00027-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00028-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00029-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00030-of-00033.gguf: n_tensors = 47, total_size = 46.0G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00031-of-00033.gguf: n_tensors = 67, total_size = 46.4G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00032-of-00033.gguf: n_tensors = 51, total_size = 46.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00033-of-00033.gguf: n_tensors = 45, total_size = 33.1G

Shard (3/33):  14%|█▍        | 6.44G/46.0G [00:39<02:48, 235Mbyte/s]
Writing:   6%|| 97.3G/1.51T [05:10<1:44:31, 225Mbyte/s]

I should have time late tonight or tomorrow to test it out, and could try llama-perplexity with different context lengths, etc.

UPDATE

Perplexity of GLM-5-BF16.gguf on wiki.test.raw

  • ctx 512 : Final estimate: PPL = 2.6301 +/- 0.01396
  • ctx 2048: Final estimate: PPL = 2.3803 +/- 0.01157
  • ctx 4096: Final estimate: PPL = 2.4005 +/- 0.01170
👈 Perplexity Logs
model=/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf

numactl --interleave=all \
./build/bin/llama-perplexity \
    --model "$model"\
    -f wiki.test.raw \
    --seed 1337 \
    --ctx-size 512 \
    -ub 4096 -b 4096 \
    --threads 96 \
    --threads-batch 128 \
    --no-mmap \
    -fit off \
    --numa distribute

build: 7987 (7b23cd920) with GNU 13.3.0 for Linux x86_64
llama_model_loader: additional 32 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 58 key-value pairs and 1809 tensors from /mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = glm-dsa
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   3:                      general.sampling.temp f32              = 1.000000
llama_model_loader: - kv   4:                               general.name str              = GLM 5
llama_model_loader: - kv   5:                            general.version str              = 5
llama_model_loader: - kv   6:                           general.basename str              = GLM
llama_model_loader: - kv   7:                         general.size_label str              = 256x22B
llama_model_loader: - kv   8:                            general.license str              = mit
llama_model_loader: - kv   9:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv  10:                          general.languages arr[str,2]       = ["en", "zh"]
llama_model_loader: - kv  11:                        glm-dsa.block_count u32              = 79
llama_model_loader: - kv  12:                     glm-dsa.context_length u32              = 202752
llama_model_loader: - kv  13:                   glm-dsa.embedding_length u32              = 6144
llama_model_loader: - kv  14:                glm-dsa.feed_forward_length u32              = 12288
llama_model_loader: - kv  15:               glm-dsa.attention.head_count u32              = 64
llama_model_loader: - kv  16:            glm-dsa.attention.head_count_kv u32              = 1
llama_model_loader: - kv  17:                     glm-dsa.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  18:   glm-dsa.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  19:                  glm-dsa.expert_used_count u32              = 8
llama_model_loader: - kv  20:                 glm-dsa.expert_group_count u32              = 1
llama_model_loader: - kv  21:            glm-dsa.expert_group_used_count u32              = 1
llama_model_loader: - kv  22:                 glm-dsa.expert_gating_func u32              = 2
llama_model_loader: - kv  23:               glm-dsa.attention.key_length u32              = 576
llama_model_loader: - kv  24:             glm-dsa.attention.value_length u32              = 512
llama_model_loader: - kv  25:                          general.file_type u32              = 32
llama_model_loader: - kv  26:          glm-dsa.leading_dense_block_count u32              = 3
llama_model_loader: - kv  27:                         glm-dsa.vocab_size u32              = 154880
llama_model_loader: - kv  28:              glm-dsa.attention.q_lora_rank u32              = 2048
llama_model_loader: - kv  29:             glm-dsa.attention.kv_lora_rank u32              = 512
llama_model_loader: - kv  30:           glm-dsa.attention.key_length_mla u32              = 256
llama_model_loader: - kv  31:         glm-dsa.attention.value_length_mla u32              = 256
llama_model_loader: - kv  32:         glm-dsa.expert_feed_forward_length u32              = 2048
llama_model_loader: - kv  33:                       glm-dsa.expert_count u32              = 256
llama_model_loader: - kv  34:                glm-dsa.expert_shared_count u32              = 1
llama_model_loader: - kv  35:               glm-dsa.expert_weights_scale f32              = 2.500000
llama_model_loader: - kv  36:                glm-dsa.expert_weights_norm bool             = true
llama_model_loader: - kv  37:               glm-dsa.rope.dimension_count u32              = 64
llama_model_loader: - kv  38:               glm-dsa.nextn_predict_layers u32              = 1
llama_model_loader: - kv  39:       glm-dsa.attention.indexer.head_count u32              = 32
llama_model_loader: - kv  40:       glm-dsa.attention.indexer.key_length u32              = 128
llama_model_loader: - kv  41:            glm-dsa.attention.indexer.top_k u32              = 2048
llama_model_loader: - kv  42:               general.quantization_version u32              = 2
llama_model_loader: - kv  43:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  44:                         tokenizer.ggml.pre str              = glm4
llama_model_loader: - kv  45:                      tokenizer.ggml.tokens arr[str,154880]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  46:                  tokenizer.ggml.token_type arr[i32,154880]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  47:                      tokenizer.ggml.merges arr[str,321649]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  48:                tokenizer.ggml.eos_token_id u32              = 154820
llama_model_loader: - kv  49:            tokenizer.ggml.padding_token_id u32              = 154820
llama_model_loader: - kv  50:                tokenizer.ggml.bos_token_id u32              = 154822
llama_model_loader: - kv  51:                tokenizer.ggml.eot_token_id u32              = 154827
llama_model_loader: - kv  52:            tokenizer.ggml.unknown_token_id u32              = 154820
llama_model_loader: - kv  53:                tokenizer.ggml.eom_token_id u32              = 154829
llama_model_loader: - kv  54:                    tokenizer.chat_template str              = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv  55:                                   split.no u16              = 0
llama_model_loader: - kv  56:                                split.count u16              = 33
llama_model_loader: - kv  57:                        split.tensors.count i32              = 1809
llama_model_loader: - type  f32:  630 tensors
llama_model_loader: - type bf16: 1179 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = BF16
print_info: file size   = 1404.41 GiB (16.00 BPW) 
load: 0 unused tokens
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 154820 ('<|endoftext|>')
load:   - 154827 ('<|user|>')
load:   - 154829 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9811 MB
print_info: arch                  = glm-dsa
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 202752
print_info: n_embd                = 6144
print_info: n_embd_inp            = 6144
print_info: n_layer               = 79
print_info: n_head                = 64
print_info: n_head_kv             = 1
print_info: n_rot                 = 64
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 576
print_info: n_embd_head_v         = 512
print_info: n_gqa                 = 64
print_info: n_embd_k_gqa          = 576
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-05
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: n_ff                  = 12288
print_info: n_expert              = 256
print_info: n_expert_used         = 8
print_info: n_expert_groups       = 1
print_info: n_group_used          = 1
print_info: causal attn           = 1
print_info: pooling type          = 0
print_info: rope type             = 0
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 202752
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = 744B.A40B
print_info: model params          = 753.86 B
print_info: general.name          = GLM 5
print_info: n_layer_dense_lead    = 3
print_info: n_lora_q              = 2048
print_info: n_lora_kv             = 512
print_info: n_embd_head_k_mla     = 256
print_info: n_embd_head_v_mla     = 256
print_info: n_ff_exp              = 2048
print_info: n_expert_shared       = 1
print_info: expert_weights_scale  = 2.5
print_info: expert_weights_norm   = 1
print_info: expert_gating_func    = sigmoid
print_info: vocab type            = BPE
print_info: n_vocab               = 154880
print_info: n_merges              = 321649
print_info: BOS token             = 154822 '[gMASK]'
print_info: EOS token             = 154820 '<|endoftext|>'
print_info: EOT token             = 154827 '<|user|>'
print_info: EOM token             = 154829 '<|observation|>'
print_info: UNK token             = 154820 '<|endoftext|>'
print_info: PAD token             = 154820 '<|endoftext|>'
print_info: LF token              = 198 'Ċ'
print_info: FIM PRE token         = 154838 '<|code_prefix|>'
print_info: FIM SUF token         = 154840 '<|code_suffix|>'
print_info: FIM MID token         = 154839 '<|code_middle|>'
print_info: EOG token             = 154820 '<|endoftext|>'
print_info: EOG token             = 154827 '<|user|>'
print_info: EOG token             = 154829 '<|observation|>'
print_info: max token length      = 1024
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
model has unused tensor blk.78.attn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a_norm.weight (size = 8192 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_norm.weight (size = 2048 bytes) -- ignoring
model has unused tensor blk.78.attn_q_a.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.attn_q_b.weight (size = 67108864 bytes) -- ignoring
model has unused tensor blk.78.attn_kv_a_mqa.weight (size = 7077888 bytes) -- ignoring
model has unused tensor blk.78.attn_k_b.weight (size = 12582912 bytes) -- ignoring
model has unused tensor blk.78.attn_v_b.weight (size = 16777216 bytes) -- ignoring
model has unused tensor blk.78.attn_output.weight (size = 201326592 bytes) -- ignoring
model has unused tensor blk.78.ffn_norm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.k_norm.bias (size = 512 bytes) -- ignoring
model has unused tensor blk.78.indexer.proj.weight (size = 393216 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_k.weight (size = 1572864 bytes) -- ignoring
model has unused tensor blk.78.indexer.attn_q_b.weight (size = 16777216 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_inp.weight (size = 6291456 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_exps.weight (size = 6442450944 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_exps.weight (size = 6442450944 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_exps.weight (size = 6442450944 bytes) -- ignoring
model has unused tensor blk.78.ffn_gate_shexp.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.ffn_down_shexp.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.ffn_up_shexp.weight (size = 25165824 bytes) -- ignoring
model has unused tensor blk.78.nextn.eh_proj.weight (size = 150994944 bytes) -- ignoring
model has unused tensor blk.78.nextn.enorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.hnorm.weight (size = 24576 bytes) -- ignoring
model has unused tensor blk.78.nextn.shared_head_norm.weight (size = 24576 bytes) -- ignoring
load_tensors:          CPU model buffer size = 1419125.34 MiB
....................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|user|> logit bias = -inf
common_init_result: added <|observation|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 8
llama_context: n_ctx         = 4096
llama_context: n_ctx_seq     = 512
llama_context: n_batch       = 4096
llama_context: n_ubatch      = 4096
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (512) < n_ctx_train (202752) -- the full capacity of the model will not be utilized
llama_context:        CPU  output buffer size =     4.73 MiB
llama_kv_cache:        CPU KV buffer size =   351.00 MiB
llama_kv_cache: size =  351.00 MiB (   512 cells,  78 layers,  8/8 seqs), K (f16):  351.00 MiB, V (f16):    0.00 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve:        CPU compute buffer size =  2812.08 MiB
sched_reserve: graph nodes  = 6142
sched_reserve: graph splits = 1
sched_reserve: reserve took 16.46 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
perplexity: tokenizing the input ..
perplexity: tokenization took 640.663 ms
perplexity: calculating perplexity over 565 chunks, n_ctx=512, batch_size=4096, n_seq=8
perplexity: 70.77 seconds per pass - ETA 1 hours 23.30 minutes
[1]1.2357,[2]1.9927,[3]1.7750,[4]1.5647,[5]1.4559,[6]1.3997,[7]1.3700,[8]1.3376,[9]1.3269,[10]1.3065,[11]1.2977,[12]1.3379,[13]1.3350,[14]1.3943,[15]1.4928,[16]1.5953,[17]1.7099,[18]1.8518,[19]1.8477,[20]1.8447,[21]1.9198,[22]1.9402,[23]1.9293,[24]1.9153,[25]1.9054,[26]1.9033,[27]1.9159,[28]1.9434,[29]1.9573,[30]2.0131,[31]2.0672,[32]2.1056,[33]2.1480,[34]2.1778,[35]2.2229,[36]2.2623,[37]2.2885,[38]2.3734,[39]2.4117,[40]2.4554,[41]2.5189,[42]2.5108,[43]2.5282,[44]2.5558,[45]2.6263,[46]2.6778,[47]2.6401,[48]2.6036,[49]2.5770,[50]2.5644,[51]2.5814,[52]2.6092,[53]2.6467,[54]2.6713,[55]2.6982,[56]2.7253,[57]2.7241,[58]2.7472,[59]2.7624,[60]2.7969,[61]2.8294,[62]2.8765,[63]2.9137,[64]2.9403,[65]2.9578,[66]2.9543,[67]2.9297,[68]2.9118,[69]2.9337,[70]2.9180,[71]2.9024,[72]2.9028,[73]2.9087,[74]2.9343,[75]2.9370,[76]2.9038,[77]2.8703,[78]2.8418,[79]2.8141,[80]2.7915,[81]2.7685,[82]2.7566,[83]2.7556,[84]2.7319,[85]2.7192,[86]2.7095,[87]2.7006,[88]2.6854,[89]2.6652,[90]2.6532,[91]2.6343,[92]2.6140,[93]2.6040,[94]2.5891,[95]2.5748,[96]2.5671,[97]2.5749,[98]2.5681,[99]2.5545,[100]2.5355,[101]2.5441,[102]2.5294,[103]2.5191,[104]2.5126,[105]2.5219,[106]2.5452,[107]2.5945,[108]2.6043,[109]2.6138,[110]2.6481,[111]2.6703,[112]2.6503,[113]2.6379,[114]2.6379,[115]2.6356,[116]2.6412,[117]2.6428,[118]2.6467,[119]2.6509,[120]2.6473,[121]2.6385,[122]2.6417,[123]2.6293,[124]2.6287,[125]2.6297,[126]2.6298,[127]2.6289,[128]2.6429,[129]2.6493,[130]2.6485,[131]2.6605,[132]2.6603,[133]2.6589,[134]2.6730,[135]2.6908,[136]2.6844,[137]2.6801,[138]2.6760,[139]2.6642,[140]2.6775,[141]2.6792,[142]2.6712,[143]2.6699,[144]2.6708,[145]2.6700,[146]2.6652,[147]2.6527,[148]2.6472,[149]2.6434,[150]2.6403,[151]2.6332,[152]2.6322,[153]2.6359,[154]2.6355,[155]2.6361,[156]2.6397,[157]2.6422,[158]2.6448,[159]2.6551,[160]2.6643,[161]2.6701,[162]2.6604,[163]2.6486,[164]2.6518,[165]2.6428,[166]2.6402,[167]2.6524,[168]2.6524,[169]2.6759,[170]2.6912,[171]2.7016,[172]2.7193,[173]2.7110,[174]2.6991,[175]2.6869,[176]2.6758,[177]2.6633,[178]2.6498,[179]2.6393,[180]2.6279,[181]2.6229,[182]2.6362,[183]2.6532,[184]2.6769,[185]2.6942,[186]2.7044,[187]2.7228,[188]2.7454,[189]2.7668,[190]2.7826,[191]2.7970,[192]2.8064,[193]2.8134,[194]2.8176,[195]2.8154,[196]2.8178,[197]2.8305,[198]2.8450,[199]2.8450,[200]2.8522,[201]2.8546,[202]2.8580,[203]2.8571,[204]2.8656,[205]2.8737,[206]2.8803,[207]2.8871,[208]2.8875,[209]2.8906,[210]2.8869,[211]2.8913,[212]2.8925,[213]2.8968,[214]2.9017,[215]2.9053,[216]2.9096,[217]2.9140,[218]2.9215,[219]2.9167,[220]2.9159,[221]2.9139,[222]2.9164,[223]2.9161,[224]2.9228,[225]2.9251,[226]2.9315,[227]2.9293,[228]2.9289,[229]2.9205,[230]2.9124,[231]2.9085,[232]2.9087,[233]2.9070,[234]2.9003,[235]2.8901,[236]2.8829,[237]2.8751,[238]2.8783,[239]2.8923,[240]2.9065,[241]2.9185,[242]2.9290,[243]2.9407,[244]2.9533,[245]2.9668,[246]2.9779,[247]2.9914,[248]3.0021,[249]3.0034,[250]3.0041,[251]2.9937,[252]2.9849,[253]2.9777,[254]2.9747,[255]2.9770,[256]2.9765,[257]2.9720,[258]2.9704,[259]2.9613,[260]2.9554,[261]2.9486,[262]2.9432,[263]2.9373,[264]2.9332,[265]2.9290,[266]2.9262,[267]2.9193,[268]2.9133,[269]2.9092,[270]2.9077,[271]2.9056,[272]2.9009,[273]2.8984,[274]2.8906,[275]2.8837,[276]2.8733,[277]2.8655,[278]2.8565,[279]2.8578,[280]2.8610,[281]2.8646,[282]2.8697,[283]2.8746,[284]2.8762,[285]2.8775,[286]2.8851,[287]2.8958,[288]2.8970,[289]2.8982,[290]2.9028,[291]2.9056,[292]2.9017,[293]2.8934,[294]2.8882,[295]2.8863,[296]2.8801,[297]2.8757,[298]2.8714,[299]2.8674,[300]2.8668,[301]2.8656,[302]2.8623,[303]2.8597,[304]2.8558,[305]2.8499,[30
6]2.8454,[307]2.8479,[308]2.8539,[309]2.8653,[310]2.8566,[311]2.8511,[312]2.8441,[313]2.8406,[314]2.8364,[315]2.8353,[316]2.8328,[317]2.8315,[318]2.8309,[319]2.8279,[320]2.8261,[321]2.8284,[322]2.8291,[323]2.8235,[324]2.8200,[325]2.8183,[326]2.8159,[327]2.8181,[328]2.8165,[329]2.8167,[330]2.8158,[331]2.8120,[332]2.8137,[333]2.8162,[334]2.8192,[335]2.8192,[336]2.8203,[337]2.8215,[338]2.8218,[339]2.8218,[340]2.8243,[341]2.8268,[342]2.8286,[343]2.8341,[344]2.8390,[345]2.8475,[346]2.8471,[347]2.8404,[348]2.8342,[349]2.8293,[350]2.8233,[351]2.8170,[352]2.8146,[353]2.8111,[354]2.8055,[355]2.8002,[356]2.7963,[357]2.7911,[358]2.7862,[359]2.7854,[360]2.7807,[361]2.7747,[362]2.7688,[363]2.7637,[364]2.7618,[365]2.7578,[366]2.7553,[367]2.7504,[368]2.7448,[369]2.7402,[370]2.7380,[371]2.7340,[372]2.7339,[373]2.7330,[374]2.7348,[375]2.7318,[376]2.7281,[377]2.7245,[378]2.7227,[379]2.7238,[380]2.7190,[381]2.7163,[382]2.7132,[383]2.7158,[384]2.7220,[385]2.7269,[386]2.7345,[387]2.7390,[388]2.7449,[389]2.7526,[390]2.7551,[391]2.7484,[392]2.7429,[393]2.7368,[394]2.7362,[395]2.7309,[396]2.7264,[397]2.7204,[398]2.7139,[399]2.7089,[400]2.7033,[401]2.6971,[402]2.6916,[403]2.6854,[404]2.6789,[405]2.6736,[406]2.6673,[407]2.6615,[408]2.6554,[409]2.6505,[410]2.6449,[411]2.6397,[412]2.6359,[413]2.6325,[414]2.6310,[415]2.6284,[416]2.6260,[417]2.6210,[418]2.6156,[419]2.6211,[420]2.6169,[421]2.6149,[422]2.6169,[423]2.6143,[424]2.6101,[425]2.6067,[426]2.6043,[427]2.6026,[428]2.5998,[429]2.5955,[430]2.5922,[431]2.5934,[432]2.5897,[433]2.5860,[434]2.5829,[435]2.5796,[436]2.5748,[437]2.5696,[438]2.5656,[439]2.5648,[440]2.5616,[441]2.5598,[442]2.5558,[443]2.5612,[444]2.5687,[445]2.5667,[446]2.5657,[447]2.5680,[448]2.5698,[449]2.5759,[450]2.5774,[451]2.5795,[452]2.5834,[453]2.5908,[454]2.5962,[455]2.5992,[456]2.6046,[457]2.6036,[458]2.6076,[459]2.6101,[460]2.6165,[461]2.6226,[462]2.6257,[463]2.6258,[464]2.6247,[465]2.6243,[466]2.6291,[467]2.6286,[468]2.6259,[469]2.6315,[470]2.6331,[471]2.6357,[472]2.6391,[473]2.6410,[474]2.6426,[475]2.6447,[476]2.6475,[477]2.6508,[478]2.6533,[479]2.6558,[480]2.6579,[481]2.6614,[482]2.6636,[483]2.6665,[484]2.6639,[485]2.6683,[486]2.6705,[487]2.6766,[488]2.6818,[489]2.6875,[490]2.6870,[491]2.6927,[492]2.6974,[493]2.7014,[494]2.7060,[495]2.7115,[496]2.7117,[497]2.7130,[498]2.7150,[499]2.7175,[500]2.7209,[501]2.7221,[502]2.7237,[503]2.7287,[504]2.7342,[505]2.7351,[506]2.7355,[507]2.7372,[508]2.7409,[509]2.7468,[510]2.7498,[511]2.7544,[512]2.7492,[513]2.7446,[514]2.7399,[515]2.7367,[516]2.7339,[517]2.7313,[518]2.7278,[519]2.7236,[520]2.7218,[521]2.7183,[522]2.7145,[523]2.7114,[524]2.7140,[525]2.7114,[526]2.7080,[527]2.7080,[528]2.7060,[529]2.7024,[530]2.6994,[531]2.6969,[532]2.6956,[533]2.6932,[534]2.6924,[535]2.6903,[536]2.6886,[537]2.6839,[538]2.6803,[539]2.6764,[540]2.6760,[541]2.6758,[542]2.6737,[543]2.6720,[544]2.6717,[545]2.6698,[546]2.6696,[547]2.6666,[548]2.6644,[549]2.6617,[550]2.6575,[551]2.6529,[552]2.6493,[553]2.6454,[554]2.6417,[555]2.6375,[556]2.6340,[557]2.6299,[558]2.6296,[559]2.6264,[560]2.6256,[561]2.6263,[562]2.6267,[563]2.6296,[564]2.6316,[565]2.6301,
Final estimate: PPL = 2.6301 +/- 0.01396

llama_perf_context_print:        load time =  578111.23 ms
llama_perf_context_print: prompt eval time = 3290589.59 ms / 289280 tokens (   11.38 ms per token,    87.91 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time = 3372645.88 ms / 289281 tokens
llama_perf_context_print:    graphs reused =          0
llama_memory_breakdown_print: | memory breakdown [MiB] | total   free       self     model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - Host               |                 1422288 = 1419125 +     351 +    2812                |

@ngxson ngxson merged commit 752584d into ggml-org:master Feb 13, 2026
78 of 79 checks passed
francesco-cattoglio pushed a commit to francesco-cattoglio/llama.cpp that referenced this pull request Feb 13, 2026
…gml-org#19460)

* model: support GLM MoE DSA arch

* working version

* pyright

* keep indexer tensors

* add indexer gguf params

* loaded now

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* update

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* minor fix and cleanup

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
TimPietruskyRunPod pushed a commit to runpod-workers/openclaw2go-llamacpp that referenced this pull request Feb 14, 2026
checks daily for new llama.cpp releases.
auto-rebases cherry-picks (audio ggml-org#18641, outetss ggml-org#12794, eagle-3 ggml-org#18039).
creates tagged release on clean rebase, PR on conflicts.
PR ggml-org#19460 (GLM-5 DSA) already merged upstream, not in cherry-pick list.
