
@ikawrakow (Owner)

The ffn_down_exps row sizes are not a multiple of 256 in DeepSeek-Lite. When using --pure with llama-quantize, this leads to a crash. I got tired of having to add custom quantization overrides in that case, so this PR applies the divisibility-by-quantization-block-size check to --pure as well, and falls back to a compatible quantization type when necessary.
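For context, a minimal sketch of the kind of check described above, not the actual llama-quantize code: k- and i-quants pack weights in blocks of 256, so a tensor whose row size is not divisible by the block size cannot use them and needs a fallback type. The helper names `block_size_for`, `fallback_type_for`, and `pick_pure_type` are hypothetical and only for illustration.

```cpp
#include <cstdint>

// Illustrative subset of quantization types.
enum class qtype { Q4_K, Q5_K, IQ4_XS, Q4_0, Q5_0, Q8_0 };

// Hypothetical helper: block size a given quant type packs weights into.
static int64_t block_size_for(qtype t) {
    switch (t) {
        case qtype::Q4_K:
        case qtype::Q5_K:
        case qtype::IQ4_XS: return 256; // k- and i-quants use 256-wide blocks
        default:            return 32;  // legacy quants use smaller blocks
    }
}

// Hypothetical helper: fallback type when the requested one does not fit.
static qtype fallback_type_for(qtype requested) {
    switch (requested) {
        case qtype::Q4_K: return qtype::Q4_0;
        case qtype::Q5_K: return qtype::Q5_0;
        default:          return qtype::Q8_0;
    }
}

// With --pure every tensor would normally get `requested`; this check
// switches to the fallback when the row size is not divisible by the
// block size (e.g. ffn_down_exps rows in DeepSeek-Lite).
static qtype pick_pure_type(qtype requested, int64_t row_size) {
    if (row_size % block_size_for(requested) != 0) {
        return fallback_type_for(requested);
    }
    return requested;
}
```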

Iwan Kawrakow added 10 commits March 26, 2025 12:32
I often want to quantize with --pure to see quantization performance
without quantization mixes. But for models where there are tensors
with row sizes that are not a multiple of 256, this results in a crash
for k- and i-quants. Hence, let's add a check whether the quant selected
via --pure is applicable, and change it if not.