UPSTREAM PR #19575: ggml-cpu: arm64: Fix wrong memcpy length for q4_K block_interleave == 4 #1173
Overview
Analysis of 115,000 functions across 15 binaries found 7 modified functions (0.006%), 0 new, 0 removed, and 114,993 unchanged. The changes are a critical correctness fix in quantization repacking that prevents a buffer overflow.
Power consumption changes:
Function Analysis
Other analyzed functions showed negligible changes.
Additional Findings
The quantization bug fix benefits correctness across all backends (CPU, CUDA, Metal, HIP, Vulkan): although the modified functions are CPU-specific, corrupted weight repacking would affect model accuracy regardless of the inference backend. The fix ensures reliable quantized inference for LLM workloads on all hardware platforms, with secondary performance benefits from improved memory layout and cache efficiency.
🔎 Full breakdown: Loci Inspector.
Note
Source pull request: ggml-org/llama.cpp#19575
ggml-org/llama.cpp#19561 reports stack-related issues with Q4_K.
I can't reproduce the issue locally, but the make_block_q4_Kx8 function would write 4 bytes past the end of its buffer when block_interleave == 4, which could be the cause.
@taronaeo, since you found the problem, could you check whether this patch fixes it?
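To make the size of the overrun concrete, here is a minimal sketch in C. It is not the patched code: it assumes that the repacked block holds 8 blocks of 128 bytes of 4-bit quants (1024 bytes total), that the loop interleaves 4-byte chunks round-robin across the 8 source blocks, and that the buggy copy used a fixed 8-byte length (the width that would be correct for an interleave of 8). The names interleave_qs and overrun are hypothetical and do not exist in llama.cpp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes (assumed, not taken from the PR): a Q4_K block packs
 * QK_K = 256 4-bit quants into 128 bytes, and the repacked "x8" block
 * interleaves the quants of 8 such blocks into one 1024-byte array. */
#define QK_K        256
#define QS_BYTES    (QK_K / 2)
#define N_BLOCKS    8
#define OUT_BYTES   (QS_BYTES * N_BLOCKS)
#define GUARD_BYTES 16
#define CANARY      0xAA

/* Hypothetical interleaving loop: take `interleave` bytes from each source
 * block in round-robin order, but copy `copy_len` bytes per chunk.
 * Both buffers carry GUARD_BYTES of slack so an overrun is observable
 * instead of being undefined behaviour. */
static void interleave_qs(uint8_t *dst, const uint8_t *src,
                          size_t interleave, size_t copy_len) {
    assert(interleave > 0 && OUT_BYTES % interleave == 0);
    assert(copy_len <= interleave + GUARD_BYTES); /* stay inside the slack */
    const size_t n_chunks = OUT_BYTES / interleave;
    for (size_t i = 0; i < n_chunks; i++) {
        const size_t block = i % N_BLOCKS;                /* which source block */
        const size_t off   = (i / N_BLOCKS) * interleave; /* offset inside it */
        memcpy(dst + i * interleave, src + block * QS_BYTES + off, copy_len);
    }
}

/* Returns how many guard bytes after the OUT_BYTES payload were modified. */
static size_t overrun(size_t interleave, size_t copy_len) {
    uint8_t src[OUT_BYTES + GUARD_BYTES];
    uint8_t dst[OUT_BYTES + GUARD_BYTES];
    for (size_t i = 0; i < sizeof src; i++) {
        src[i] = (uint8_t)(i & 0x7F); /* values 0..127, never equal to CANARY */
    }
    memset(dst, CANARY, sizeof dst);
    interleave_qs(dst, src, interleave, copy_len);
    size_t touched = 0;
    for (size_t i = OUT_BYTES; i < sizeof dst; i++) {
        if (dst[i] != CANARY) touched++;
    }
    return touched;
}
```

Under these assumptions, overrun(4, 8) reports 4 bytes written past the 1024-byte payload, matching the 4 extra bytes described above, while overrun(4, 4) (copy length equal to the interleave width) and overrun(8, 8) report 0. In this sketch every oversized copy inside the buffer is overwritten by the next chunk, so only the final chunk's extra bytes escape.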
For Q6_K and Q5_K, ggml-org/llama.cpp#19356 (still open when this description was written) already addresses this problem.