Move duplicated imatrix code into single common imatrix-loader.cpp #22445

Merged
pwilkin merged 2 commits into ggml-org:master from bartowski1182:imatrix-dup
Jun 4, 2026

@bartowski1182
Contributor

Overview

quantize.cpp and imatrix.cpp each carried their own copy of the code for loading the imatrix.

This change moves those functions into a common file that provides the same imatrix and legacy imatrix loading functions.

The two tools handled the data differently after loading: imatrix.cpp kept the raw sums and counts so they could be merged (with --in-file), whereas quantize.cpp normalized immediately. The new common_imatrix struct therefore stores the information as it is loaded, and each caller can process it as it needs.

Also fixed a minor bug that caused --show-statistics to fail (params weren't set before being used), so that I could use it to compare results before and after.
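To illustrate the split described above, here is a minimal sketch of "store raw, let each caller decide". It is not the code from this PR: the type and function names (imatrix_entry_sketch, normalized, merge_into) are hypothetical, and keeping a single count per tensor is a simplifying assumption. The real common_imatrix layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one loaded imatrix entry. Field names are
// illustrative only; the actual common_imatrix struct may look different.
struct imatrix_entry_sketch {
    std::vector<float> sums;   // raw accumulated values, one per column
    float count = 0.0f;        // samples accumulated (simplified to one value)
};

// Hypothetical container: tensor name -> raw entry, exactly as loaded.
using imatrix_sketch = std::map<std::string, imatrix_entry_sketch>;

// quantize-style consumer: turn the raw sums into per-column averages.
std::vector<float> normalized(const imatrix_entry_sketch & e) {
    std::vector<float> out(e.sums.size(), 0.0f);
    if (e.count <= 0.0f) {
        return out; // nothing accumulated, leave zeros
    }
    for (std::size_t i = 0; i < e.sums.size(); ++i) {
        out[i] = e.sums[i] / e.count;
    }
    return out;
}

// imatrix-style consumer: merge another file's raw data into dst, which
// only works because sums and counts were kept un-normalized.
void merge_into(imatrix_sketch & dst, const imatrix_sketch & src) {
    for (const auto & kv : src) {
        imatrix_entry_sketch & d = dst[kv.first];
        if (d.sums.empty()) {
            d.sums.assign(kv.second.sums.size(), 0.0f);
        }
        assert(d.sums.size() == kv.second.sums.size());
        for (std::size_t i = 0; i < d.sums.size(); ++i) {
            d.sums[i] += kv.second.sums[i];
        }
        d.count += kv.second.count;
    }
}
```

The point of the sketch is that normalizing at load time would discard the counts and make merging impossible, so the shared loader has to hand back the raw form.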

Additional information

Tested parity between the old and new code by running the following scenarios before and after the change and comparing sha256sum of the results:

./build/bin/llama-imatrix --in-file ../google_gemma-4-E4B-it-imatrix.gguf -m ../google_gemma-4-E4B-it-bf16.gguf -f ../groups_merged.txt -o test.gguf
./build/bin/llama-imatrix -m ../google_gemma-4-E4B-it-bf16.gguf -f ../groups_merged.txt -o test-legacy.dat --output-format dat
./build/bin/llama-imatrix --in-file test-legacy.dat -m ../google_gemma-4-E4B-it-bf16.gguf -f ../groups_merged.txt -o test-legacy-in.dat --output-format dat
./build/bin/llama-quantize --imatrix test.gguf ../google_gemma-4-E4B-it-bf16.gguf ./gemma-4-test-Q4_K_M.gguf Q4_K_M
./build/bin/llama-quantize --imatrix test-legacy.dat ../google_gemma-4-E4B-it-bf16.gguf ./gemma-4-test-Q4_K_M.gguf Q4_K_M

I also ran --show-statistics, since that is the other place imatrix.cpp uses the loader, and the results are the same before and after as well.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, primary method of deduplication; all testing was run manually

(ironically a duplicated PR because I'm bad at git)

bartowski1182 marked this pull request as ready for review April 28, 2026 16:14
bartowski1182 requested review from a team and ggerganov as code owners April 28, 2026 16:14
@bartowski1182
Contributor Author

@ggerganov can I get another review on this one?

pwilkin merged commit e7bcf1c into ggml-org:master Jun 4, 2026
40 of 46 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 4, 2026
* origin/master: (57 commits)
server : disable on-device spec checkpoints (ggml-org#24108)
arg: fix double mtp downloads (ggml-org#24128)
webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (ggml-org#23132)
Move duplicated imatrix code into single common imatrix-loader.cpp (ggml-org#22445)
ui: Fixed packages (ggml-org#24119)
ui: added single line reasoning preview (ggml-org#23601)
return filter to save memory (ggml-org#24125)
convert: Fix Gemma 4 Unified conversion (ggml-org#24118)
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (ggml-org#22209)
server: avoid unnecessary checkpoint restore when new tokens are present (ggml-org#24110)
agents: refactor, include more guidelines (ggml-org#24111)
webui: fix tool selector toggle/counter, key tools by stable identity (ggml-org#24065)
build : use umbrella Headers directory for XCFramework module map (ggml-org#23974)
server : add header to tools/server/server-http.h (ggml-org#24089)
cmake: skip cvector-generator and export-lora when CPU backend is disabled (ggml-org#24053)
fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)
readme : add status badges (ggml-org#24104)
tests : refactor test-save-load-state to accept token input (ggml-org#24073)
metal : reduce rset heartbeat from 500ms -> 5ms (ggml-org#24074)
ggml-webgpu: FlashAttention refactor + standardize quantization support (ggml-org#23834)
...