UPSTREAM PR #19278: ggml: added cleanups in ggml_quantize_free #1139
Conversation
Add missing cleanup calls for IQ2_S, IQ1_M quantization types and IQ3XS with 512 blocks during quantization cleanup.
**Overview**

Analysis of llama.cpp across 115,472 functions (6 modified, 0 new, 0 removed) reveals minimal performance impact from a single commit fixing memory leaks in quantization cleanup. Power consumption changes are negligible across all 15 binaries analyzed. Critical inference paths (llama_decode, matrix operations, attention, KV cache) remain unchanged.

**Function Analysis**

- `ggml_quantize_free` (libggml-base.so): response time increased 2,787 ns → 4,656 ns (+67%, +1,869 ns); throughput time 26 ns → 34 ns (+32%, +8 ns). This intentional regression adds cleanup for the IQ2_S, IQ1_M, and IQ3_S-512 quantization formats, fixing memory leaks. The impact occurs only at program shutdown, not during inference.
- `std::map::_M_emplace_hint_unique` (libggml-base.so): response time improved 3,512 ns → 3,456 ns (-1.6%, -57 ns); throughput time 195 ns → 139 ns (-29%, -57 ns). Used in graph construction for tensor relationship tracking; the improvement is likely due to reduced heap fragmentation after the leak fixes.
- `std::vector<gguf_kv>::cbegin` (libggml-base.so): response time increased 84 ns → 172 ns (+105%, +88 ns); throughput time 62 ns → 151 ns (+141%, +88 ns). A standard-library accessor showing a compiler-optimization artifact during GGUF metadata parsing, a one-time model-loading operation.
- Other analyzed functions (`gguf_type_name`, `std::vector::resize`, `std::vector::_M_realloc_insert`) showed changes under ±26 ns with no meaningful impact.

**Additional Findings**

The commit (d3f8406) prevents memory leaks for three quantization formats without affecting inference performance. The fixed leaks improve heap-allocator efficiency, yielding beneficial side effects in container operations. All GPU backends (CUDA, Metal, HIP, Vulkan) and performance-critical operations remain unmodified. The 45.49 nanojoule power increase represents an unmeasurable energy cost in any deployment scenario.

🔎 Full breakdown: Loci Inspector.
Note
Source pull request: ggml-org/llama.cpp#19278
Add missing cleanup calls for IQ2_S, IQ1_M quantization types and IQ3XS with 512 blocks during quantization cleanup.