
UPSTREAM PR #17728: ggml-cpu: aarm64: remove asserts always evaluating to false in repack#413

Open
loci-dev wants to merge 1 commit into main from upstream-PR17728-branch_Alcpz-Alcpz/fix_wrong_asserts
Conversation


@loci-dev loci-dev commented Dec 3, 2025

Mirrored from ggml-org/llama.cpp#17728

I got an email reporting the issue. I can't find the original comment, but for gemv these asserts don't make sense, since nr will always be 1.


loci-review bot commented Dec 3, 2025

Explore the complete analysis inside the Version Insights

Performance Review Summary - PR #413

Analysis: This PR removes two incorrect assertions from ARM64 GEMV functions (ggml_gemv_q4_K_8x4_q8_K and ggml_gemv_q4_K_8x8_q8_K) in repack.cpp. The assertions assert(nr % 4 == 0) were semantically wrong for vector operations where nr = 1 is expected.

Performance Impact: The modified GEMV functions are not among the top performance-impacted functions. The observed performance changes in the version comparison (parameter-handling functions showing 7-12 ns increases in per-call time) are unrelated to this PR. The assertion removal eliminates 1-2 instructions per call, resulting in negligible improvement (<0.1 ns).

Inference Impact: No impact on tokens per second. The core inference functions (llama_decode, llama_encode, llama_tokenize) are unchanged. GEMV operations are utility functions in the quantized computation pipeline, not direct inference entry points.

Power Consumption: The 0.21% reduction in libggml-cpu.so power consumption (244 nJ decrease) is driven by the 88 ns throughput improvement in ggml_compute_forward_silu_back_f16, not by this PR's changes.

Conclusion: This is a correctness fix with no measurable performance impact. The change enables proper execution of GEMV operations without affecting inference throughput or power consumption.

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from ca4155f to b86b588 Compare December 5, 2025 22:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 048ad94 to 6c1fde6 Compare February 3, 2026 13:32
