

@loci-dev

Mirrored from ggml-org/llama.cpp#17526

For small shapes where the number of columns is small (e.g., 16), the current logic skipped some chunks due to rounding.

The issue was observed with NB_COLS 8 and ne01 16, and could potentially happen with NB_COLS 4 and other thread/shape combinations.
It also affects the corner case where chunking is disabled.
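For illustration, here is a minimal sketch of how the rounding can go wrong. The variable names (`nr0`, `dr0`) and the align-down step are assumptions modeled on the discussion, not the verbatim repack.cpp code; the actual alignment direction may differ, but the collision is the same:

```cpp
// Sketch only: shows why 2-row chunks collide once starts are aligned
// to NB_COLS. Names and the align-down step are assumptions.
#include <cstdio>

int main() {
    const int nr0     = 16; // rows to split across threads (ne01)
    const int NB_COLS = 8;  // minimum alignment for a chunk start
    const int nth     = 8;  // threads, one chunk per thread

    const int dr0 = (nr0 + nth - 1) / nth; // 2 rows per chunk: below NB_COLS
    for (int ith = 0; ith < nth; ++ith) {
        const int raw   = dr0 * ith;
        const int start = raw - raw % NB_COLS; // align down to NB_COLS
        std::printf("chunk %d -> row %d\n", ith, start);
    }
    // chunks 0-3 all start at row 0 and chunks 4-7 all start at row 8,
    // so ranges overlap and the intended 2-row chunks are lost
    return 0;
}
```

With `dr0 = 2` below `NB_COLS = 8`, the aligned starts collide, which is the situation this PR's guard refuses to create.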

@max-krasnyansky I checked the performance here and didn't see any issues. Let me know if you'd like me to run any particular test.

Performance

RPI5

| model | test | 2f416b265 (7162) t/s | 3e18dba (7161) t/s |
| --- | --- | --- | --- |
| lfm2 350M Q4_0 | pp256 | 174.46 ± 0.07 | 173.41 ± 0.64 |
| lfm2 350M Q4_0 | tg128 | 51.58 ± 0.03 | 51.38 ± 0.26 |
| lfm2 700M Q4_0 | pp256 | 81.79 ± 0.01 | 82.55 ± 0.03 |
| lfm2 700M Q4_0 | tg128 | 25.78 ± 0.00 | 25.86 ± 0.00 |

M4 Max

| model | test | 2f416b265 (7162) t/s | 3e18dba (7161) t/s |
| --- | --- | --- | --- |
| lfm2 1.2B Q4_K Medium | pp256 | 682.39 ± 3.23 | 682.82 ± 2.97 |
| lfm2 1.2B Q4_K Medium | tg128 | 233.77 ± 4.45 | 234.96 ± 0.57 |
| lfm2 700M Q4_K Medium | pp256 | 1070.08 ± 2.77 | 1067.29 ± 7.14 |
| lfm2 700M Q4_K Medium | tg128 | 331.12 ± 1.27 | 333.13 ± 1.32 |
| llama 8B Q4_K Medium | pp256 | 100.26 ± 0.11 | 96.65 ± 1.75 |
| llama 8B Q4_K Medium | tg128 | 43.10 ± 0.50 | 41.69 ± 0.72 |
| qwen3 8B Q4_K Medium | pp256 | 94.40 ± 0.33 | 90.45 ± 0.34 |
| qwen3 8B Q4_K Medium | tg128 | 40.92 ± 0.33 | 40.29 ± 0.27 |


loci-review bot commented Nov 26, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #337

Overview

This PR implements a chunking safety fix in the REPACK matrix multiplication module (`ggml/src/ggml-cpu/repack.cpp`). The change adds a validation check to prevent creating chunks smaller than the minimum alignment requirement (`NB_COLS`) when distributing work across threads in `forward_mul_mat`.

Code Change Analysis

The modification adds a condition that verifies the chunk size before increasing `nchunk0` to match the thread count. Specifically, it calculates `dr0` early and checks `(nr0 + nth - 1) / nth >= min_chunk_size` before setting `nchunk0 = nth`. This prevents the chunk overlap that occurs with small matrix shapes (e.g., `ne01 = 16` with `NB_COLS = 8` and 8 threads), where the original code would create 2-element chunks that, after alignment, would overlap and produce incorrect results.

The fix is a correctness improvement that addresses edge cases in small matrix operations without modifying the core computation logic. The change is localized to approximately 7 lines within a single function template in one file.
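A minimal sketch of the guard described above, with names modeled on the review text (`nr0`, `nth`, `nchunk0`, `min_chunk_size`); the real repack.cpp implementation may differ in detail:

```cpp
// Sketch of the chunking guard; names follow the review text and are
// assumptions, not the verbatim repack.cpp code.
static int pick_nchunk0(int nr0, int nth, int nchunk0, int min_chunk_size) {
    const int dr0 = (nr0 + nth - 1) / nth; // rows per chunk with one chunk per thread
    if (dr0 >= min_chunk_size) {
        nchunk0 = nth; // safe: every chunk stays at least NB_COLS wide
    }
    // otherwise keep the coarser chunk count, so alignment cannot make
    // neighbouring chunks overlap
    return nchunk0;
}
```

The design keeps the fast one-chunk-per-thread path for normal shapes and only falls back to fewer, wider chunks when the per-thread slice would drop below the alignment minimum.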

Performance Impact Assessment

Based on the analysis context provided, this PR shows no measurable performance change in the reported metrics. The modification is a logic fix that only affects the chunking strategy for edge cases involving small matrices. The PR author's benchmarks on RPI5 and M4 Max show variations within ±1.75%, which falls within measurement noise.

Inference Impact: No functions related to tokenization or inference (llama_decode, llama_encode, llama_tokenize) are modified by this PR. The change affects only the internal chunking mechanism in matrix multiplication operations. Therefore, there is no expected impact on tokens per second for inference workloads.

Power Consumption: No changes reported in power consumption metrics, as the fix does not alter the computational workload or instruction count for typical matrix sizes.
