UPSTREAM PR #18724: Corrected: In ggml_cuda_mul_mat_q(), s13 is declared incorrectly to nb[2] instead of nb[3] by loci-dev · Pull Request #871 · auroralabs-loci/llama.cpp

loci-dev · 2026-01-09T19:35:02Z

In ggml_cuda_mul_mat_q(), on the MoE path, when quantizing src1, s13 is incorrectly computed from src1->nb[2] instead of src1->nb[3].

Earlier in the function:

    const int64_t s03 = src0->nb[3] / ts_src0;
    const int64_t s3 = dst->nb[3] / ts_dst;
    ...

    if (!ids) {
    ...
    {
        const int64_t s11 = src1->nb[1] / ts_src1;
        const int64_t s12 = src1->nb[2] / ts_src1;
        const int64_t s13 = src1->nb[3] / ts_src1;
    }

Later, in the ids path:

    const int64_t s11 = src1->nb[1] / ts_src1;
    const int64_t s12 = src1->nb[2] / ts_src1;
    const int64_t s13 = src1->nb[2] / ts_src1; // <--- [2] here, repeated; should be [3].

ne13 is currently asserted to be 1, so this very likely doesn't affect anything yet, but it should still be fixed for correctness.
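
To illustrate why the wrong stride is harmless while ne13 == 1 but would matter otherwise, here is a minimal standalone sketch. It does not use ggml: TensorShape, make_contiguous and offset are hypothetical helpers that only mimic the ne/nb convention (element counts and byte strides per dimension) for a contiguous, non-quantized tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the ne/nb fields of a tensor (NOT the real ggml struct).
struct TensorShape {
    int64_t ne[4]; // number of elements per dimension
    size_t  nb[4]; // stride in bytes per dimension
};

// Assumption: contiguous, non-quantized layout, so nb[0] is the type size
// and each further stride is nb[i-1] * ne[i-1].
static TensorShape make_contiguous(int64_t ne0, int64_t ne1, int64_t ne2, int64_t ne3, size_t ts) {
    assert(ts > 0);
    TensorShape t{{ne0, ne1, ne2, ne3}, {ts, 0, 0, 0}};
    for (int i = 1; i < 4; ++i) {
        t.nb[i] = t.nb[i - 1] * static_cast<size_t>(t.ne[i - 1]);
    }
    return t;
}

// Element offset of index (i11, i12, i13) along dimensions 1..3,
// given element strides s11, s12, s13.
static int64_t offset(int64_t i11, int64_t i12, int64_t i13,
                      int64_t s11, int64_t s12, int64_t s13) {
    return i11 * s11 + i12 * s12 + i13 * s13;
}
```

With make_contiguous(64, 8, 4, 1, 4), the byte strides are nb = {4, 256, 2048, 8192}, so dividing by the type size gives s12 = 512, a correct s13 of 2048 and a buggy s13 of 512. Since ne13 == 1 forces i13 == 0, offset(i11, i12, 0, ...) is the same with either value. For a hypothetical i13 == 1, the two strides would give offsets 2048 and 512 respectively, which is where the incorrect index would start reading the wrong data.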

Copilot summary:
This pull request fixes an indexing bug in the ggml_cuda_mul_mat_q function in mmq.cu. The change corrects the calculation of the s13 variable to use the correct dimension of the src1 tensor, which improves the accuracy of subsequent operations.

Bug fix:
- Corrected the calculation of s13 to use src1->nb[3] instead of src1->nb[2], ensuring the correct tensor dimension is used in matrix multiplication operations.

loci-review · 2026-01-09T20:19:29Z

I've successfully generated a summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #871

Good News: ✅ No significant performance regressions were detected in this pull request.

Key Points:

- The analysis compared two versions of the code (base vs. target).
- No modified functions showed performance changes greater than the 2% threshold.
- Both Response Time (execution time per function) and Throughput Time (time including callees) remained stable.

Recommendation: This pull request appears safe to merge from a performance perspective, as all functions maintained stable performance within acceptable thresholds.

The analysis indicates that the changes in PR #871 for the auroralabs-loci/llama.cpp repository have not introduced any measurable performance degradation.

Corrected: changed s13 = src1->nb[3] instead of nb[2]

9d13905

loci-dev temporarily deployed to PROD__AL_DEMO January 9, 2026 19:35 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 27 times, most recently from 4d071b3 to 8e509d5, on January 13, 2026 16:13

loci-dev force-pushed the main branch 30 times, most recently from d664a5a to 48924ee, on January 21, 2026 12:17

loci-dev wants to merge 1 commit into main from upstream-PR18724-branch_michaelw9999-master