
UPSTREAM PR #18724: Corrected: In ggml_cuda_mul_mat_q(), s13 is declared incorrectly to nb[2] instead of nb[3]#871

Open
loci-dev wants to merge 1 commit into main from upstream-PR18724-branch_michaelw9999-master
Conversation

@loci-dev loci-dev commented Jan 9, 2026

Mirrored from ggml-org/llama.cpp#18724

In ggml_cuda_mul_mat_q(), in the MoE path when quantizing src1, s13 is incorrectly initialized from nb[2] instead of nb[3].

Earlier in the function:

```cpp
const int64_t s03 = src0->nb[3] / ts_src0;
const int64_t s3  = dst->nb[3]  / ts_dst;
...

if (!ids) {
    ...
    {
        const int64_t s11 = src1->nb[1] / ts_src1;
        const int64_t s12 = src1->nb[2] / ts_src1;
        const int64_t s13 = src1->nb[3] / ts_src1;
    }
```
Later, in the ids path:

```cpp
const int64_t s11 = src1->nb[1] / ts_src1;
const int64_t s12 = src1->nb[2] / ts_src1;
const int64_t s13 = src1->nb[2] / ts_src1; // <-- nb[2] repeated; should be nb[3]
```

ne13 is currently asserted to be 1, so this very likely doesn't affect anything yet, but it should still be fixed for correctness.

Copilot summary:
This pull request fixes an indexing bug in the ggml_cuda_mul_mat_q function in mmq.cu. The change corrects the calculation of the s13 variable to use the correct dimension of the src1 tensor, which improves the accuracy of subsequent operations.

  • Bug fix:
    • Corrected the calculation of s13 to use src1->nb[3] instead of src1->nb[2], ensuring the correct tensor dimension is used in matrix multiplication operations.

loci-review bot commented Jan 9, 2026

Explore the complete analysis inside the Version Insights

I've successfully generated a summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #871

Good News: ✅ No significant performance regressions were detected in this pull request.

Key Points:

  • The analysis compared two versions of the code (base vs. target)
  • No modified function showed a performance change greater than the 2% threshold
  • Both Response Time (execution time per function) and Throughput Time (time including callees) remained stable

Recommendation: This pull request appears safe to merge from a performance perspective, as all functions maintained stable performance within acceptable thresholds.

The analysis indicates that the changes in PR #871 for the auroralabs-loci/llama.cpp repository have not introduced any measurable performance degradation.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 4d071b3 to 8e509d5 Compare January 13, 2026 16:13
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from d664a5a to 48924ee Compare January 21, 2026 12:17
