CUDA: loop over ne2*ne3 in case it overflows by am17an · Pull Request #19538 · ggml-org/llama.cpp

am17an · 2026-02-12T06:51:28Z

No description provided.
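For context on the title: CUDA caps `gridDim.y` and `gridDim.z` at 65535, so a launch that maps one block to each `(i02, i03)` pair fails once `ne02*ne03` exceeds that limit. The sketch below is a host-side C++ emulation of the general fix, a strided loop over the flattened index, and is not the code from this PR. The function name `visit_counts`, the constant `MAX_GRID_DIM_Z`, and the choice of the z dimension are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// CUDA limit on gridDim.y and gridDim.z (gridDim.x allows far more).
constexpr int64_t MAX_GRID_DIM_Z = 65535;

// Hypothetical host-side emulation: each "block" strides over the flattened
// index i0203 = i03*ne02 + i02 instead of handling exactly one pair.
// Returns how many times each (i02, i03) pair was visited.
std::vector<int> visit_counts(int64_t ne02, int64_t ne03) {
    const int64_t total  = ne02 * ne03;
    const int64_t grid_z = std::min<int64_t>(total, MAX_GRID_DIM_Z); // clamped launch size

    std::vector<int> counts(total, 0);
    for (int64_t block_z = 0; block_z < grid_z; ++block_z) {            // stands in for blockIdx.z
        for (int64_t i0203 = block_z; i0203 < total; i0203 += grid_z) { // stride stands in for gridDim.z
            const int64_t i02 = i0203 % ne02;
            const int64_t i03 = i0203 / ne02;
            counts[i03 * ne02 + i02] += 1;
        }
    }
    return counts;
}
```

Because the stride equals the clamped grid size, every flattened index is handled by exactly one block, whether or not `ne02*ne03` fits in the grid.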

JohannesGaessler · 2026-02-12T10:33:42Z

ggml/src/ggml-cuda/convert.cu

+        const int64_t i02 = i0203 % ne02;
+        const int64_t i03 = i0203 / ne02;


Preferably use fastdiv here.
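As background for this suggestion, here is a minimal host-side C++ sketch of the general fast-division idea: precompute a magic multiplier and shift for a divisor that stays fixed, then replace `/` and `%` with a multiply-high, an add, and a shift. The names `FastDiv`, `make_fastdiv`, `fast_div`, and `fast_mod` are hypothetical and this is not ggml's implementation; it assumes 32-bit unsigned operands and a nonzero divisor.

```cpp
#include <cstdint>

// Hypothetical container for the precomputed values: multiplier, shift, divisor.
struct FastDiv {
    uint32_t mp;
    uint32_t shift;
    uint32_t d;
};

// Precompute on the host once per divisor (d must be nonzero).
FastDiv make_fastdiv(uint32_t d) {
    uint32_t L = 0; // L = ceil(log2(d))
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    const uint32_t mp =
        (uint32_t) (((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1);
    return {mp, L, d};
}

// n / d without a division instruction.
uint32_t fast_div(uint32_t n, const FastDiv & f) {
    const uint64_t hi = ((uint64_t) n * f.mp) >> 32; // high 32 bits; __umulhi(n, mp) in device code
    return (uint32_t) ((hi + n) >> f.shift);         // 64-bit sum so hi + n cannot wrap
}

// n % d derived from the quotient.
uint32_t fast_mod(uint32_t n, const FastDiv & f) {
    return n - fast_div(n, f) * f.d;
}
```

Applied to the lines under review, `i03` would come from `fast_div(i0203, f)` and `i02` from `fast_mod(i0203, f)`, with `f` built once from `ne02` before the kernel launch, provided the values fit in 32 bits.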

JohannesGaessler · 2026-02-12T10:34:17Z

ggml/src/ggml-cuda/convert.cu

+        const int64_t i02 = i0203 % ne02;
+        const int64_t i03 = i0203 / ne02;


Same as above.

JohannesGaessler · 2026-02-12T11:38:57Z

Since this PR is a draft: is there something that is still missing?

CUDA: loop over ne2*ne3 in case it overflows

3b93d39

am17an mentioned this pull request Feb 12, 2026

test: mul_mat tests with huge batch size #19519

Merged

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Feb 12, 2026

JohannesGaessler reviewed Feb 12, 2026

use fastdiv

0fd79e5

am17an marked this pull request as ready for review February 12, 2026 11:41

JohannesGaessler approved these changes Feb 12, 2026

am17an merged commit 5065da5 into ggml-org:master Feb 13, 2026
75 of 78 checks passed

am17an deleted the convert-cublas-fix branch February 13, 2026 11:38

ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request Feb 16, 2026

CUDA: loop over ne2*ne3 in case it overflows (ggml-org#19538)

81a85bb

* CUDA: loop over ne2*ne3 in case it overflows * use fastdiv (cherry picked from commit 5065da5)

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026

CUDA: loop over ne2*ne3 in case it overflows (ggml-org#19538)

792d5cf

* CUDA: loop over ne2*ne3 in case it overflows * use fastdiv

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026

CUDA: loop over ne2*ne3 in case it overflows (ggml-org#19538)

307f6b6

* CUDA: loop over ne2*ne3 in case it overflows * use fastdiv

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

CUDA: loop over ne2*ne3 in case it overflows (ggml-org#19538)

99c34a5

* CUDA: loop over ne2*ne3 in case it overflows * use fastdiv

am17an merged 2 commits into ggml-org:master from am17an:convert-cublas-fix
