
CUDA: loop over ne2*ne3 in case it overflows #19538

Merged
am17an merged 2 commits into ggml-org:master from am17an:convert-cublas-fix on Feb 13, 2026

Conversation

@am17an (Contributor) commented Feb 12, 2026

No description provided.
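Because the PR has no description, here is a minimal host-side sketch of what the title describes. It assumes the overflow is CUDA's 65535 limit on gridDim.y and gridDim.z; the PR does not say which launch dimension overflowed. Rather than launching one block per (i02, i03) pair, the kernel can loop over the flattened index i0203 in [0, ne02*ne03) with a stride equal to the grid size. It then recovers i02 and i03 with the same % and / used in the reviewed lines. The names `visit_counts` and `kMaxGridDim` are hypothetical, and the outer loop over `b` stands in for blockIdx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cap standing in for CUDA's 65535 limit on gridDim.y / gridDim.z.
constexpr int64_t kMaxGridDim = 65535;

// Host emulation of a grid-stride loop over the flattened (i02, i03) index.
// "Block" b starts at i0203 = b and advances by the grid size, so every pair
// is visited exactly once even when ne02*ne03 exceeds the cap.
std::vector<int> visit_counts(int64_t ne02, int64_t ne03) {
    assert(ne02 > 0 && ne03 > 0);
    const int64_t n    = ne02 * ne03;
    const int64_t grid = n < kMaxGridDim ? n : kMaxGridDim;  // clamp the launch size
    std::vector<int> counts(static_cast<std::size_t>(n), 0);
    for (int64_t b = 0; b < grid; ++b) {                      // b plays blockIdx
        for (int64_t i0203 = b; i0203 < n; i0203 += grid) {   // stride = grid size
            const int64_t i02 = i0203 % ne02;  // inner (faster-varying) index
            const int64_t i03 = i0203 / ne02;  // outer index
            assert(i02 < ne02 && i03 < ne03);
            counts[static_cast<std::size_t>(i03 * ne02 + i02)] += 1;
        }
    }
    return counts;
}
```

With ne02 = 300 and ne03 = 400 (120000 pairs, above the cap), each of the 65535 emulated blocks handles one or two indices, and every pair is counted exactly once. On the GPU, the inner loop would start at blockIdx.y and step by gridDim.y.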

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Feb 12, 2026
Comment on lines +21 to +22
const int64_t i02 = i0203 % ne02;
const int64_t i03 = i0203 / ne02;
Contributor

Preferably use fastdiv here.
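The reviewer's suggestion refers to a multiply-shift ("fastdiv") technique, sketched below. The sketch assumes a 32-bit unsigned dividend and a divisor such as ne02 that stays fixed for the whole launch. The constants are computed once on the host, and each division in the kernel becomes a multiply-high, an add and a shift. This follows the unsigned division-by-invariant-integers method of Granlund and Montgomery. The ggml CUDA backend has its own helpers for this. `FastDiv`, `make_fastdiv`, `fastdiv`, `fastmod` and `umulhi32` are hypothetical names, and `umulhi32` is a portable stand-in for CUDA's `__umulhi`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical precomputed constants for dividing by a fixed divisor d.
struct FastDiv {
    uint32_t mp;  // magic multiplier: floor(2^32 * (2^L - d) / d) + 1
    uint32_t L;   // shift: ceil(log2(d))
    uint32_t d;   // the divisor itself, kept for the remainder
};

// Computed once on the host (e.g. for ne02) and passed to the kernel.
FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) ++L;  // smallest L with 2^L >= d
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};  // mp always fits in 32 bits
}

// Portable stand-in for CUDA's __umulhi: high 32 bits of a 32x32-bit product.
inline uint32_t umulhi32(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((uint64_t{a} * b) >> 32);
}

// n / d using one multiply-high, one add and one shift instead of a division.
inline uint32_t fastdiv(uint32_t n, const FastDiv& f) {
    const uint64_t t = umulhi32(n, f.mp);
    return static_cast<uint32_t>((t + n) >> f.L);  // 64-bit sum cannot overflow
}

// n % d derived from the quotient: n - (n / d) * d.
inline uint32_t fastmod(uint32_t n, const FastDiv& f) {
    return n - fastdiv(n, f) * f.d;
}
```

In the loop from the diff, one could compute i03 with `fastdiv` and then derive i02 as i0203 - i03 * ne02, which needs only one multiply-high. The reviewed code uses int64_t indices, so this 32-bit sketch applies only when ne02*ne03 fits in 32 bits.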

Comment on lines +632 to +633
const int64_t i02 = i0203 % ne02;
const int64_t i03 = i0203 / ne02;
Contributor

Same as above.

@JohannesGaessler (Contributor) commented:

Since this PR is a draft: is there something that is still missing?

@am17an am17an marked this pull request as ready for review February 12, 2026 11:41
@am17an am17an merged commit 5065da5 into ggml-org:master Feb 13, 2026
75 of 78 checks passed
@am17an am17an deleted the convert-cublas-fix branch February 13, 2026 11:38
ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request Feb 16, 2026
* CUDA: loop over ne2*ne3 in case it overflows

* use fastdiv

(cherry picked from commit 5065da5)
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
* CUDA: loop over ne2*ne3 in case it overflows

* use fastdiv
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
* CUDA: loop over ne2*ne3 in case it overflows

* use fastdiv
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
* CUDA: loop over ne2*ne3 in case it overflows

* use fastdiv
