ggml-cuda: refactor fusion code #22468

Merged: am17an merged 2 commits into ggml-org:master from am17an:cuda-fusion-detect-dispatch on Apr 29, 2026
@am17an commented Apr 28, 2026

Contributor

Overview

Refactor the fusion code into a single function. Also fix a bug where the fusion code did not check the value of the environment variable that disables fusion.
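To illustrate the "single function" shape described above, here is a minimal sketch of a try-fuse helper that reports how many nodes the caller should skip. It is not the code from this PR: `toy_graph`, `toy_try_fuse`, `toy_count_launches`, and the MUL_MAT + ADD pattern are hypothetical stand-ins, and the exact meaning of the return value in `ggml_cuda_try_fuse` is an assumption based only on its comment ("return the number of nodes to skip").

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a compute graph: just a list of op names.
using toy_graph = std::vector<std::string>;

// Try to fuse the ops starting at index i.
// Returns the number of additional nodes consumed by the fused op,
// i.e. how many nodes the caller should skip (0 means nothing was fused).
static int toy_try_fuse(const toy_graph & graph, int i, bool disable_fusion) {
    assert(i >= 0 && i < (int) graph.size());
    if (disable_fusion) {
        return 0;
    }
    // Illustrative pattern only: a MUL_MAT immediately followed by an ADD.
    if (i + 1 < (int) graph.size() && graph[i] == "MUL_MAT" && graph[i + 1] == "ADD") {
        return 1;
    }
    return 0;
}

// Walk the graph and count how many launches would be issued.
// Each iteration issues one launch, fused or not, then skips the fused nodes.
static int toy_count_launches(const toy_graph & graph, bool disable_fusion) {
    int launches = 0;
    for (int i = 0; i < (int) graph.size(); ++i) {
        i += toy_try_fuse(graph, i, disable_fusion);
        ++launches;
    }
    return launches;
}
```

With a graph of `{"MUL_MAT", "ADD", "RMS_NORM"}`, this sketch issues two launches when fusion is enabled and three when it is disabled. Keeping detection and the skip count in one function means the graph loop needs only a single call site per node.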

Additional information

Requirements

@am17an am17an requested a review from a team as a code owner April 28, 2026 11:20
Comment thread ggml/src/ggml-cuda/ggml-cuda.cu Outdated
// try and fuse nodes and return the number of nodes to skip
static int ggml_cuda_try_fuse(ggml_backend_cuda_context * cuda_ctx, ggml_cgraph * cgraph, int i) {

static bool disable_fusion = getenv("GGML_CUDA_DISABLE_FUSION") != nullptr && std::atoi(getenv("GGML_CUDA_DISABLE_FUSION")) == 1;

Contributor

I'm fine with doing an explicit check for the value, but it seems this is inconsistent with how GGML checks env variables in different backends and within CUDA.

For example, consider GGML_VK_DISABLE_FUSION or GGML_CUDA_DISABLE_GRAPHS.

Contributor Author

I think it's just generally inconsistent across the codebase. I prefer explicitly checking the value because it lets me benchmark with 0 and 1. There are already cases where it is done like that (e.g. GGML_CUDA_GRAPH_OPT, LLAMA_ATTN_ROT_DISABLE), so there is no convention per se.

Contributor

Yes, I agree. I don't have a strong opinion on this.
So, the PR is fine as is.

Contributor

I definitely would not expect you to refactor environment variables in this PR but my preference would be to have the same semantics as for the truth values of integers in C/C++. Meaning that a value of 0 is false and all other values are true. With this implementation a value of 2 would be evaluated as false.
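The three semantics discussed in this thread can be compared side by side. The following is a sketch only: the helper names (`flag_if_set`, `flag_if_one`, `flag_if_truthy`, `env_flag_truthy`) are hypothetical and do not exist in ggml, and the merged code may be written differently.

```cpp
#include <cassert>
#include <cstdlib>

// Semantics 1: the flag is on whenever the variable is set, whatever its value.
static bool flag_if_set(const char * value) {
    return value != nullptr;
}

// Semantics 2: the flag is on only when the value parses to exactly 1
// (the check quoted in this thread). A value of 2 leaves the flag off.
static bool flag_if_one(const char * value) {
    return value != nullptr && std::atoi(value) == 1;
}

// Semantics 3: C/C++ integer truthiness. 0 is false, any other integer is true.
static bool flag_if_truthy(const char * value) {
    return value != nullptr && std::atoi(value) != 0;
}

// Convenience wrapper (hypothetical): look up an environment variable by name
// and apply the truthy semantics.
static bool env_flag_truthy(const char * name) {
    assert(name != nullptr);
    return flag_if_truthy(std::getenv(name));
}
```

Under these definitions, a value of `0` enables the flag only with the presence-only check, and a value of `2` enables it with the presence-only and truthy checks but not with the `== 1` check. Because `std::atoi` returns 0 when no conversion can be performed, a non-numeric value such as `yes` counts as false under the second and third variants.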

@github-actions (bot) added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Apr 28, 2026

@JohannesGaessler left a comment

Contributor

Regarding the formatting: my preference is for there to be a visual distinction between the () and {} blocks of a conditional statement but I don't feel particularly strongly about it either way.

@am17an am17an merged commit 3142f1d into ggml-org:master Apr 29, 2026
47 checks passed
@am17an am17an deleted the cuda-fusion-detect-dispatch branch April 29, 2026 08:19
cnsiva added a commit to saas-home/llama.cpp that referenced this pull request May 1, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* ggml-cuda: refactor fusion code

* apply formatting + make env variable truthy

3 participants