ggml-cuda: refactor fusion code#22468
Conversation
| // try and fuse nodes and return the number of nodes to skip | ||
| static int ggml_cuda_try_fuse(ggml_backend_cuda_context * cuda_ctx, ggml_cgraph * cgraph, int i) { | ||
|
|
||
| static bool disable_fusion = getenv("GGML_CUDA_DISABLE_FUSION") != nullptr && std::atoi(getenv("GGML_CUDA_DISABLE_FUSION")) == 1; |
There was a problem hiding this comment.
I'm fine with doing an explicit check for the value, but it seems this is inconsistent with how GGML checks env variables in different backends and within CUDA.
For example, consider GGML_VK_DISABLE_FUSION or GGML_CUDA_DISABLE_GRAPHS.
There was a problem hiding this comment.
I think it's just generally inconsistent across the codebase, I prefer explicitly checking the value because it allows me to bench using 0,1. There are cases where it is done like that (e.g. GGML_CUDA_GRAPH_OPT, LLAMA_ATTN_ROT_DISABLE) already, so there is no convention per se.
There was a problem hiding this comment.
Yes, I agree. I don't have a strong opinion on this.
So, the PR is fine as is.
There was a problem hiding this comment.
I definitely would not expect you to refactor environment variables in this PR but my preference would be to have the same semantics as for the truth values of integers in C/C++. Meaning that a value of 0 is false and all other values are true. With this implementation a value of 2 would be evaluated as false.
JohannesGaessler
left a comment
There was a problem hiding this comment.
Regarding the formatting: my preference is for there to be a visual distinction between the () and {} blocks of a conditional statement but I don't feel particularly strongly about it either way.
| // try and fuse nodes and return the number of nodes to skip | ||
| static int ggml_cuda_try_fuse(ggml_backend_cuda_context * cuda_ctx, ggml_cgraph * cgraph, int i) { | ||
|
|
||
| static bool disable_fusion = getenv("GGML_CUDA_DISABLE_FUSION") != nullptr && std::atoi(getenv("GGML_CUDA_DISABLE_FUSION")) == 1; |
There was a problem hiding this comment.
I definitely would not expect you to refactor environment variables in this PR but my preference would be to have the same semantics as for the truth values of integers in C/C++. Meaning that a value of 0 is false and all other values are true. With this implementation a value of 2 would be evaluated as false.
This reverts commit 3142f1d.
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
* ggml-cuda: refactor fusion code * apply formatting + make env variable truthy
Overview
Refactor the fusion code to be a single function. Also fix a bug in the fusion code where it does not check the value of the env variable to disable fusion.
Additional information
Requirements