[CUDA] Increase number of output elements per-thread block if the K-dimension is small #20635
Open
gaugarg-nv wants to merge 3 commits into ggml-org:master