Commit 1fc2dcc

slaren authored and JohannesGaessler committed
cuda : improve cuda pool efficiency using virtual memory (ggml-org#4606)
* cuda : improve cuda pool efficiency using virtual memory
* fix mixtral
* fix cmake build
* check for vmm support, disable for hip
  ggml-ci
* fix hip build
* clarify granularity
* move all caps to g_device_caps
* refactor error checking
* add cuda_pool_alloc, refactor most pool allocations
  ggml-ci
* fix hip build
* CUBLAS_TF32_TENSOR_OP_MATH is not a macro
* more hip crap
* llama : fix msvc warnings
* ggml : fix msvc warnings
* minor
* minor
* cuda : fallback to CPU on host buffer alloc fail
* Update ggml-cuda.cu
  Co-authored-by: Johannes Gäßler <[email protected]>
* Update ggml-cuda.cu
  Co-authored-by: Johannes Gäßler <[email protected]>
* ensure allocations are always aligned
* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <[email protected]>
1 parent aff07f0 commit 1fc2dcc

8 files changed: +1046 -1542 lines changed

CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -302,6 +302,8 @@ if (LLAMA_CUBLAS)
 set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} CUDA::cudart CUDA::cublas CUDA::cublasLt)
 endif()

+set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} CUDA::cuda_driver)
+
 if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
 # 52 == lowest CUDA 12 standard
 # 60 == f16 CUDA intrinsics
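
The new CUDA::cuda_driver link target is what makes the CUDA driver API available, and that is where the virtual memory management calls (cuMemAddressReserve, cuMemCreate, cuMemMap and related functions) live. The ggml-cuda.cu changes themselves are not shown on this page, so the following is only a host-side sketch of the bookkeeping that the commit message hints at ("clarify granularity", "ensure allocations are always aligned"). It makes no CUDA calls. The names VmmPoolSketch, round_up, alloc and release are hypothetical, and the stack-style release order, the 128-byte alignment and the 2 MiB granularity used in the usage note are assumptions chosen for illustration, not values taken from the commit.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of m (m must be > 0).
static size_t round_up(size_t n, size_t m) {
    return (n + m - 1) / m * m;
}

// Hypothetical host-only model of a pool backed by one reserved
// virtual address range that is mapped to physical memory on demand.
struct VmmPoolSketch {
    size_t reserved;     // size of the reserved virtual address range
    size_t granularity;  // mapping unit (a real pool would query the device)
    size_t alignment;    // alignment applied to every allocation
    size_t mapped;       // bytes currently backed by physical memory
    size_t used;         // bytes currently handed out

    VmmPoolSketch(size_t reserved_, size_t granularity_, size_t alignment_)
        : reserved(reserved_), granularity(granularity_),
          alignment(alignment_), mapped(0), used(0) {}

    // Returns the offset of the new allocation inside the reserved range.
    size_t alloc(size_t size) {
        size = round_up(size, alignment);
        if (used + size > mapped) {
            // Grow the mapped region by a whole number of granules.
            size_t grow = round_up(used + size - mapped, granularity);
            assert(mapped + grow <= reserved && "reserved range exhausted");
            // A real pool would create and map `grow` bytes of
            // physical memory at this point (assumption, not shown).
            mapped += grow;
        }
        size_t offset = used;
        used += size;
        return offset;
    }

    // Assumed stack discipline: blocks are released in reverse order.
    // Mapped memory is kept so later allocations can reuse it.
    void release(size_t offset, size_t size) {
        size = round_up(size, alignment);
        assert(offset + size == used && "release in reverse order");
        used -= size;
    }
};
```

With a 1 GiB reserved range, 2 MiB granularity and 128-byte alignment, allocating 100 bytes and then 3 MiB returns offsets 0 and 128 and leaves 4 MiB mapped. The point of the design is that growing the pool extends one contiguous range rather than adding separate buffers, so the already mapped memory stays reusable after blocks are released.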
