GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 behavior is strange. #1720

Enchante503 · 2024-08-31T18:55:47Z

Prerequisites

Please answer the following questions for yourself before submitting an issue.

I am running the latest code. Development is very rapid so there are no tagged versions as of now.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

VRAM should be used first, with shared memory used only once VRAM is exceeded, and inference should stay fast.

Current Behavior

export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1
With this option set, RAM is used first instead of VRAM.
The specified GPU is also not used first, and inference is very slow:
llama_print_timings: total time = 56361.73 ms / 45 tokens

Unsetting the option makes inference very fast:
llama_print_timings: total time = 40.95 ms / 143 tokens
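
To make the gap easier to compare, here is a minimal sketch that converts the two `llama_print_timings` lines above into tokens per second. It takes the reported numbers at face value; the helper name `tokens_per_second` and the regular expression are illustrative assumptions, not part of any library.

```python
import re

# Matches the "total time = <ms> ms / <n> tokens" part of a llama_print_timings line.
TIMING_RE = re.compile(r"total time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*tokens")


def tokens_per_second(line: str) -> float:
    """Hypothetical helper: throughput implied by one 'total time' log line."""
    match = TIMING_RE.search(line)
    if match is None:
        raise ValueError(f"not a total-time line: {line!r}")
    total_ms = float(match.group(1))
    tokens = int(match.group(2))
    return tokens / (total_ms / 1000.0)


# The two lines reported in this issue, copied verbatim.
with_unified_memory = "llama_print_timings: total time = 56361.73 ms / 45 tokens"
without_unified_memory = "llama_print_timings: total time = 40.95 ms / 143 tokens"

slow = tokens_per_second(with_unified_memory)
fast = tokens_per_second(without_unified_memory)

print(f"with GGML_CUDA_ENABLE_UNIFIED_MEMORY=1: {slow:.2f} tokens/s")
print(f"without the option:                     {fast:.2f} tokens/s")
print(f"slowdown factor:                        {fast / slow:.0f}x")
```

With the option set, the reported run comes out below one token per second, while the run without it is several thousand tokens per second by the same calculation.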

Environment and Context

Windows 11, WSL2, Ubuntu 22.04.4 LTS
CUDA 12.1

Python 3.10.11
GNU Make 4.3 (x86_64-pc-linux-gnu)
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
