llama : fix defrag bugs + add parameter #5735

Merged: 5 commits merged into master from gg/defrag on Feb 27, 2024

@ggerganov (Owner) commented on Feb 26, 2024

fix #3380

KV cache defragmentation can be done in 2 ways:

  • on demand by the user code via llama_kv_cache_defrag()
  • automatically when a fragmentation threshold is exceeded

float defrag_thold; // defragment the KV cache if holes/size > thold, < 0 disabled (default)
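
The threshold rule in the comment above can be sketched as a small standalone check. This is only an illustration of the stated condition (`holes/size > thold`, negative values disabled), not the code from this PR: the helper name `should_defrag` and its arguments are hypothetical, and it assumes "holes" means the number of unused cells within the portion of the cache being considered and "size" the number of cells in that portion.

```cpp
#include <cstddef>

// Hypothetical helper (not part of llama.cpp): applies the rule from the
// defrag_thold comment, i.e. defragment when holes/size > thold.
// Assumption: n_holes counts unused cells, size counts all cells considered.
bool should_defrag(std::size_t n_holes, std::size_t size, float defrag_thold) {
    if (defrag_thold < 0.0f) {
        return false; // negative threshold: automatic defragmentation disabled
    }
    if (size == 0) {
        return false; // empty cache: nothing to defragment
    }
    const float fragmentation =
        static_cast<float>(n_holes) / static_cast<float>(size);
    return fragmentation > defrag_thold;
}
```

With a threshold of 0.1, as in the `-dt 0.1` example below, a 2048-cell cache containing 300 holes would trigger defragmentation under this sketch, while one containing 100 holes would not.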

Examples:

# parallel without defragmentation enabled
./parallel -m ./models/llama-7b-v2/ggml-model-f16.gguf -n 128 -ngl 99 -c 2048 -s 1 -np 8 -ns 128 -cb

# with defragmentation enabled (thold = 10%)
./parallel -m ./models/llama-7b-v2/ggml-model-f16.gguf -n 128 -ngl 99 -c 2048 -s 1 -np 8 -ns 128 -cb -dt 0.1

@ggerganov changed the title from "llama : fix defrag bugs + enable by default" to "llama : fix defrag bugs + add parameter" on Feb 26, 2024
@ggerganov marked this pull request as ready for review on February 26, 2024 16:56
@ggerganov merged commit 9d533a7 into master on Feb 27, 2024
27 of 29 checks passed
@ggerganov deleted the gg/defrag branch on February 27, 2024 12:35
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* llama : fix defrag bugs + enable by default

ggml-ci

* llama : add defrag_thold parameter

ggml-ci

* llama : cont

* llama : disable log message

ggml-ci

* llama : fix graph size check during defrag
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Successfully merging this pull request may close these issues.

llama : mitigate KV cache fragmentation