ggml-alloc : make gallocr prefer chunks that allow memory reuse #16788
Small improvement to graph allocation with multiple buffers/chunks:
When a tensor is allocated and no existing free block fits, the current implementation allocates additional memory in the first chunk whose max size can still accommodate the tensor. The last block of a chunk can contain both reusable memory (previously allocated and then freed) and memory that has not been allocated yet. This PR prioritizes chunks whose reusable memory already fits the tensor, which reduces the total allocation size.
See #16759 for an example.
Vulkan compute buffer size was measured with `llama-bench --model llama-2-7b.Q4_0.gguf --n-gpu-layers 19 --ubatch-size 512` at `--n-prompt` values of 12200, 12500, 13500, 14500, and 15500. I tested some other models and they show similar behavior around the 1024 MiB threshold.