
llama: fit ctx size for CPU only#21568

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:llama-fit-cpu-only
Apr 18, 2026
Conversation

@JohannesGaessler
Contributor

Alternative to #19711 (comment).

I think the correct way to reduce context size for CPU-only builds is to accumulate the host buffer types and to compare those vs. total system memory. This PR is currently only partially tested (and thus a draft) because I don't have a convenient combination of model and system memory sizes ready.

Requirements

@JohannesGaessler
Contributor Author

Fixes #19646 .

@taronaeo
Member

taronaeo commented Apr 8, 2026

I'll test it in a few days and get back once I have the results. Thanks for taking a look at it! :)

@taronaeo taronaeo left a comment
Member

Tested on 2 GB and 32 GB memory systems respectively; both work as intended.

@JohannesGaessler JohannesGaessler marked this pull request as ready for review April 13, 2026 14:33
@JohannesGaessler JohannesGaessler merged commit fd1c0ec into ggml-org:master Apr 18, 2026
49 of 51 checks passed
samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request Apr 19, 2026
mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
3 participants