sampling : remove sampling branching in output_reserve by danbev · Pull Request #18811 · ggml-org/llama.cpp

danbev · 2026-01-13T15:10:18Z

This commit updates output_reserve in llama-context.cpp to always allocate sampling buffers regardless of whether sampling is needed for the current batch.

The motivation for this is to avoid reallocations and branching based on the sampling requirements of the batch.

danbev · 2026-01-26T06:50:42Z

@ggerganov When you get a chance could you take a look at this and see if this is what you had in mind?

This reverts commit 4c6dcc4.

This commit always allocates backend buffers for all sequences if backend samplers have been configured in the context, regardless of whether the current batch contains any sequences that require backend sampling.
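To illustrate the idea, here is a minimal sketch of a reserve step whose decision depends only on context-level configuration and never on the contents of the current batch. Everything in it is an assumption made for illustration: `toy_context`, its fields, and this `output_reserve` signature are hypothetical stand-ins and are not taken from `llama-context.cpp`.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a context; NOT the real llama_context.
struct toy_context {
    std::size_t n_vocab = 0;
    std::set<int> backend_sampler_seqs;      // sequences with a backend sampler configured
    std::vector<float> sampling_probs;       // n_outputs * n_vocab floats
    std::vector<std::int32_t> sampled_ids;   // one token id per output
    std::size_t n_reallocs = 0;              // counts how often the buffers grew

    // Reserve sampling buffers for n_outputs outputs.
    // The only condition is whether any backend sampler is configured in the
    // context; the current batch is never inspected.
    void output_reserve(std::size_t n_outputs) {
        if (backend_sampler_seqs.empty()) {
            return; // no backend sampling configured at all
        }
        const std::size_t need = n_outputs * n_vocab;
        if (sampling_probs.size() < need) {
            sampling_probs.resize(need);
            sampled_ids.resize(n_outputs);
            ++n_reallocs;
        }
    }
};
```

In this sketch, repeated calls with the same or a smaller output count leave the buffers untouched, so a batch that happens to contain no backend-sampled sequences does not trigger a different allocation path.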

This commit adds a new function `needs_cpu_logits` to determine whether CPU logits are needed for sampling. This avoids unnecessary copying of logits when all sequences use backend sampling.
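A check of this kind could look roughly like the sketch below. It is only an assumption about the general shape of such a predicate: the function name `needs_cpu_logits_sketch`, its parameters, and the use of sequence-id sets are hypothetical and do not reflect the actual signature in the PR.

```cpp
#include <set>
#include <vector>

// Hypothetical sketch: returns true when at least one sequence that produces
// output has no backend sampler, meaning its logits would have to be made
// available on the CPU. If every output sequence is backend-sampled, the
// logits copy can be skipped.
bool needs_cpu_logits_sketch(const std::vector<int> & output_seq_ids,
                             const std::set<int>    & backend_sampled_seqs) {
    for (int seq_id : output_seq_ids) {
        if (backend_sampled_seqs.count(seq_id) == 0) {
            return true; // this sequence is sampled on the CPU
        }
    }
    return false; // all output sequences use backend sampling
}
```

A caller would consult the predicate once per batch and copy logits to host memory only when it returns true.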

ggerganov

Should be OK to merge after the CI passes

danbev requested a review from ggerganov as a code owner January 13, 2026 15:10

loci-dev mentioned this pull request Jan 13, 2026

UPSTREAM PR #18811: sampling : remove sampling branching in output_reserve auroralabs-loci/llama.cpp#907

Open

danbev marked this pull request as draft January 14, 2026 12:39

danbev force-pushed the sampling-output-reserve branch from 58b5299 to 9d9be8b Compare January 15, 2026 05:58

danbev marked this pull request as ready for review January 15, 2026 06:47

danbev mentioned this pull request Jan 15, 2026

llama : remove write/read of output ids/logits/embeddings #18862

Merged

loci-dev mentioned this pull request Jan 15, 2026

UPSTREAM PR #18862: sampling : add support for saving/loading backend sampling state auroralabs-loci/llama.cpp#933

Open

danbev force-pushed the sampling-output-reserve branch from 9d9be8b to 4c6dcc4 Compare January 20, 2026 11:44

ggerganov reviewed Jan 26, 2026

Comment thread src/llama-context.cpp Outdated

danbev added 3 commits January 26, 2026 10:08

Revert "sampling : remove sampling branching in output_reserve"

9fee947

sampling : remove sampling branching in output_reserve

6e61d40

add needs_cpu_logits function to avoid copying logits

fe91fe2

ggerganov reviewed Jan 27, 2026

Comment thread src/llama-context.cpp Outdated

ggerganov reviewed Jan 27, 2026

Comment thread src/llama-context.cpp Outdated

danbev added 2 commits January 27, 2026 11:50

rename needs_cpu_logits to needs_raw_logits

9553af0

remove batch parameter from output_reserve function

6c025b7

ggerganov approved these changes Jan 27, 2026

danbev merged commit eef375c into ggml-org:master Jan 28, 2026
77 of 78 checks passed

danbev deleted the sampling-output-reserve branch January 28, 2026 04:59

danbev merged 6 commits into ggml-org:master from danbev:sampling-output-reserve
