
sampling : remove sampling branching in output_reserve #18811

Merged: danbev merged 6 commits into ggml-org:master from danbev:sampling-output-reserve on Jan 28, 2026

@danbev (Member) commented Jan 13, 2026

This commit updates output_reserve in llama-context.cpp to always allocate sampling buffers regardless of whether sampling is needed for the current batch.

The motivation for this is to avoid reallocations and branching based on the sampling requirements of the batch.
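A minimal C++ sketch of the idea, assuming a simplified host-side model of the output buffers. The `output_buffers` struct, its fields, and this `output_reserve` signature are hypothetical illustrations, not the actual code in llama-context.cpp (which manages backend buffers). What it shows is that the reserve decision depends only on the number of outputs, never on whether the current batch samples:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a context's output buffers; llama.cpp itself
// uses backend buffers here, not std::vector.
struct output_buffers {
    std::vector<float>        logits;          // n_outputs * n_vocab
    std::vector<float>        sampled_probs;   // n_outputs * n_vocab (sampling data)
    std::vector<std::int32_t> sampled_tokens;  // one sampled token id per output
    std::size_t capacity   = 0;                // number of outputs currently reserved
    int         n_reallocs = 0;                // how many times storage was (re)allocated
};

// Reserve room for n_outputs outputs. The sampling buffers are always sized
// together with the logits, so there is no branch on whether this batch
// samples, and a sampling batch after a non-sampling one never reallocates.
// Assumes n_vocab is fixed for the lifetime of the context.
void output_reserve(output_buffers & buf, std::size_t n_outputs, std::size_t n_vocab) {
    if (n_outputs <= buf.capacity) {
        return; // already large enough: reuse the existing storage
    }
    buf.logits.assign(n_outputs * n_vocab, 0.0f);
    buf.sampled_probs.assign(n_outputs * n_vocab, 0.0f);
    buf.sampled_tokens.assign(n_outputs, -1);
    buf.capacity = n_outputs;
    ++buf.n_reallocs;
    assert(buf.sampled_tokens.size() == buf.capacity);
}
```

With this shape, only growth in the output count triggers a reallocation. The trade-off is memory: the sampling buffers are held even for batches that never use them.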

@danbev force-pushed the sampling-output-reserve branch from 9d9be8b to 4c6dcc4 on January 20, 2026 11:44

@danbev (Member, Author) commented Jan 26, 2026

@ggerganov When you get a chance, could you take a look at this and see if this is what you had in mind?

This commit always allocates backend buffers for all sequences if
backend samplers have been configured in the context, regardless of
whether the current batch has any sequences that require backend sampling.

This commit adds a new function `needs_cpu_logits` to determine whether
CPU logits are needed for sampling. This avoids unnecessary copying of
logits when all sequences use backend sampling.
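The two commits above can be sketched as follows, again as a hypothetical C++ model rather than llama.cpp's implementation. `sampling_config`, `seq_id`, `backend_sampling_slots` and the per-sequence map are assumptions; only the name `needs_cpu_logits` comes from the commit message, and its signature here is invented for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using seq_id = std::int32_t; // hypothetical sequence id type

// Hypothetical context-level state: which sequences have a backend sampler.
struct sampling_config {
    std::map<seq_id, bool> has_backend_sampler; // seq -> backend sampler configured?

    bool any_backend_sampler() const {
        for (const auto & entry : has_backend_sampler) {
            if (entry.second) {
                return true;
            }
        }
        return false;
    }
};

// True if at least one output in the batch belongs to a sequence that is
// sampled on the CPU (no backend sampler). Only then must the logits be
// copied back from the backend.
bool needs_cpu_logits(const sampling_config & cfg, const std::vector<seq_id> & output_seqs) {
    for (seq_id s : output_seqs) {
        auto it = cfg.has_backend_sampler.find(s);
        if (it == cfg.has_backend_sampler.end() || !it->second) {
            return true; // this output is sampled on the CPU
        }
    }
    return false; // every output is handled by a backend sampler
}

// Number of sequences to reserve backend sampling buffers for: all of them
// whenever any backend sampler is configured, independent of the batch.
std::size_t backend_sampling_slots(const sampling_config & cfg, std::size_t n_seq_max) {
    return cfg.any_backend_sampler() ? n_seq_max : 0;
}
```

Keying the buffer size on the context configuration instead of the batch keeps the allocation stable across batches, while the per-batch `needs_cpu_logits` check still lets a fully backend-sampled batch skip the logits copy.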

@ggerganov (Member) left a comment

Should be OK to merge after the CI passes.

@danbev merged commit eef375c into ggml-org:master on Jan 28, 2026
77 of 78 checks passed
@danbev deleted the sampling-output-reserve branch on January 28, 2026 04:59
shaofeiqi pushed a commit to qualcomm/llama.cpp that referenced this pull request Feb 6, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

2 participants