implement adaptive-p sampler by ddh0 · Pull Request #17927 · ggml-org/llama.cpp

ddh0 · 2025-12-11T04:23:13Z

This PR implements a new sampler called adaptive-p that selects tokens near a configurable target probability over time.

How it works

The adaptive-p sampler transforms the token probability distribution to favor tokens that fall near a user-configurable probability target. Internally, the sampler maintains an exponential moving average of the original probabilities of selected tokens. It uses this, along with the user's set target, to compute an adapted target at each sampling step, steering the running average toward the configured target over time. If recent selections have been higher-probability than target, the sampler compensates by temporarily favoring lower-probability tokens, and vice versa.

Parameters

This sampler exposes two parameters:

Parameter name	Description	CLI argument	Valid range	Default value	Notes
`target`	Select tokens near this probability	`--adaptive-target N`	0.0 - 1.0	-1.0	When set to a negative number, the adaptive probability transform is disabled, and instead it just samples normally. Note that since the default value is -1.0, the sampler is disabled by default. This is intentional.
`decay`	Decay value for exponential moving average - lower values are more reactive, higher values are more stable	`--adaptive-decay N`	0.0 - 0.99	0.90	Clamped to <=0.99 at init to avoid unbounded accumulation

In most cases, you can just play with --adaptive-target. The default decay of 0.9 (for a ~10 token history) works well. A good starting value is --adaptive-target 0.55. It is suggested to raise or lower the target in increments of 0.05 as needed.

Usage notes

adaptive-p selects a token ID rather than just mutating candidates, so it must be last in the sampler chain. It shares this behaviour with some existing samplers like mirostat, dist, and greedy (mirostat being the closest relative).

Only mild truncation before this sampler is recommended. We suggest applying min-p before adaptive-p as the only other active sampler in the chain (optionally with top-k as well).

Example usage:

./build/bin/llama-server -m ~/gguf/my-model.gguf --samplers "top-k;min-p;adaptive-p" --top-k 128 --min-p 0.05 --adaptive-target 0.55

Other notes

This sampler was previously called "power law" in earlier versions of this PR, named for the power law transform we were applying to logits. We are no longer applying the power law transform. We also experimented with gaussian, but ultimately settled on the current formula.

Acknowledgements

@MrJackSpade - original "power law" sampler idea and implementation as can be seen here - they continued to help guide me as I worked through the implementation of this sampler. They deserve an enormous! amount of credit for the work they put in on this.
@Geechan for advice on sampler parameters and collaboration during development
@AesSedai for testing, collaboration, and hosting, as well as coming up with the final name "adaptive-p"

ddh0 · 2025-12-12T02:55:30Z

Nevermind, sorry, I think we want to do a little more testing. I'm going to mark this as draft again temporarily.

pnb

This looks very interesting! I wish the original compared to XTC, since the goals seem highly similar.

As an aside, I am curious if there is some way to make it work without selecting a token (i.e., only steps 1-3). I see why token selection is necessary, given the need to save the original probability to the history for the adaptive adjustment part. But, for example, maybe it would suffice instead to save the original probability of the highest-probability token after transforming, regardless of which one is eventually selected by a downstream sampler.

pnb · 2025-12-12T18:40:14Z

src/llama-sampling.cpp

+
+    // fixed power law transform parameters (from original implementation)
+    const float distribution_width = 0.2f;
+    const float peak_logit_value   = 3.0f;


Should these parameters be configurable like in the original implementation? There is probably a tradeoff with feature creep, having too many options for users to control, but some of these seem potentially important (especially distribution_width). Also, I noticed peak_logit_value is outside the range suggested in the original implementation; is that intentional?

Myself and the original author are discussing the parameters over the next few days, I agree that the current implementation is probably not ideal, which is why I marked it back as draft.

I will post a comment in the main thread with an update once we've got it more figured out. Thank you!

ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

Beinsezii · 2026-01-12T23:51:10Z

thing I worry is someone else will pick a different magic number than 1e9 in the future and then adaptive-p breaks again. that's why I think adaptive-p should calculate its own zero point against -inf or -1e9 or whatever instead of using a second softmax

ddh0 · 2026-01-12T23:58:04Z

thing I worry is someone else will pick a different magic number than 1e9 in the future and then adaptive-p breaks again. that's why I think adaptive-p should calculate its own zero point against -inf or -1e9 or whatever.

Yes, I'm not sure. -1e9f is fine for now as far as I'm aware but I'll wait for @ggerganov to give his input on this. Maybe we can have a file-level constant for this threshold.

ggerganov · 2026-01-13T08:53:49Z

Haven't tested this, but can we avoid the 1e9 constant and do something like this instead:

diff --git a/src/llama-sampling.cpp b/src/llama-sampling.cpp
index 11f0394c4..19c5c419e 100644
--- a/src/llama-sampling.cpp
+++ b/src/llama-sampling.cpp
@@ -1495,6 +1495,10 @@ static void llama_sampler_top_p_backend_apply(
     struct ggml_tensor * mask = ggml_step(ctx, cdf_scaled);
     ggml_set_name(mask, "top_p_mask");
 
+    // mask_inf = (mask - 1) * -INFINITY
+    struct ggml_tensor * mask_inf = ggml_scale(ctx, ggml_scale_bias(ctx, mask, 1.0f, -1.0f), -INFINITY);
+    ggml_set_name(mask, "top_p_mask_inf");
+
     // Taking the sum of the mask gives us the sum of elements after the threshold
     // we are interested in.
     struct ggml_tensor * idxf = ggml_sum(ctx, mask);
@@ -1517,8 +1521,9 @@ static void llama_sampler_top_p_backend_apply(
     // top_p_bias = (mask * 1e9f) - 1e9f.
     // So entries in the mask that we want to discard will become -1e9f, and
     // others will be 0 (meaning that will not effect the logits).
-    const float large_val = 1e9f;
-    struct ggml_tensor * top_p_bias = ggml_scale_bias(ctx, mask, large_val, -large_val);
+    //const float large_val = 1e9f;
+    //struct ggml_tensor * top_p_bias = ggml_scale_bias(ctx, mask, large_val, -large_val);
+    struct ggml_tensor * top_p_bias = ggml_add(ctx, mask, mask_inf);
     ggml_set_name(top_p_bias, "top_p_bias");
 
     data->logits = ggml_add(ctx, sorted_logits, top_p_bias);

Would this help?

ddh0 · 2026-01-13T17:19:33Z

Would this help?

Thank you! That code specifically didn't work but it was enough to point me in the right direction, and adaptive-p now plays nicely with both CPU and backend sampling, without any more magic numbers. PTAL.

ggerganov · 2026-01-14T06:33:27Z

Nice, this looks like a better solution. Reviewing

ggerganov

After addressing the minor comments we can merge

ggerganov · 2026-01-14T06:40:10Z

src/llama-sampling.cpp

+    auto * ctx = (llama_sampler_adaptive_p *) smpl->ctx;
+
+    if (ctx->target < 0.0f) {
+        // at negative target values, adaptive-p is no-op
+        // we simply sample from the existing distribution
+        llama_sampler_softmax_impl(cur_p, false);
+        cur_p->selected = llama_sample_dist(cur_p, ctx->rng);
+        return;
+    }
+
+    // softmax and store the original probabilities
+    llama_sampler_softmax_impl(cur_p, false);
+    ctx->original_probs.resize(cur_p->size);


The llama_sampler_softmax_impl(cur_p, false); call can be deduplicated here - call once before the if

ggerganov · 2026-01-14T06:42:24Z

src/llama-sampling.cpp

+    float    decay,
+    uint32_t seed
+) {
+    auto seed_cur = get_rng_seed(seed);


This seed_cur logic breaks when the seed is LLAMA_DEFAULT_SEED. In that case we generate a random seed based on the time. But with this implementation, if you clone the sampler, it will inherit the "seed_cur" of the source sampler, instead of generating a new seed.

See how the dist sampler maintains a seed_cur in its context and replicate the logic here.

ddh0 · 2026-01-14T12:25:32Z

Latest commit addresses both review comments. I added a seed_cur member to the struct and clone the seed member now like dist does.

z80maniac · 2026-01-14T16:37:42Z

I think this sampler needs to be described in the server's README. Otherwise the users will have no idea how to use it or that it even exists in the first place.

ddh0 · 2026-01-14T16:49:54Z

Hmm, is there not a README that explains all the samplers somewhere? Since the samplers are not just used for llama-server but also llama-cli and exposed in the shared library.

z80maniac · 2026-01-14T17:10:29Z

It's more about describing the server API for that sampler. For example, that the users need to use adaptive_target and adaptive_decay JSON params in the /completion endpoint and that the values must be of a certain type (float) and within a certain range. And also, what the default values are. All other samplers are described somewhat like this in the server's README.

The CLI README also describes all the CLI arguments separately.

I have noticed that more expanded descriptions of the samplers can be found at tools/completion/README.md. Maybe the full descriptions of this sampler can be written in there.

ddh0 · 2026-01-14T17:25:18Z

I gotcha. Latest commit adds docs to server, CLI, and completion.

ddh0 · 2026-01-14T17:31:51Z

Should be ready to go @ggerganov 🙏 thanks for the review

ggerganov · 2026-01-14T18:28:51Z

src/llama-sampling.cpp

+
+static void llama_sampler_adaptive_p_reset(struct llama_sampler * smpl) {
+    auto * ctx = (llama_sampler_adaptive_p *) smpl->ctx;
+    // ctx->target and ctx->decay never change after init, so it's safe to keep them as is.
+    // original_probs is completely overwritten on every call to _apply.
+    // so we only need to reset the EMA state and pending token.
+    ctx->weighted_sum      = ctx->target / (1.0f - ctx->decay);
+    ctx->total_weight      = 1.0f / (1.0f - ctx->decay);
+    ctx->pending_token_id  = LLAMA_TOKEN_NULL;
+    ctx->pending_token_idx = -1;
+}
+


Should apply the same logic for the RNG seeds here as in the dist reset:

llama.cpp/src/llama-sampling.cpp

Lines 1111 to 1117 in 516a4ca

static void llama_sampler_dist_reset(struct llama_sampler * smpl) {

auto * ctx = (llama_sampler_dist *) smpl->ctx;

ctx->seed_cur = get_rng_seed(ctx->seed);

ctx->rng.seed(ctx->seed_cur);

}

Oops. I addressed this in 40fd48f just now

common/sampling.cpp

ddh0 · 2026-01-15T16:57:49Z

Please excuse my ping @ggerganov - is there anything else needed on my end to push this over the finish line?

* initial commit for branch * simplify constants * add params to `struct common_params_sampling`, add reference to PR * explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]` * add args, rename `queue_size` -> `window_size` * improved comments * minor * remove old unused code from algorithm * minor * add power law case to `common_sampler_init`, add sampler name mappings * clarify behaviour when `window_size = 0` * add missing enums * remove `target_range` param, make `target == 1` no-op, cleanup code * oops, straggler * add missing parameters in `server-task.cpp` * copy from author ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069 * remove old debug log, style nit * fix compiler warning, add commented-out logging per token * re-write + change parameters + simplify * oops forgot args.cpp * fix leftover `window_size` * add missing values to `common_params_sampling::print()` * with logging * does this fix it? * no, but does this? * update default decay * optimize * fix bad merge my git skills are lacking * silence `missing initializer for member` * update default decay to 0.9 * fix logging * format (double) * add power law to the new `samplers` vector * log sampler init values * improve logging messages in llama_sampler_power_law * remove extraneous logging * simplify target computation last commit with debug logging! * remove debug logging, explicitly clamp params at init * add `use_power_law` flag + logic, minor cleanup * update `power-law` -> `adaptive-p` * fix cold start EMA - `ctx->weighted_sum` is now initialized and reset to `target / (1.0f - clamped_decay)` - `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f - clamped_decay)` this fixes a "cold start" problem with the moving average * update `SHARPNESS` constant to `10.0f` * minor style fixes no functional changes * minor style fixes cont. * update `llama_sampler_adaptive_p_i` for backend sampling (ref: ggml-org#17004) * separate into `apply` + `accept` functions * `pending_token_idx`: switch from `llama_token` to `int32` functionally identical (`llama.h` has `typedef int32_t llama_token;`), but its more correct now * don't transform logits <= -1e9f * fix masking in backend top-p, min-p * address review comments * typo in comments `RND` -> `RNG` * add docs * add recommended values in completion docs * address PR feedback * remove trailing whitespace (for CI `editorconfig`) * add to adaptive-p to `common_sampler_types_from_chars`

ddh0 added 2 commits December 10, 2025 22:13

initial commit for branch

774cf23

simplify constants

5ab4ff7

loci-dev mentioned this pull request Dec 11, 2025

UPSTREAM PR #17927: implement Power Law sampling auroralabs-loci/llama.cpp#522

Open

ddh0 and others added 11 commits December 11, 2025 12:52

Merge branch 'ggml-org:master' into power-law-sampler

66e2d17

add params to struct common_params_sampling, add reference to PR

88fb0f3

explicitly clamp min_target and max_target to [0.0, 1.0]

374bfd4

add args, rename queue_size -> window_size

ffe1639

improved comments

4959878

minor

f3457a8

remove old unused code from algorithm

9316959

minor

b3aea57

add power law case to common_sampler_init, add sampler name mappings

cd7de7c

clarify behaviour when window_size = 0

534cb4f

add missing enums

dcada03

This comment was marked as outdated.

Sign in to view

ddh0 marked this pull request as ready for review December 11, 2025 23:59

ddh0 requested a review from ggerganov as a code owner December 11, 2025 23:59

ddh0 marked this pull request as draft December 12, 2025 02:55

ddh0 added 2 commits December 11, 2025 22:43

remove target_range param, make target == 1 no-op, cleanup code

2d62bbe

oops, straggler

5c78b79

pnb reviewed Dec 12, 2025

View reviewed changes

add missing parameters in server-task.cpp

53380c1

github-actions bot added examples server labels Dec 13, 2025

ddh0 and others added 5 commits December 12, 2025 23:19

copy from author

94cb883

ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

remove old debug log, style nit

0a19a3f

fix compiler warning, add commented-out logging per token

824bb3a

Merge branch 'ggml-org:master' into power-law-sampler

1879fc6

Merge branch 'ggml-org:master' into power-law-sampler

67a7336

ddh0 and others added 2 commits January 13, 2026 09:52

Merge branch 'ggml-org:master' into power-law-sampler

1eff502

fix masking in backend top-p, min-p

d21c87e

Merge branch 'ggml-org:master' into power-law-sampler

4b92e3a

ggerganov approved these changes Jan 14, 2026

View reviewed changes

address review comments

33c635e

typo in comments RND -> RNG

4b06e08

add docs

42af39d

add recommended values in completion docs

81af54c

ggerganov reviewed Jan 14, 2026

View reviewed changes

ddh0 added 2 commits January 14, 2026 12:30

address PR feedback

40fd48f

remove trailing whitespace (for CI editorconfig)

b6041b1

timwu reviewed Jan 15, 2026

View reviewed changes

common/sampling.cpp Show resolved Hide resolved

ddh0 and others added 2 commits January 15, 2026 10:52

Merge branch 'ggml-org:master' into power-law-sampler

f222e17

add to adaptive-p to common_sampler_types_from_chars

d7e3b86

ggerganov merged commit 13f1e4a into ggml-org:master Jan 15, 2026
69 of 73 checks passed

ddh0 deleted the power-law-sampler branch February 20, 2026 18:17

wallentri88 mentioned this pull request Feb 24, 2026

Eval bug: qwen35 and qwen35moe graph split issues (Severe PP impact, crashes) #19864

Closed


	static void llama_sampler_dist_reset(struct llama_sampler * smpl) {
	auto * ctx = (llama_sampler_dist *) smpl->ctx;
	ctx->seed_cur = get_rng_seed(ctx->seed);
	ctx->rng.seed(ctx->seed_cur);
	}

Conversation

ddh0 commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How it works

Parameters

Usage notes

Example usage:

Other notes

Acknowledgements

Uh oh!

This comment was marked as outdated.

ddh0 commented Dec 12, 2025

Uh oh!

pnb left a comment

Choose a reason for hiding this comment

Uh oh!

pnb Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

ddh0 Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Beinsezii commented Jan 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ddh0 commented Jan 12, 2026

Uh oh!

ggerganov commented Jan 13, 2026

Uh oh!

ddh0 commented Jan 13, 2026

Uh oh!

ggerganov commented Jan 14, 2026

Uh oh!

ggerganov left a comment

Choose a reason for hiding this comment

Uh oh!

ggerganov Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

ggerganov Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

ddh0 commented Jan 14, 2026

Uh oh!

z80maniac commented Jan 14, 2026

Uh oh!

ddh0 commented Jan 14, 2026

Uh oh!

z80maniac commented Jan 14, 2026

Uh oh!

ddh0 commented Jan 14, 2026

Uh oh!

ddh0 commented Jan 14, 2026

Uh oh!

ggerganov Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

ddh0 Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ddh0 commented Jan 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

ddh0 commented Dec 11, 2025 •

edited

Loading

Beinsezii commented Jan 12, 2026 •

edited

Loading