
spec : discard last drafted token with low prob #22506

Merged

ggerganov merged 1 commit into master from gg/spec-draft-discard-low-prob-token on Apr 29, 2026

Conversation

Member

@ggerganov ggerganov commented Apr 29, 2026

Overview

In the majority of drafts, the last token has low probability. For non-recurrent models, keeping that token is not a big issue. But for recurrent models, it frequently forces the logic to restore checkpoints and re-evaluate the same draft (minus the last token).

This PR simply discards the low-prob token from the draft. This should significantly improve draft-based speculative decoding for recurrent models, since the speculative checkpoint has to be restored less often.

Also fix the stats for the number of drafted/accepted tokens - on master, we always incorrectly report a 100% acceptance rate.


@ggerganov ggerganov requested review from a team as code owners April 29, 2026 08:50
@ggerganov ggerganov merged commit 683c5ac into master Apr 29, 2026
46 checks passed
@ggerganov ggerganov deleted the gg/spec-draft-discard-low-prob-token branch April 29, 2026 14:00
tekintian added a commit to tekintian/llama.cpp that referenced this pull request May 1, 2026
* 'master' of github.com:tekintian/llama.cpp: (659 commits)
  ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (ggml-org#22464)
  Update llama-mmap to use ftello/fseeko (ggml-org#22497)
  common : check for null getpwuid in hf-cache (ggml-org#22550)
  vulkan: add get/set tensor 2d functions (ggml-org#22514)
  spec: fix argument typo (ggml-org#22552)
  ci : bump ty to 0.0.33 (ggml-org#22535)
  vendor : update cpp-httplib to 0.43.2 (ggml-org#22548)
  CUDA: fix tile FA kernel on Pascal (ggml-org#22541)
  scripts : add wc2wt.sh - create worktree from current HEAD (ggml-org#22513)
  add fast matmul iquants (ggml-org#22504)
  spec : fix draft model checkpoints (ggml-org#22521)
  spec : fix vocab compat checks in spec example (ggml-org#22426)
  common : do not pass prompt tokens to reasoning budget sampler (ggml-org#22488)
  hexagon: make vmem and buffer-size configurable (ggml-org#22487)
  CUDA: fuse SSM_CONV + ADD(bias) + SILU (ggml-org#22478)
  spec : disacard last drafted token with low prob (ggml-org#22506)
  sync : ggml
  ggml : bump version to 0.10.1 (ggml/1469)
  webui: fix slow mic stop and WAV encode (ggml-org#22480)
  ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (ggml-org#22293)
  ...

# Conflicts:
#	.gitignore
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request May 6, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026


3 participants