spec - default MTP draft backend_sampling to off (#23903) by ssam18 · Pull Request #23921 · ggml-org/llama.cpp

ssam18 · 2026-05-30T22:07:30Z

PR #23287 turned on backend sampling by default for the MTP draft path, but the per-sequence compute-buffer overhead is large enough that configs which ran cleanly on b9246 now OOM at --parallel 2 on b9426 and later. Flipping the default back to off restores the working baseline, and anyone who wants backend sampling can still opt in with --spec-draft-backend-sampling. I tested locally with a CPU build and a small non-MTP model, since the reporter's exact setup needs a Blackwell card; the MTP path itself still needs someone with that config to confirm the OOM is gone.

PR ggml-org#23287 enabled backend draft sampling by default for the MTP path, attaching a per-seq_id sampler chain (top_k=10) to the draft context. This adds compute-buffer footprint that scales with n_seq, so configs that fit comfortably in VRAM at --parallel N>1 on b9246 now OOM during the first decode on b9410+. See ggml-org#23903 for the bisect: b9246 fit two slots in 15.6 GB, while b9426 needs essentially the full 16 GB for one slot under the same model and flags. Default the new behavior to off so the regression does not fire on configs that worked before. Users who want backend sampling can opt back in with --spec-draft-backend-sampling (already wired by PR ggml-org#23287). The help text automatically reflects the new default via string_format("default: %s", ... ? "enabled" : "disabled").
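The default flip and the self-updating help text described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the actual llama.cpp code: the `draft_params` struct, `help_default`, and `apply_flag` are hypothetical names, and `std::snprintf` stands in for the project's `string_format` helper. Only the two flag spellings and the "default: %s" pattern come from this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the draft-side speculative-decoding parameters.
// The real struct and field name in llama.cpp may differ.
struct draft_params {
    bool backend_sampling = false; // the default this PR proposes: off
};

// Builds the "default: ..." fragment of the help text from the current default,
// so the help output follows the struct default without a separate edit.
inline std::string help_default(const draft_params & p) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "default: %s",
                  p.backend_sampling ? "enabled" : "disabled");
    return std::string(buf);
}

// Applies the opt-in / opt-out flag named in this thread.
// Returns false when the argument is not one of the two flags.
inline bool apply_flag(draft_params & p, const std::string & arg) {
    if (arg == "--spec-draft-backend-sampling") {
        p.backend_sampling = true;
        return true;
    }
    if (arg == "--no-spec-draft-backend-sampling") {
        p.backend_sampling = false;
        return true;
    }
    return false;
}

// Usage sketch: starts off, the opt-in flag enables it, the opt-out flag disables it.
inline void self_check() {
    draft_params p;
    assert(help_default(p) == "default: disabled");
    apply_flag(p, "--spec-draft-backend-sampling");
    assert(help_default(p) == "default: enabled");
    apply_flag(p, "--no-spec-draft-backend-sampling");
    assert(help_default(p) == "default: disabled");
}
```

Because the help fragment is derived from the same field that holds the default, changing the initializer is the only edit needed to change both the behavior and the documented default.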

am17an · 2026-05-31T04:21:24Z

The poster should use --no-spec-draft-backend-sampling

am17an closed this May 31, 2026

ssam18 wants to merge 1 commit into ggml-org:master from ssam18:fix/issue-23903-mtp-backend-sampling-default
