spec : update CLI arguments for better consistency by ggerganov · Pull Request #22964 · ggml-org/llama.cpp

ggerganov · 2026-05-12T08:33:36Z

Overview

All speculative types that involve drafting from some sort of model are now named draft-[type], for example draft-simple, draft-mtp, draft-eagle3, etc.
They share the same common_params_speculative_draft params.
The --spec-draft-* CLI arguments apply to any of the draft-* types.
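
As a rough illustration of the naming convention above, a caller could tell from the name alone whether a speculative type drafts from a model. This is a minimal sketch: `is_draft_type_name` is a hypothetical helper and not part of llama.cpp, and it assumes only the `draft-` prefix rule stated in the overview.

```cpp
#include <string>

// Hypothetical helper, not part of llama.cpp: under the naming convention
// described above, a type name of the form "draft-<something>" denotes a
// method that drafts from some sort of model.
static bool is_draft_type_name(const std::string & name) {
    const std::string prefix = "draft-";
    // Require at least one character after the prefix.
    return name.size() > prefix.size() &&
           name.compare(0, prefix.size(), prefix) == 0;
}
```

Under this assumption, names such as draft-simple, draft-mtp and draft-eagle3 match, while a non-draft name such as ngram-simple does not.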

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

am17an · 2026-05-12T08:36:39Z

    std::vector<enum common_speculative_type> types = { COMMON_SPECULATIVE_TYPE_NONE };

+    // used by Simple, MTP, Eagle3, etc. - all methods that require some kind of draft model
    common_params_speculative_draft draft;


I'm wondering if this should be a more flexible type; I guess it's made for the "simple draft" case.

ggerganov · 2026-05-12

It's not a problem to extend it as needed. Anything specific in mind?

At the moment, all of the parameters seem applicable to MTP, Eagle, etc.

am17an · 2026-05-12

You're right. I was thinking the cache type wouldn't be applicable to MTP or Eagle, but it is required.

candrews · 2026-05-12T13:42:15Z

Could the CLI arguments for llama-bench also be harmonized in the same way to address #22947?

am17an · 2026-05-12T14:26:21Z

> Could the CLI arguments for llama-bench also be harmonized in the same way to address #22947?

That unfortunately is a larger refactor; I plan to do it soon.

* spec : update CLI arguments for better consistency
* cont : fix CLI arg message

ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better consistency") renamed the speculative type enum values:

    COMMON_SPECULATIVE_TYPE_DRAFT  -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
    COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3

and the registered name strings flipped from underscore- to dash-separated form (e.g. ngram_simple -> ngram-simple), with the bare draft/eagle3 aliases replaced by draft-simple/draft-eagle3.

This broke the build with the new LLAMA_VERSION on every variant (vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.

Update the upstream branch of the speculative-type fallback to use the new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the old name), and normalize spec_type option tokens before passing them to common_speculative_types_from_names so existing model configs that say spec_type:draft / spec_type:ngram_simple keep working.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
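
The token normalization described in this commit message could look roughly like the following. This is a sketch under stated assumptions, not the actual grpc-server.cpp code: `normalize_spec_type` is a hypothetical name, and the only mappings it applies are the ones the message states (underscores become dashes, and the bare draft / eagle3 aliases become draft-simple / draft-eagle3).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (the name and shape are assumptions, not LocalAI's code):
// maps a legacy spec_type token to the dash-separated form that the commit
// message says is registered after ggml-org/llama.cpp#22964.
static std::string normalize_spec_type(std::string name) {
    // Underscore-separated names became dash-separated,
    // e.g. ngram_simple -> ngram-simple.
    std::replace(name.begin(), name.end(), '_', '-');

    // The bare aliases were replaced by their draft-* forms.
    if (name == "draft") {
        return "draft-simple";
    }
    if (name == "eagle3") {
        return "draft-eagle3";
    }

    // Names already in the new form pass through unchanged.
    return name;
}
```

With a helper along these lines, an existing config that says spec_type:draft or spec_type:ngram_simple would yield draft-simple or ngram-simple before the names are handed to common_speculative_types_from_names.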

…dfee071c332` (#9809)

* ⬆️ Update ggml-org/llama.cpp

  Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): adapt to upstream COMMON_SPECULATIVE_TYPE_DRAFT rename

  ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better consistency") renamed the speculative type enum values:

      COMMON_SPECULATIVE_TYPE_DRAFT  -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
      COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3

  and the registered name strings flipped from underscore- to dash-separated form (e.g. ngram_simple -> ngram-simple), with the bare draft/eagle3 aliases replaced by draft-simple/draft-eagle3.

  This broke the build with the new LLAMA_VERSION on every variant (vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.

  Update the upstream branch of the speculative-type fallback to use the new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the old name), and normalize spec_type option tokens before passing them to common_speculative_types_from_names so existing model configs that say spec_type:draft / spec_type:ngram_simple keep working.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Assisted-by: claude-code:claude-opus-4-7

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Add two pure-additive C API accessors so language bindings (Go / Rust / Python) can detect whether a loaded model has Multi-Token Prediction heads without spelunking through internal hparams:

    LLAMA_API bool     llama_model_has_mtp     (const struct llama_model *);
    LLAMA_API uint32_t llama_model_n_mtp_layers(const struct llama_model *);

Both read the existing hparams.nextn_predict_layers field that the qwen35moe loader (and other MTP-aware model loaders) populate from the GGUF metadata. They return 0 / false on non-MTP models.

No behavior change for existing callers: these are accessors, not mutators. The MTP activation mechanism (setting llama_context_params.ctx_type = LLAMA_CONTEXT_TYPE_MTP and wiring up the common_speculative_* chain) remains opt-in exactly as merged in PR ggml-org#22964.

Motivation: high-level bindings around llama.cpp construct contexts with a default ctx_type and have no easy way to decide "should I switch to MTP mode for this model?", because the nextn_predict_layers field isn't part of the public C API surface today. Exposing these helpers lets bindings add an opt-in WithAutoMTP option without forking the model struct.
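
To make the intended use of these accessors concrete, here is a self-contained sketch built on stand-in types. Everything prefixed with `mock_` or `MOCK_`, and the `pick_context_type` helper, are hypothetical and exist only for illustration; the real llama_model is opaque to callers, and this sketch assumes that "has MTP" simply means nextn_predict_layers is non-zero, which the commit message implies but does not spell out.

```cpp
#include <cstdint>

// Stand-in types for illustration only; the real llama_model and its hparams
// are internal to llama.cpp and considerably larger.
struct mock_hparams {
    uint32_t nextn_predict_layers = 0; // filled by the loader from GGUF metadata
};

struct mock_model {
    mock_hparams hparams;
};

// Mirrors the described llama_model_n_mtp_layers: a read-only accessor.
static uint32_t mock_model_n_mtp_layers(const mock_model * model) {
    return model->hparams.nextn_predict_layers;
}

// Mirrors the described llama_model_has_mtp.
// Assumption: a model "has MTP" when the layer count is non-zero.
static bool mock_model_has_mtp(const mock_model * model) {
    return mock_model_n_mtp_layers(model) > 0;
}

// Hypothetical context kinds standing in for the ctx_type values.
enum mock_context_type {
    MOCK_CONTEXT_TYPE_DEFAULT,
    MOCK_CONTEXT_TYPE_MTP,
};

// Sketch of an opt-in "auto MTP" decision a binding might make:
// switch to MTP mode only when the caller asked for it and the model supports it.
static mock_context_type pick_context_type(const mock_model * model, bool auto_mtp) {
    if (auto_mtp && mock_model_has_mtp(model)) {
        return MOCK_CONTEXT_TYPE_MTP;
    }
    return MOCK_CONTEXT_TYPE_DEFAULT;
}
```

Keeping the decision in the binding rather than in the accessors matches the message's point that activation stays opt-in: the accessors only report what the loader already recorded.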

spec : update CLI arguments for better consistency

a0bf006

ggerganov requested review from a team and ngxson as code owners May 12, 2026 08:33

ggerganov requested a review from am17an May 12, 2026 08:34

am17an reviewed May 12, 2026

github-actions Bot added examples server labels May 12, 2026

cont : fix CLI arg message

a5a34d3

ServeurpersoCom approved these changes May 12, 2026

pwilkin approved these changes May 12, 2026

ggerganov merged commit 634275f into master May 13, 2026
46 checks passed

ggerganov deleted the gg/spec-cli-args branch May 13, 2026 06:15

xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 13, 2026

spec : update CLI arguments for better consistency (ggml-org#22964)

0eb6c57

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026

spec : update CLI arguments for better consistency (ggml-org#22964)

32b6fc5

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request May 19, 2026

spec : update CLI arguments for better consistency (ggml-org#22964)

a1ed2d2

baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026

spec : update CLI arguments for better consistency (ggml-org#22964)

51f54b9

winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026

spec : update CLI arguments for better consistency (ggml-org#22964)

eabc95d

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

spec : update CLI arguments for better consistency (ggml-org#22964)

5a2cd91
