spec : update CLI arguments for better consistency #22964

Merged
ggerganov merged 2 commits into master from gg/spec-cli-args on May 13, 2026

@ggerganov
Member

Overview

  • All speculative types that involve drafting from some sort of model are now named draft-[type]. For example: draft-simple, draft-mtp, draft-eagle3, etc.
  • They share the same common_params_speculative_draft params.
  • The --spec-draft-* CLI arguments apply to any of the draft-* types
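
To illustrate the naming convention above, here is a minimal C++ sketch of how a caller might decide whether the shared draft parameters apply to a set of selected type names. The helpers `is_draft_type_name` and `uses_draft_params` are hypothetical names introduced for this example and are not part of llama.cpp; the only thing taken from the PR description is the `draft-` prefix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns true when a speculative
// type name follows the draft-[type] convention, e.g. "draft-mtp".
inline bool is_draft_type_name(const std::string & name) {
    static const std::string prefix = "draft-";
    // A bare "draft-" with nothing after the prefix is not treated as a type name.
    return name.size() > prefix.size() && name.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: returns true if any selected type name is a draft-* type,
// i.e. the shared draft parameters (and the --spec-draft-* arguments) would apply.
inline bool uses_draft_params(const std::vector<std::string> & names) {
    for (const auto & name : names) {
        if (is_draft_type_name(name)) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, `uses_draft_params({"ngram-simple", "draft-eagle3"})` is true because one of the names carries the `draft-` prefix, while a selection containing only `ngram-simple` is false.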

Requirements

@ggerganov ggerganov requested review from a team and ngxson as code owners May 12, 2026 08:33
@ggerganov ggerganov requested a review from am17an May 12, 2026 08:34
Comment thread common/common.h
std::vector<enum common_speculative_type> types = { COMMON_SPECULATIVE_TYPE_NONE };

// used by Simple, MTP, Eagle3, etc. - all methods that require some kind of draft model
common_params_speculative_draft draft;
Contributor

I'm wondering if this should be a more flexible type; I guess it's made for the "simple draft" case.

Member Author

It's not a problem to extend it as needed. Anything specific in mind?

At the moment, all of the parameters seem applicable to MTP, Eagle, etc.

Contributor

@am17an am17an May 12, 2026

You're right. I was thinking the cache type wouldn't be applicable to MTP or Eagle, but it is required.

@candrews

Could the CLI arguments for llama-bench also be harmonized in the same way to address #22947 ?

@am17an
Contributor

am17an commented May 12, 2026

> Could the CLI arguments for llama-bench also be harmonized in the same way to address #22947 ?

Unfortunately, that is a larger refactor. I plan to do it soon.

@ggerganov ggerganov merged commit 634275f into master May 13, 2026
46 checks passed
@ggerganov ggerganov deleted the gg/spec-cli-args branch May 13, 2026 06:15
xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 13, 2026
* spec : update CLI arguments for better consistency

* cont : fix CLI arg message
mudler added a commit to ci-forks/LocalAI that referenced this pull request May 13, 2026
ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better
consistency") renamed the speculative type enum values:
  COMMON_SPECULATIVE_TYPE_DRAFT  -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
  COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3
and the registered name strings flipped from underscore- to dash-
separated form (e.g. ngram_simple -> ngram-simple), with the bare
draft/eagle3 aliases replaced by draft-simple/draft-eagle3.

This broke the build with the new LLAMA_VERSION on every variant
(vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.

Update the upstream branch of the speculative-type fallback to use the
new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the
old name), and normalize spec_type option tokens before passing them to
common_speculative_types_from_names so existing model configs that say
spec_type:draft / spec_type:ngram_simple keep working.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
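
The token normalization described in the commit message above could look roughly like the following C++ sketch. It is not LocalAI's actual code: `normalize_spec_type` is a hypothetical name, and the mapping covers only the renames the message itself lists (underscores to dashes, and the bare `draft` / `eagle3` aliases to `draft-simple` / `draft-eagle3`).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper, not LocalAI's actual implementation: maps a legacy
// spec_type token to the name form registered after ggml-org/llama.cpp#22964.
inline std::string normalize_spec_type(std::string token) {
    // Registered names switched from underscore- to dash-separated form,
    // e.g. ngram_simple -> ngram-simple.
    std::replace(token.begin(), token.end(), '_', '-');

    // The bare aliases were replaced by their draft-* equivalents.
    if (token == "draft") {
        return "draft-simple";
    }
    if (token == "eagle3") {
        return "draft-eagle3";
    }

    // Anything else, including names already in the new form, passes through.
    return token;
}
```

With a helper along these lines, an existing config entry such as `spec_type:draft` would be rewritten to `draft-simple` before the names reach the upstream parser, and tokens already in the new form are left unchanged.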
mudler added a commit to mudler/LocalAI that referenced this pull request May 14, 2026
…dfee071c332` (#9809)

* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): adapt to upstream COMMON_SPECULATIVE_TYPE_DRAFT rename

ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better
consistency") renamed the speculative type enum values:
  COMMON_SPECULATIVE_TYPE_DRAFT  -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
  COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3
and the registered name strings flipped from underscore- to dash-
separated form (e.g. ngram_simple -> ngram-simple), with the bare
draft/eagle3 aliases replaced by draft-simple/draft-eagle3.

This broke the build with the new LLAMA_VERSION on every variant
(vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.

Update the upstream branch of the speculative-type fallback to use the
new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the
old name), and normalize spec_type option tokens before passing them to
common_speculative_types_from_names so existing model configs that say
spec_type:draft / spec_type:ngram_simple keep working.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
* spec : update CLI arguments for better consistency

* cont : fix CLI arg message
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request May 19, 2026
* spec : update CLI arguments for better consistency

* cont : fix CLI arg message
hsinhoyeh added a commit to hsinhoyeh/llama.cpp that referenced this pull request May 20, 2026
Add two pure-additive C API accessors so language bindings (Go / Rust
/ Python) can detect whether a loaded model has Multi-Token Prediction
heads without spelunking through internal hparams:

  LLAMA_API bool     llama_model_has_mtp     (const struct llama_model *);
  LLAMA_API uint32_t llama_model_n_mtp_layers(const struct llama_model *);

Both read the existing hparams.nextn_predict_layers field that the
qwen35moe loader (and other MTP-aware model loaders) populate from
the GGUF metadata. Returns 0 / false on non-MTP models.

No behavior change for existing callers — these are accessors, not
mutators. The MTP activation mechanism (setting
llama_context_params.ctx_type = LLAMA_CONTEXT_TYPE_MTP and wiring up
the common_speculative_* chain) remains opt-in exactly as merged in
PR ggml-org#22964.

Motivation: high-level bindings around llama.cpp construct contexts
with a default ctx_type and have no easy way to decide "should I
switch to MTP mode for this model?" — the nextn_predict_layers field
isn't part of the public C API surface today. Exposing these helpers
lets bindings add an opt-in WithAutoMTP option without forking the
model struct.
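
As a rough illustration of the opt-in decision this commit message describes, the C++ sketch below uses stand-in types rather than the real llama.cpp API. Everything prefixed `mock_` and the function `choose_ctx_type` are assumptions made for this example; only the `nextn_predict_layers` field name and the "0 / false on non-MTP models" behaviour come from the message.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types for illustration only: the real llama_model is opaque and
// its layout is not modelled here.
struct mock_hparams {
    std::uint32_t nextn_predict_layers = 0;
};

struct mock_model {
    mock_hparams hparams;
};

// Mirrors the described accessor: number of MTP layers, 0 for non-MTP models.
inline std::uint32_t mock_model_n_mtp_layers(const mock_model * model) {
    return model->hparams.nextn_predict_layers;
}

// Mirrors the described accessor: true when the model has at least one MTP layer.
inline bool mock_model_has_mtp(const mock_model * model) {
    return mock_model_n_mtp_layers(model) > 0;
}

// Hypothetical binding-side choice between a default context and MTP mode.
enum class mock_ctx_type { standard, mtp };

// MTP mode is selected only when the caller opted in (the WithAutoMTP idea)
// and the model reports MTP layers; otherwise the default is kept.
inline mock_ctx_type choose_ctx_type(const mock_model * model, bool auto_mtp) {
    return (auto_mtp && mock_model_has_mtp(model)) ? mock_ctx_type::mtp
                                                   : mock_ctx_type::standard;
}
```

Keeping the decision behind an explicit flag matches the message's point that MTP activation remains opt-in: a model with MTP layers still gets the default context unless the caller asks for automatic switching.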
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
* spec : update CLI arguments for better consistency

* cont : fix CLI arg message
winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026
* spec : update CLI arguments for better consistency

* cont : fix CLI arg message
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
* spec : update CLI arguments for better consistency

* cont : fix CLI arg message
5 participants