spec : update CLI arguments for better consistency #22964
Merged
am17an
reviewed
May 12, 2026
std::vector<enum common_speculative_type> types = { COMMON_SPECULATIVE_TYPE_NONE };

// used by Simple, MTP, Eagle3, etc. - all methods that require some kind of draft model
common_params_speculative_draft draft;
Contributor
I'm wondering if this should be a more flexible type; I guess it's made for the "simple draft" case.
Member
Author
It's not a problem to extend it as needed. Anything specific in mind?
Atm, all of the parameters seem applicable to MTP, Eagle, etc.
Contributor
You're right. I was thinking the cache type wouldn't be applicable to MTP or Eagle, but it is required.
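The thread settles on a single `draft` parameter block that is shared by every method needing some kind of draft model (Simple, MTP, Eagle3). Below is a minimal sketch of that dispatch. It uses a stand-in enum (`spec_type_sketch`, with assumed names such as `SPEC_DRAFT_MTP`) rather than llama.cpp's real `common_speculative_type`, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for llama.cpp's common_speculative_type; the
// enumerator names here are assumptions made for illustration.
enum spec_type_sketch {
    SPEC_NONE,
    SPEC_DRAFT_SIMPLE,
    SPEC_DRAFT_MTP,
    SPEC_DRAFT_EAGLE3,
    SPEC_NGRAM_SIMPLE,
};

// True for the methods that need a draft model and therefore read the
// shared `draft` parameter block (model, cache type, ...).
bool uses_draft_params(spec_type_sketch t) {
    switch (t) {
        case SPEC_DRAFT_SIMPLE:
        case SPEC_DRAFT_MTP:
        case SPEC_DRAFT_EAGLE3:
            return true;
        default:
            return false; // NONE and n-gram methods need no draft model
    }
}

// `types` may hold several methods (it defaults to { NONE }); the draft
// block matters if any one of them is a draft-model method.
bool any_uses_draft_params(const std::vector<spec_type_sketch> & types) {
    return std::any_of(types.begin(), types.end(), uses_draft_params);
}
```

Keeping one parameter block instead of one per method is what lets the same settings, including the cache type discussed above, apply unchanged to MTP and Eagle3.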
ServeurpersoCom
approved these changes
May 12, 2026
pwilkin
approved these changes
May 12, 2026
Could the CLI arguments for
Contributor
That, unfortunately, is a larger refactor; I plan to do it soon.
xxmustafacooTR
pushed a commit
to xxPlayground/llama-cpp-turboquant
that referenced
this pull request
May 13, 2026
* spec : update CLI arguments for better consistency
* cont : fix CLI arg message
mudler
added a commit
to ci-forks/LocalAI
that referenced
this pull request
May 13, 2026
ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better consistency") renamed the speculative type enum values:

  COMMON_SPECULATIVE_TYPE_DRAFT  -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
  COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3

and the registered name strings flipped from underscore- to dash-separated form (e.g. ngram_simple -> ngram-simple), with the bare draft/eagle3 aliases replaced by draft-simple/draft-eagle3.

This broke the build with the new LLAMA_VERSION on every variant (vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.

Update the upstream branch of the speculative-type fallback to use the new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the old name), and normalize spec_type option tokens before passing them to common_speculative_types_from_names so existing model configs that say spec_type:draft / spec_type:ngram_simple keep working.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
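The token normalization described in this commit can be sketched as a small mapping from legacy names to the new ones: underscores become dashes, and the bare `draft`/`eagle3` aliases gain the `draft-` prefix. `normalize_spec_type_name` is a hypothetical helper written for illustration; LocalAI's actual code may be structured differently, and only the renames listed in the commit are covered.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of llama.cpp or LocalAI): maps a legacy
// spec_type token to the dash-separated name expected after the rename.
std::string normalize_spec_type_name(std::string name) {
    // ngram_simple -> ngram-simple (underscore- to dash-separated form)
    std::replace(name.begin(), name.end(), '_', '-');

    // The bare aliases were replaced by draft-prefixed names.
    if (name == "draft")  return "draft-simple";
    if (name == "eagle3") return "draft-eagle3";

    // Already-new names pass through unchanged.
    return name;
}
```

Running each token through a function like this before handing the list to `common_speculative_types_from_names` keeps old configs such as `spec_type:draft` working without touching the upstream parser.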
mudler
added a commit
to mudler/LocalAI
that referenced
this pull request
May 14, 2026
…dfee071c332` (#9809)

* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): adapt to upstream COMMON_SPECULATIVE_TYPE_DRAFT rename

ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better consistency") renamed the speculative type enum values:

  COMMON_SPECULATIVE_TYPE_DRAFT  -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
  COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3

and the registered name strings flipped from underscore- to dash-separated form (e.g. ngram_simple -> ngram-simple), with the bare draft/eagle3 aliases replaced by draft-simple/draft-eagle3.

This broke the build with the new LLAMA_VERSION on every variant (vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.

Update the upstream branch of the speculative-type fallback to use the new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the old name), and normalize spec_type option tokens before passing them to common_speculative_types_from_names so existing model configs that say spec_type:draft / spec_type:ngram_simple keep working.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
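The other half of the fix keeps one source file building against both the renamed upstream identifiers and the fork that still uses the old name. Here is a minimal sketch of that compile-time switch. The enum is a stand-in declared locally only so the example compiles on its own (in the real build these identifiers come from llama.cpp's headers). `fallback_spec_type()` is a hypothetical name, and when grpc-server.cpp actually applies its fallback is not shown in this commit.

```cpp
// Stand-in declaration so the sketch is self-contained; NOT llama.cpp's
// real header. Only the identifiers relevant to the rename are mirrored.
enum common_speculative_type {
    COMMON_SPECULATIVE_TYPE_NONE,
#ifdef LOCALAI_LEGACY_LLAMA_CPP_SPEC
    COMMON_SPECULATIVE_TYPE_DRAFT,        // old name, kept by the fork branch
#else
    COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE, // name after ggml-org/llama.cpp#22964
#endif
};

// Hypothetical fallback: picks the draft-model type under whichever name
// the llama.cpp being compiled against actually defines.
constexpr common_speculative_type fallback_spec_type() {
#ifdef LOCALAI_LEGACY_LLAMA_CPP_SPEC
    return COMMON_SPECULATIVE_TYPE_DRAFT;
#else
    return COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE;
#endif
}
```

Because the choice is made by the preprocessor, only the branch matching the selected llama.cpp is compiled, so the missing identifier on the other side never causes a build error.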
rsenthilkumar6
pushed a commit
to rsenthilkumar6/llama.cpp
that referenced
this pull request
May 19, 2026
* spec : update CLI arguments for better consistency
* cont : fix CLI arg message
ArberSephirotheca
pushed a commit
to ArberSephirotheca/llama.cpp
that referenced
this pull request
May 19, 2026
* spec : update CLI arguments for better consistency
* cont : fix CLI arg message
hsinhoyeh
added a commit
to hsinhoyeh/llama.cpp
that referenced
this pull request
May 20, 2026
Add two pure-additive C API accessors so language bindings (Go / Rust / Python) can detect whether a loaded model has Multi-Token Prediction heads without spelunking through internal hparams:

    LLAMA_API bool     llama_model_has_mtp     (const struct llama_model *);
    LLAMA_API uint32_t llama_model_n_mtp_layers(const struct llama_model *);

Both read the existing hparams.nextn_predict_layers field that the qwen35moe loader (and other MTP-aware model loaders) populates from the GGUF metadata. They return 0 / false on non-MTP models.

No behavior change for existing callers: these are accessors, not mutators. The MTP activation mechanism (setting llama_context_params.ctx_type = LLAMA_CONTEXT_TYPE_MTP and wiring up the common_speculative_* chain) remains opt-in exactly as merged in PR ggml-org#22964.

Motivation: high-level bindings around llama.cpp construct contexts with a default ctx_type and have no easy way to decide "should I switch to MTP mode for this model?", because the nextn_predict_layers field isn't part of the public C API surface today. Exposing these helpers lets bindings add an opt-in WithAutoMTP option without forking the model struct.
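The commit gives only the signatures, so here is a minimal sketch of the behavior it describes. It uses small stand-in structs (`llama_model_sketch`, `llama_hparams_sketch`) instead of llama.cpp's internal types, and hypothetical function names without the `llama_` prefix. `should_use_mtp` shows the kind of opt-in decision a binding's WithAutoMTP option might make, and it is likewise hypothetical.

```cpp
#include <cstdint>

// Stand-ins for llama.cpp internals; the real structs have many more fields.
struct llama_hparams_sketch {
    uint32_t nextn_predict_layers = 0; // filled from GGUF metadata by MTP-aware loaders
};

struct llama_model_sketch {
    llama_hparams_sketch hparams;
};

// Read-only accessor: number of MTP layers, 0 on non-MTP models.
uint32_t model_n_mtp_layers(const llama_model_sketch * model) {
    return model->hparams.nextn_predict_layers;
}

// Read-only accessor: false on non-MTP models.
bool model_has_mtp(const llama_model_sketch * model) {
    return model_n_mtp_layers(model) > 0;
}

// Binding-side choice: switch to MTP only if the user opted in AND the
// model actually has MTP heads; otherwise keep the default context type.
bool should_use_mtp(const llama_model_sketch * model, bool auto_mtp_enabled) {
    return auto_mtp_enabled && model_has_mtp(model);
}
```

Since both accessors only read an existing field, adding them cannot change the behavior of callers that never use them. That is the "pure-additive" property the commit relies on.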
baramofme
pushed a commit
to baramofme/llama-cpp-turboquant
that referenced
this pull request
May 23, 2026
* spec : update CLI arguments for better consistency
* cont : fix CLI arg message
winstonma
pushed a commit
to winstonma/llama.cpp
that referenced
this pull request
May 27, 2026
* spec : update CLI arguments for better consistency
* cont : fix CLI arg message
fewtarius
pushed a commit
to fewtarius/llama.cpp
that referenced
this pull request
May 30, 2026
* spec : update CLI arguments for better consistency
* cont : fix CLI arg message
Overview
draft-[type]. For example: `draft-simple`, `draft-mtp`, `draft-eagle3`, etc.
The `common_params_speculative_draft` params (the `--spec-draft-*` CLI arguments) apply to any of the `draft-*` types.
Requirements