Merged
72 commits
4d74287
build : use umbrella Headers directory for XCFramework module map (#2…
gmarzjr Jun 4, 2026
4586479
webui: fix tool selector toggle/counter, key tools by stable identity…
ServeurpersoCom Jun 4, 2026
a121232
agents: refactor, include more guidelines (#24111)
ngxson Jun 4, 2026
6f3a9f3
server: avoid unnecessary checkpoint restore when new tokens are pres…
Abioy Jun 4, 2026
4c51309
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209)
sirohikartik Jun 4, 2026
e802356
convert: Fix Gemma 4 Unified conversion (#24118)
pcuenca Jun 4, 2026
0dbfa66
return filter to save memory (#24125)
forforever73 Jun 4, 2026
5269770
ui: added single line reasoning preview (#23601)
gugugiyu Jun 4, 2026
21444c8
ui: Fixed packages (#24119)
allozaur Jun 4, 2026
e7bcf1c
Move duplicated imatrix code into single common imatrix-loader.cpp (#…
bartowski1182 Jun 4, 2026
42b2d60
webui: [a11y] fix keyboard navigation issues in chat interface and si…
vignesh191 Jun 4, 2026
260862b
arg: fix double mtp downloads (#24128)
ngxson Jun 4, 2026
7c158fb
server : disable on-device spec checkpoints (#24108)
ggerganov Jun 4, 2026
7fe2ae4
sycl : port multi-column MMVQ from CUDA backend (#21845)
masonmilby Jun 5, 2026
46fa662
ci : build-msys job slimming [no ci] (#24157)
danbev Jun 5, 2026
2154a0f
CUDA: enroll mul_mat_vec_q_moe into pdl (#24087)
ORippler Jun 5, 2026
3ecfb15
kleidiai : dynamic chunck-based scheduling for hybrid execution (#23819)
chaxu01 Jun 5, 2026
7acb4e8
hparams : refactor `hparams.n_layer` (#24060)
ggerganov Jun 5, 2026
59917d3
minor : fix lint issues (#24165)
ggerganov Jun 5, 2026
ad1b88c
docs: Update quantization readme (#24133)
pcuenca Jun 5, 2026
cc7bef3
ui: add ignore-scripts=true to npmrc (#24149)
ngxson Jun 5, 2026
9c955c4
Fix link to available UI settings (#24169)
wariuccio Jun 5, 2026
2016bf2
ui: run npm install when package-lock.json is newer than node_modules…
ServeurpersoCom Jun 5, 2026
96fbe00
model : fix llama_model::n_gpu_layers() (#24188)
ggerganov Jun 5, 2026
86591c7
cli: fix model params not propagated (#23893)
therealkenc Jun 5, 2026
6effcec
TP: round up granularity to 128 (#24180)
JohannesGaessler Jun 5, 2026
64086f2
model, mtmd: Granite4 Vision (#23545)
gabe-l-hart Jun 5, 2026
c4a278d
model: fix build failed (#24193)
ngxson Jun 5, 2026
e82beaa
vulkan: add fwht support for Intel with shmem reduction (#23964)
0cc4m Jun 5, 2026
da87e9b
common/chat : unify and fix LFM2/LFM2.5 tool parser (#24178)
tdakhran Jun 5, 2026
308f61c
opencl: improve get_rows, cpy, concat and q6_k flat gemv (#24160)
lhez Jun 5, 2026
603300b
context : fix off-by-one comparisons to n_gpu_layers (#24208)
CISC Jun 6, 2026
5343f45
model : rename local n_layer_all variable (#24209)
CISC Jun 6, 2026
5a69c97
vulkan: check coopmat2 features before reporting support (#24186)
0cc4m Jun 6, 2026
f5c6ae1
mtmd, server: add "placeholder bitmap" for counting tokens , add */in…
ngxson Jun 6, 2026
588f0dc
completion : fix format specifier in LOG_INF (#24213)
angt Jun 6, 2026
6b80c74
completion : remove useless statics (#24226)
angt Jun 6, 2026
31e8249
mtmd: support "frame merge" for qwen-vl-based models (#21858)
ngxson Jun 6, 2026
98d5e8b
common/chat : fix LFM2/LFM2.5 reasoning round-trip and <think> leak (…
tdakhran Jun 6, 2026
3f7c79d
docker : bump cuda13 to 13.3.0 (#24228)
CISC Jun 7, 2026
f71af35
convert : fix Gemma4 with no audio encoder (#24242)
CISC Jun 7, 2026
465b1f0
arg: Skip mmproj download when user supplied mmproj (#24239)
konradmb Jun 7, 2026
8942789
chore(sync): merge upstream master into ht (42 commits, 006640408..46…
Jun 7, 2026
94246d1
chore(sync): adapt DFlash to hparams.n_layer() method post-#24060
Jun 7, 2026
9f003ed
llama: Gemma 4 MTP
am17an May 19, 2026
af56714
fix multi-seq
am17an May 19, 2026
6289feb
add assert that draft + shared kv should be on same device
am17an May 20, 2026
1cf8220
add Q rot when cache is quantized
am17an May 21, 2026
5fa9213
add temp hack to not use fit with gemma4, rm later
am17an May 28, 2026
5edc87f
add exception in test-llama-archs
am17an May 28, 2026
571a9dd
move assistant to separate file
am17an May 28, 2026
b300965
add unified assistant
am17an Jun 5, 2026
bcaf30d
cont : adjust to hparams changes
ggerganov Jun 5, 2026
57a2246
cont : avoid computations on the CPU
ggerganov Jun 5, 2026
93aa400
cont : clean-up
ggerganov Jun 6, 2026
89f00b7
cont : clean-up
ggerganov Jun 6, 2026
5af09f1
cont : fix handling of unused tensors
ggerganov Jun 6, 2026
1df52f7
cont : fix undefined
ggerganov Jun 6, 2026
86ef699
fix typo
CISC Jun 6, 2026
4278550
cont : enable gemma4 graph reuse
ggerganov Jun 7, 2026
05e89f8
cont : fix assert
ggerganov Jun 7, 2026
a66b027
cont : fix quantized cache
ggerganov Jun 7, 2026
7e2848a
cont : fix names
ggerganov Jun 7, 2026
b00c1d6
cont : fix names
ggerganov Jun 7, 2026
bf67004
cont : add reference for draft positions
ggerganov Jun 7, 2026
96a14a9
cont : fix multi-modality
ggerganov Jun 7, 2026
e10ad04
cont : add comment about ctx_src
ggerganov Jun 7, 2026
024ac5f
cont : clean-up server fit logic
ggerganov Jun 7, 2026
6caeb6a
cont : clean-up llama_context
ggerganov Jun 7, 2026
e41c9b0
py : fix names
ggerganov Jun 7, 2026
0f2f35a
cont : rename ctx_src -> ctx_other
ggerganov Jun 7, 2026
5e6dff2
chore(sync): drop intermediate llama_set_mtp_source call
Jun 7, 2026
8 changes: 3 additions & 5 deletions .github/workflows/build-msys.yml
```diff
@@ -27,8 +27,8 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - { sys: UCRT64, env: ucrt-x86_64, build: Release }
-          - { sys: CLANG64, env: clang-x86_64, build: Release }
+          - { sys: UCRT64, env: ucrt-x86_64, compiler: gcc, build: Release }
+          - { sys: CLANG64, env: clang-x86_64, compiler: clang, build: Release }
 
     steps:
       - name: Clone
@@ -48,9 +48,7 @@ jobs:
           update: true
           msystem: ${{matrix.sys}}
           install: >-
-            base-devel
-            git
-            mingw-w64-${{matrix.env}}-toolchain
+            mingw-w64-${{matrix.env}}-${{matrix.compiler}}
             mingw-w64-${{matrix.env}}-cmake
             mingw-w64-${{matrix.env}}-openblas
```
4 changes: 2 additions & 2 deletions .github/workflows/docker.yml
```diff
@@ -82,8 +82,8 @@ jobs:
 { "tag": "cpu", "dockerfile": ".devops/s390x.Dockerfile", "platforms": "linux/s390x", "full": true, "light": true, "server": true, "free_disk_space": false, "runs_on": "ubuntu-24.04-s390x" },
 { "tag": "cuda cuda12", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "12.8.1", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
 { "tag": "cuda cuda12", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "12.8.1", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04-arm" },
-{ "tag": "cuda13", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "13.1.1", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
-{ "tag": "cuda13", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "13.1.1", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04-arm" },
+{ "tag": "cuda13", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "13.3.0", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
+{ "tag": "cuda13", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "13.3.0", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04-arm" },
 { "tag": "musa", "dockerfile": ".devops/musa.Dockerfile", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
 { "tag": "intel", "dockerfile": ".devops/intel.Dockerfile", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
 { "tag": "vulkan", "dockerfile": ".devops/vulkan.Dockerfile", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": false, "runs_on": "ubuntu-24.04" },
```
4 changes: 2 additions & 2 deletions .pi/gg/SYSTEM.md
```diff
@@ -16,12 +16,12 @@ Pull requests (PRs):
 - New branch names are prefixed with "gg/"
 - Before opening a pull request, ask the user to confirm the description
 - When creating a pull request, look for the repository's PR template and follow it
-- For the AI usage disclosure section, write "YES. llama.cpp + pi + [MODEL]"
+- For the AI usage disclosure section, write "YES. pi:llama.cpp/[MODEL]"
 - Ask the user to tell you what model was used and write it in place of [MODEL]
 - Always create the pull requests in draft mode
 
 Commits:
-- On every commit that you make, include a "Assisted-by: llama.cpp:local pi" tag
+- On every commit that you make, include a "Assisted-by: pi:llama.cpp/[MODEL]" tag
 - Do not explicitly set the git author in commits - rely on the default git config
 - Always use `--no-gpg-sign` when committing
 - Never `git push` without explicit confirmation from the user
```
9 changes: 1 addition & 8 deletions build-xcframework.sh
```diff
@@ -130,14 +130,7 @@ setup_framework_structure() {
     # Create module map (common for all platforms)
     cat > ${module_path}module.modulemap << EOF
 framework module llama {
-    header "llama.h"
-    header "ggml.h"
-    header "ggml-alloc.h"
-    header "ggml-backend.h"
-    header "ggml-metal.h"
-    header "ggml-cpu.h"
-    header "ggml-blas.h"
-    header "gguf.h"
+    umbrella "Headers"
     link "c++"
     link framework "Accelerate"
```
2 changes: 2 additions & 0 deletions common/CMakeLists.txt
```diff
@@ -78,6 +78,8 @@ add_library(${TARGET}
     hf-cache.cpp
     hf-cache.h
     http.h
+    imatrix-loader.cpp
+    imatrix-loader.h
     json-partial.cpp
     json-partial.h
     json-schema-to-grammar.cpp
```
14 changes: 10 additions & 4 deletions common/arg.cpp
```diff
@@ -446,7 +446,13 @@ bool common_params_handle_models(common_params & params, llama_example curr_ex)
     opts.offline = params.offline;
     opts.skip_download = params.skip_download;
     opts.download_mtp = spec_type_draft_mtp;
-    opts.download_mmproj = !params.no_mmproj;
+    opts.download_mmproj = !params.no_mmproj && params.mmproj.path.empty() && params.mmproj.url.empty();
+
+    // sub-models (draft, mmproj, vocoder) are explicitly specified by the user,
+    // so we should not auto-discover mtp/mmproj siblings for them
+    common_download_opts sub_opts = opts;
+    sub_opts.download_mtp = false;
+    sub_opts.download_mmproj = false;
 
     try {
         auto res = common_params_handle_model(params.model, opts);
@@ -459,7 +465,7 @@ bool common_params_handle_models(common_params & params, llama_example curr_ex)
         // only download mmproj if the current example is using it
         for (const auto & ex : mmproj_examples) {
             if (curr_ex == ex) {
-                common_params_handle_model(params.mmproj, opts);
+                common_params_handle_model(params.mmproj, sub_opts);
                 break;
             }
         }
@@ -472,8 +478,8 @@ bool common_params_handle_models(common_params & params, llama_example curr_ex)
                 params.speculative.draft.mparams.url.empty()) {
             params.speculative.draft.mparams.path = res.mtp.path;
         }
-        common_params_handle_model(params.speculative.draft.mparams, opts);
-        common_params_handle_model(params.vocoder.model, opts);
+        common_params_handle_model(params.speculative.draft.mparams, sub_opts);
+        common_params_handle_model(params.vocoder.model, sub_opts);
         return true;
     } catch (const common_skip_download_exception &) {
         return false;
```
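As a reading aid for the arg.cpp hunks above, the sketch below isolates their two decisions: the main model auto-discovers an mmproj sibling only when the user neither disabled mmproj nor supplied one by path or URL, and explicitly specified sub-models (draft, mmproj, vocoder) never auto-discover MTP or mmproj siblings. The names `model_source`, `download_opts`, `main_model_opts` and `sub_model_opts` are hypothetical simplifications, not the real `common_params` / `common_download_opts` API, and only the two flags this change touches are modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins; the real code uses common_params and
// common_download_opts, which carry more fields than shown here.
struct model_source {
    std::string path;
    std::string url;
};

struct download_opts {
    bool download_mtp    = false;
    bool download_mmproj = false;
};

// Main model: auto-discover an mmproj sibling only if the user neither
// disabled mmproj nor supplied one explicitly by path or URL.
download_opts main_model_opts(bool no_mmproj, bool want_mtp, const model_source & user_mmproj) {
    download_opts opts;
    opts.download_mtp    = want_mtp;
    opts.download_mmproj = !no_mmproj && user_mmproj.path.empty() && user_mmproj.url.empty();
    return opts;
}

// Explicitly specified sub-models: copy the main options, then turn off
// sibling auto-discovery so no extra MTP/mmproj files are fetched for them.
download_opts sub_model_opts(const download_opts & main_opts) {
    download_opts sub   = main_opts;
    sub.download_mtp    = false;
    sub.download_mmproj = false;
    return sub;
}
```

Deriving the sub-model options from a copy of the main ones, as the diff does with `sub_opts = opts`, keeps settings such as offline mode and skip-download in sync while switching off only the discovery flags.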
60 changes: 52 additions & 8 deletions common/chat-peg-parser.cpp
```diff
@@ -87,6 +87,8 @@ static std::string normalize_quotes_to_json(const std::string & input) {
     bool in_single_quoted = false;
     bool in_double_quoted = false;
 
+    auto is_word_char = [](char ch) { return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_'; };
+
     for (size_t i = 0; i < input.size(); ++i) {
         char c = input[i];
 
@@ -151,6 +153,29 @@ static std::string normalize_quotes_to_json(const std::string & input) {
                 in_single_quoted = true;
                 result += '"';
             }
+        } else if (!in_single_quoted && !in_double_quoted && (c == 'T' || c == 'F' || c == 'N') &&
+                   (i == 0 || !is_word_char(input[i - 1]))) {
+            // Python literals -> JSON; prefix match keeps streamed partials monotonic.
+            static constexpr std::pair<std::string_view, std::string_view> literals[] = {
+                { "True", "true" }, { "False", "false" }, { "None", "null" },
+            };
+            size_t n = 0;
+            while (i + n < input.size() && is_word_char(input[i + n])) {
+                ++n;
+            }
+            std::string_view token(input.data() + i, n);
+            bool matched = false;
+            for (const auto & [py, js] : literals) {
+                if (py.substr(0, n) == token) {
+                    result += js.substr(0, n);
+                    i += n - 1;
+                    matched = true;
+                    break;
+                }
+            }
+            if (!matched) {
+                result += c;
+            }
         } else {
             result += c;
         }
@@ -353,12 +378,8 @@ void common_chat_peg_mapper::map(const common_peg_ast_node & node) {
             }
             value_to_add += escape_json_string_inner(value_content);
         } else if (!value_content.empty()) {
-            // For potential containers, normalize Python-style single quotes to JSON double quotes
-            bool is_potential_container = value_content[0] == '[' || value_content[0] == '{';
-            if (is_potential_container) {
-                value_content = normalize_container_value(value_content);
-            }
-            value_to_add += value_content;
+            // Pythonic scalars/containers -> JSON.
+            value_to_add += normalize_container_value(value_content);
         }
 
         args_target() += value_to_add;
@@ -466,11 +487,34 @@ common_peg_parser common_chat_peg_builder::standard_constructed_tools(
     return force_tool_calls ? section : optional(section);
 }
 
+// Like python_value(), but the leaf also accepts JSON-cased true/false/null, used by LFM2/LFM2.5
+common_peg_parser common_chat_peg_builder::python_or_json_value() {
+    return rule("python-or-json-value", [this]() {
+        auto ws = space();
+        auto value = python_or_json_value();
+
+        auto member = sequence({ python_string(), ws, literal(":"), ws, value });
+        auto members = sequence({ member, zero_or_more(sequence({ ws, literal(","), ws, member })) });
+        auto dict = rule("python-or-json-dict", [&]() {
+            return sequence({ literal("{"), ws, choice({ literal("}"), sequence({ members, ws, literal("}") }) }), ws });
+        });
+
+        auto elements = sequence({ value, zero_or_more(sequence({ literal(","), ws, value })) });
+        auto array = rule("python-or-json-array", [&]() {
+            return sequence({ literal("["), ws, choice({ literal("]"), sequence({ elements, ws, literal("]") }) }), ws });
+        });
+
+        return choice({ dict, array, python_string(), python_number(),
+                        python_bool(), python_null(), json_bool(), json_null() });
+    });
+}
+
 // Python-style tool calls: name(arg1="value1", arg2=123)
 // Used only by LFM2 for now, so we don't merge it into autoparser
 common_peg_parser common_chat_peg_builder::python_style_tool_calls(
     const ordered_json & tools,
-    bool parallel_tool_calls) {
+    bool parallel_tool_calls,
+    bool allow_json_literals) {
     if (!tools.is_array() || tools.empty()) {
         return eps();
     }
@@ -504,7 +548,7 @@ common_peg_parser common_chat_peg_builder::python_style_tool_calls(
             if (is_string_type) {
                 arg_value_parser = string_value_parser;
             } else {
-                arg_value_parser = tool_arg_value(python_value());
+                arg_value_parser = tool_arg_value(allow_json_literals ? python_or_json_value() : python_value());
             }
 
             // Full argument: name="value" or name=value
```
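The new branch in `normalize_quotes_to_json()` rewrites Python literals to JSON only at word boundaries, and for a partially streamed token it emits the matching prefix of the JSON spelling, so `Tr` becomes `tr` and can later grow into `true` without rewriting earlier output. Below is a minimal standalone sketch of just that branch, under stated assumptions: `py_literals_to_json` is a hypothetical name, only double-quoted strings are skipped (the real function also tracks single-quoted strings and rewrites quotes), and all other characters are copied through unchanged.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// True for characters that can continue an identifier-like word.
static bool is_word_char(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Hypothetical standalone version of the Python-literal branch only.
// Assumption: only double-quoted strings are skipped; the real function also
// tracks single-quoted strings and converts quotes.
std::string py_literals_to_json(const std::string & input) {
    static constexpr std::pair<std::string_view, std::string_view> literals[] = {
        { "True", "true" }, { "False", "false" }, { "None", "null" },
    };
    std::string result;
    bool in_string = false;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];
        if (in_string) {
            // Inside a string: copy verbatim, including escaped characters.
            result += c;
            if (c == '\\' && i + 1 < input.size()) {
                result += input[++i];
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        if (c == '"') {
            in_string = true;
            result += c;
            continue;
        }
        // Candidate literal: starts with T/F/N and is not the tail of a longer word.
        if ((c == 'T' || c == 'F' || c == 'N') && (i == 0 || !is_word_char(input[i - 1]))) {
            std::size_t n = 0;
            while (i + n < input.size() && is_word_char(input[i + n])) {
                ++n;
            }
            const std::string_view token(input.data() + i, n);
            bool matched = false;
            for (const auto & [py, js] : literals) {
                // A prefix of a literal ("Tr" mid-stream) maps to the same-length
                // prefix of its JSON spelling ("tr"); py and js have equal lengths.
                if (py.substr(0, n) == token) {
                    result += js.substr(0, n);
                    i += n - 1;  // the loop's ++i then moves past the token
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                result += c;
            }
            continue;
        }
        result += c;
    }
    return result;
}
```

A word such as `Trueish` is copied verbatim, because no literal's prefix equals the whole token; this mirrors the `matched == false` path in the diff.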
7 changes: 5 additions & 2 deletions common/chat-peg-parser.h
```diff
@@ -132,9 +132,13 @@ class common_chat_peg_builder : public common_peg_parser_builder {
     // Helper for Python-style function call format: name(arg1="value1", arg2=123)
     // Used by LFM2 and similar templates
     common_peg_parser python_style_tool_calls(const nlohmann::ordered_json & tools,
-                                              bool parallel_tool_calls);
+                                              bool parallel_tool_calls,
+                                              bool allow_json_literals);
 
   private:
+    // Python values plus JSON true/false/null.
+    common_peg_parser python_or_json_value();
+
     // Implementation helpers for standard_json_tools — one per JSON tool call layout mode
     common_peg_parser build_json_tools_function_is_key(const nlohmann::ordered_json & tools,
                                                        const std::string & args_key,
@@ -195,4 +199,3 @@ struct tagged_peg_parser {
 
 tagged_peg_parser build_tagged_peg_parser(
     const std::function<common_peg_parser(common_peg_parser_builder & builder)> & fn);
-
```
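The private `python_or_json_value()` rule declared in this header (and defined in common/chat-peg-parser.cpp) accepts dicts, lists, Python strings and numbers, plus both the Python and the JSON spellings of the boolean and null literals. To make that accepted language concrete, here is a hand-written recursive-descent recognizer for roughly the same grammar. It is an illustrative sketch, not the PEG builder API. `value_recognizer` is hypothetical, numbers are simplified to an optional minus sign, digits, and an optional fraction (the exact `python_number()` syntax is not shown in this diff), and whitespace is accepted more liberally than in the rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical recognizer for roughly the language of python_or_json_value():
// dicts with quoted keys, lists, '...' or "..." strings, simple numbers, and
// True/False/None as well as true/false/null. Not the PEG builder API.
class value_recognizer {
  public:
    explicit value_recognizer(std::string_view s) : s_(s) {}

    // True if the entire input is one value, optionally padded by whitespace.
    bool full_match() {
        skip_ws();
        if (!value()) return false;
        skip_ws();
        return pos_ == s_.size();
    }

  private:
    std::string_view s_;
    std::size_t      pos_ = 0;

    bool at(char c) const { return pos_ < s_.size() && s_[pos_] == c; }

    // Consume the single character c if it is next.
    bool eat(char c) {
        if (!at(c)) return false;
        ++pos_;
        return true;
    }

    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    // Consume the exact keyword t if it appears at the current position.
    bool lit(std::string_view t) {
        if (s_.substr(pos_, t.size()) != t) return false;
        pos_ += t.size();
        return true;
    }

    // Single- or double-quoted string with backslash escapes.
    bool quoted_string() {
        if (!at('"') && !at('\'')) return false;
        const char quote = s_[pos_++];
        while (pos_ < s_.size()) {
            const char c = s_[pos_++];
            if (c == quote) return true;
            if (c == '\\') {
                if (pos_ >= s_.size()) return false;
                ++pos_;  // skip the escaped character
            }
        }
        return false;  // unterminated
    }

    // Simplified number (assumption): optional '-', digits, optional '.digits'.
    bool number() {
        auto digits = [this]() {
            const std::size_t start = pos_;
            while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
            return pos_ > start;
        };
        eat('-');
        if (!digits()) return false;
        return !eat('.') || digits();
    }

    // '{' key ':' value (',' key ':' value)* '}'  or  '[' value (',' value)* ']'
    bool container(char open, char close, bool is_dict) {
        if (!eat(open)) return false;
        skip_ws();
        if (eat(close)) return true;  // empty container
        while (true) {
            skip_ws();
            if (is_dict) {
                if (!quoted_string()) return false;
                skip_ws();
                if (!eat(':')) return false;
                skip_ws();
            }
            if (!value()) return false;
            skip_ws();
            if (eat(',')) continue;
            return eat(close);
        }
    }

    // Ordered choice: each failed alternative rewinds to where it started.
    bool value() {
        const std::size_t start = pos_;
        auto attempt = [&](bool ok) {
            if (!ok) pos_ = start;
            return ok;
        };
        return attempt(container('{', '}', true)) || attempt(container('[', ']', false)) ||
               attempt(quoted_string()) || attempt(number()) ||
               attempt(lit("True")) || attempt(lit("False")) || attempt(lit("None")) ||
               attempt(lit("true")) || attempt(lit("false")) || attempt(lit("null"));
    }
};
```

Rewinding the position after each failed alternative plays the role of PEG ordered choice, so a dict that fails halfway cannot leave later alternatives matching from the wrong offset.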